
Llama 3.1 8B fp16 TP8 sharded fails to compile for CPU and GPU #19263

Open

aviator19941 opened this issue Nov 22, 2024 · 4 comments

Labels
bug 🐞 Something isn't working

aviator19941 (Contributor) commented Nov 22, 2024

What happened?

When I try to compile the sharded Llama 3.1 8B fp16 IR for CPU or GPU, compilation fails.

CPU error: https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde

GPU error: https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39

I also tried the flags that were suggested for compiling Llama (see the combined invocation sketched after the repro steps):
--iree-dispatch-creation-enable-aggressive-fusion=true --iree-global-opt-propagate-transposes=true --iree-opt-aggressively-propagate-transposes=true --iree-opt-data-tiling=false --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' --iree-hal-indirect-command-buffers=true --iree-stream-resource-memory-model=discrete --iree-hip-legacy-sync=false --iree-hal-memoization=true --iree-opt-strip-assertions

Steps to reproduce your issue

  1. wget the IR: https://gist.github.com/aviator19941/bab5886f53f2fd0b3b8458519148542c
  2. Try to compile for CPU:
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir -o=8b_f16_tp8_decomposed_cpu.vmfb --iree-hal-target-device=llvm-cpu[0] --iree-hal-target-device=llvm-cpu[1] --iree-hal-target-device=llvm-cpu[2] --iree-hal-target-device=llvm-cpu[3] --iree-hal-target-device=llvm-cpu[4] --iree-hal-target-device=llvm-cpu[5] --iree-hal-target-device=llvm-cpu[6] --iree-hal-target-device=llvm-cpu[7]
  3. CPU error: https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde
  4. Try to compile for GPU:
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] --iree-hal-target-device=hip[2] --iree-hal-target-device=hip[3] --iree-hal-target-device=hip[4] --iree-hal-target-device=hip[5] --iree-hal-target-device=hip[6] --iree-hal-target-device=hip[7]
  5. GPU error: https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39
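
For reference, a sketch of how the suggested flags from above might be appended to the HIP command in step 4. The exact combination used is not shown in the issue, so treat this as an assumption; all flags are taken verbatim from the list above.

    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir \
      --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb \
      --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] \
      --iree-hal-target-device=hip[2] --iree-hal-target-device=hip[3] \
      --iree-hal-target-device=hip[4] --iree-hal-target-device=hip[5] \
      --iree-hal-target-device=hip[6] --iree-hal-target-device=hip[7] \
      --iree-dispatch-creation-enable-aggressive-fusion=true \
      --iree-global-opt-propagate-transposes=true \
      --iree-opt-aggressively-propagate-transposes=true \
      --iree-opt-data-tiling=false \
      --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
      --iree-hal-indirect-command-buffers=true \
      --iree-stream-resource-memory-model=discrete \
      --iree-hip-legacy-sync=false \
      --iree-hal-memoization=true \
      --iree-opt-strip-assertions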

What component(s) does this issue relate to?

No response

Version information

iree-base-compiler 3.1.0rc20241121

Additional context

No response

@aviator19941 aviator19941 added the bug 🐞 Something isn't working label Nov 22, 2024
@aviator19941 aviator19941 changed the title Llama 3.1 8B fp16 sharded fails to compile for CPU and GPU Llama 3.1 8B fp16 TP8 sharded fails to compile for CPU and GPU Nov 22, 2024
sogartar (Contributor) commented Nov 22, 2024

Regarding the CPU compilation error: I made a fix when exporting for the unsharded case, where we want no device affinities. This is the sharded variant. At first glance, the argument and global-parameter affinities look fine. It is probably something with the flow.tensor.transfer ops.
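
For context, a hypothetical sketch of what such a transfer looks like in the IR; the tensor type and device name are made up for illustration, and the exact printed form may differ across IREE versions:

    // Move a shard produced on one logical device to another; the target
    // affinity attribute is what the compiler has to resolve per device.
    %moved = flow.tensor.transfer %shard : tensor<4x32xf16>
        to #hal.device.promise<@__device_1>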

nirvedhmeshram (Contributor) commented Nov 22, 2024

The GPU error is an attention dispatch failing: it is going down the LLVMGPUDistribute pipeline, which is not the one we want for it. Here is the input IR for it.

You can run this with

iree-compile attention_dispatch.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-backends=rocm --compile-from=executable-sources --mlir-print-ir-after-all &> output.mlir

Here is the full dump.

It looks like it has some dynamic shapes, so I am guessing vector distribute bailed on it.
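
(For anyone reproducing this: a standalone dispatch file like attention_dispatch.mlir can be obtained by dumping executable sources during the full compile and then recompiling just that dispatch with --compile-from=executable-sources, as in the command above. A sketch, using a single device for brevity; the dump directory name is an arbitrary choice:)

    # Dump each dispatch as a standalone MLIR file; the dump happens
    # before codegen, so the files are written even if compilation fails.
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir \
      --iree-hip-target=gfx942 -o=/dev/null \
      --iree-hal-target-device=hip \
      --iree-hal-dump-executable-sources-to=dump/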

CC @raikonenfnu @Groverkss @kumardeepakamd

@sogartar sogartar removed their assignment Nov 22, 2024
aviator19941 (Contributor, Author) replied:

> You can run this with
> iree-compile attention_dispatch.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-backends=rocm --compile-from=executable-sources --mlir-print-ir-after-all &> output.mlir

I think @sogartar suggested we not compile with --iree-hal-target-backends=rocm since it is considered a legacy flag and will be removed in the future.

nirvedhmeshram (Contributor) replied:

> I think @sogartar suggested we not compile with --iree-hal-target-backends=rocm since it is considered a legacy flag and will be removed in the future.

Yes, I was just using that to be concise; you can use the new flags and get the same error.
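
A sketch of the same single-dispatch repro with the newer flag spelling, reusing the device flag from the issue's own GPU command (single device, since the dispatch is standalone):

    iree-compile attention_dispatch.mlir --iree-hip-target=gfx942 \
      -o=attention_dispatch.vmfb --iree-hal-target-device=hip \
      --compile-from=executable-sources --mlir-print-ir-after-all &> output.mlir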
