
Llama 3.1 8B fp16 TP8 sharded fails to compile for CPU and GPU #19263

Open

aviator19941 opened this issue Nov 22, 2024 · 4 comments

Labels
bug 🐞 Something isn't working

aviator19941 (Contributor) commented Nov 22, 2024

What happened?

When I try to compile the sharded Llama 3.1 8B fp16 IR for CPU or GPU, compilation fails.

CPU error: https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde

GPU error: https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39

I also tried the flags that were suggested for compiling Llama (see the combined invocation sketched after the repro steps):
--iree-dispatch-creation-enable-aggressive-fusion=true --iree-global-opt-propagate-transposes=true --iree-opt-aggressively-propagate-transposes=true --iree-opt-data-tiling=false --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' --iree-hal-indirect-command-buffers=true --iree-stream-resource-memory-model=discrete --iree-hip-legacy-sync=false --iree-hal-memoization=true --iree-opt-strip-assertions

Steps to reproduce your issue

  1. wget the IR: https://gist.github.com/aviator19941/bab5886f53f2fd0b3b8458519148542c
  2. Try to compile for CPU:
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir -o=8b_f16_tp8_decomposed_cpu.vmfb --iree-hal-target-device=llvm-cpu[0] --iree-hal-target-device=llvm-cpu[1] --iree-hal-target-device=llvm-cpu[2] --iree-hal-target-device=llvm-cpu[3] --iree-hal-target-device=llvm-cpu[4] --iree-hal-target-device=llvm-cpu[5] --iree-hal-target-device=llvm-cpu[6] --iree-hal-target-device=llvm-cpu[7]
  3. CPU error: https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde
  4. Try to compile for GPU:
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] --iree-hal-target-device=hip[2] --iree-hal-target-device=hip[3] --iree-hal-target-device=hip[4] --iree-hal-target-device=hip[5] --iree-hal-target-device=hip[6] --iree-hal-target-device=hip[7]
  5. GPU error: https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39
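
For reference, a sketch of how the suggested flags from above might be appended to the HIP command in step 4. The exact combination used is not shown in the issue, so treat this as an assumption; all flags are taken verbatim from the list above.

    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir \
      --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb \
      --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] \
      --iree-hal-target-device=hip[2] --iree-hal-target-device=hip[3] \
      --iree-hal-target-device=hip[4] --iree-hal-target-device=hip[5] \
      --iree-hal-target-device=hip[6] --iree-hal-target-device=hip[7] \
      --iree-dispatch-creation-enable-aggressive-fusion=true \
      --iree-global-opt-propagate-transposes=true \
      --iree-opt-aggressively-propagate-transposes=true \
      --iree-opt-data-tiling=false \
      --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
      --iree-hal-indirect-command-buffers=true \
      --iree-stream-resource-memory-model=discrete \
      --iree-hip-legacy-sync=false \
      --iree-hal-memoization=true \
      --iree-opt-strip-assertions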

What component(s) does this issue relate to?

No response

Version information

iree-base-compiler 3.1.0rc20241121

Additional context

No response

@aviator19941 aviator19941 added the bug 🐞 Something isn't working label Nov 22, 2024
@aviator19941 aviator19941 changed the title Llama 3.1 8B fp16 sharded fails to compile for CPU and GPU Llama 3.1 8B fp16 TP8 sharded fails to compile for CPU and GPU Nov 22, 2024
sogartar (Contributor) commented Nov 22, 2024

Regarding the CPU compilation error: I made a fix when exporting for the unsharded case, where we want no device affinities. This is the sharded variant. At first glance, the argument and global-parameter affinities look fine. It is probably something with the flow.tensor.transfer ops.
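
For context, a hypothetical sketch of what such a transfer looks like in the IR; the tensor type and device name are made up for illustration, and the exact printed form may differ across IREE versions:

    // Move a shard produced on one logical device to another; the target
    // affinity attribute is what the compiler has to resolve per device.
    %moved = flow.tensor.transfer %shard : tensor<4x32xf16>
        to #hal.device.promise<@__device_1>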

nirvedhmeshram (Contributor) commented Nov 22, 2024

The GPU error is an attention dispatch failing: it is going down the LLVMGPUDistribute pipeline, which is not the one we want for it. Here is the input IR for it.

You can run this with

iree-compile attention_dispatch.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-backends=rocm --compile-from=executable-sources --mlir-print-ir-after-all &> output.mlir

Here is the full dump.

It looks like it has some dynamic shapes, so I am guessing vector distribute bailed on it.
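
(For anyone reproducing this: a standalone dispatch file like attention_dispatch.mlir can be obtained by dumping executable sources during the full compile and then recompiling just that dispatch with --compile-from=executable-sources, as in the command above. A sketch, using a single device for brevity; the dump directory name is an arbitrary choice:)

    # Dump each dispatch as a standalone MLIR file; the dump happens
    # before codegen, so the files are written even if compilation fails.
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir \
      --iree-hip-target=gfx942 -o=/dev/null \
      --iree-hal-target-device=hip \
      --iree-hal-dump-executable-sources-to=dump/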

CC @raikonenfnu @Groverkss @kumardeepakamd

@sogartar sogartar removed their assignment Nov 22, 2024
aviator19941 (Contributor, Author) replied:

> You can run this with
> iree-compile attention_dispatch.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-backends=rocm --compile-from=executable-sources --mlir-print-ir-after-all &> output.mlir

I think @sogartar suggested we not compile with --iree-hal-target-backends=rocm since it is considered a legacy flag and will be removed in the future.

nirvedhmeshram (Contributor) replied:

> I think @sogartar suggested we not compile with --iree-hal-target-backends=rocm since it is considered a legacy flag and will be removed in the future.

Yes, I was just using that to be concise; you can use the new flags and get the same error.
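
A sketch of the same single-dispatch repro with the newer flag spelling, reusing the device flag from the issue's own GPU command (single device, since the dispatch is standalone):

    iree-compile attention_dispatch.mlir --iree-hip-target=gfx942 \
      -o=attention_dispatch.vmfb --iree-hal-target-device=hip \
      --compile-from=executable-sources --mlir-print-ir-after-all &> output.mlir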
