[thunder] INTERNAL_ASSERT_FAILED #3461
Comments
There are some recent bug fixes that may be related. Could you try the latest version?
Updating to the latest version worked locally. I will wait for the thunder CI to pick up the update and then close the issue once CI is green. Thank you!!
On the latest version, I am seeing an internal assert (this is using an internal image):
# CUDA devices:
# 0: NVIDIA RTX 6000 Ada Generation
# torch version: 2.6.0a0+git45ed7c1
# nvfuser version: 0.2.23+git8546b62
import torch
from nvfuser import FusionDefinition, DataType
def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(shape=[128, 4], contiguity=[True, True], dtype=DataType.Float, is_cpu=False, stride_order=[1, 0])
    T1 = fd.define_tensor(shape=[128, 4], contiguity=[True, True], dtype=DataType.Float, is_cpu=False, stride_order=[1, 0])
    T2 = fd.define_tensor(shape=[5, 5, 288], contiguity=[True, True, True], dtype=DataType.Float, is_cpu=False, stride_order=[2, 1, 0])
    T3 = fd.define_tensor(shape=[5, 5, 1024], contiguity=[True, True, True], dtype=DataType.Float, is_cpu=False, stride_order=[2, 1, 0])
    T13 = fd.ops.slice(T0, start_indices=[0, 0], end_indices=[5, 4], strides=[1, 1], manual_normalization=0)
    T23 = fd.ops.slice(T1, start_indices=[0, 0], end_indices=[5, 4], strides=[1, 1], manual_normalization=0)
    T30 = fd.ops.reshape(T2, new_shape=[5, 5, 4, 18, 4])
    T31 = fd.ops.permute(T30, dims=[0, 2, 3, 1, 4])
    T50 = fd.ops.slice(T31, start_indices=[0, 0, 0, 0, 0], end_indices=[5, 4, 16, 5, 4], strides=[1, 1, 1, 1, 1], manual_normalization=0)
    T69 = fd.ops.slice(T31, start_indices=[0, 0, 16, 0, 0], end_indices=[5, 4, 17, 5, 4], strides=[1, 1, 1, 1, 1], manual_normalization=0)
    T88 = fd.ops.slice(T31, start_indices=[0, 0, 17, 0, 0], end_indices=[5, 4, 18, 5, 4], strides=[1, 1, 1, 1, 1], manual_normalization=0)
    T95 = fd.ops.broadcast_in_dim(T69, shape=[5, 4, 16, 5, 4], broadcast_dims=[0, 1, 2, 3, 4])
    T102 = fd.ops.broadcast_in_dim(T88, shape=[5, 4, 16, 5, 4], broadcast_dims=[0, 1, 2, 3, 4])
    T108 = fd.ops.reshape(T50, new_shape=[5, 64, 5, 4])
    T114 = fd.ops.reshape(T95, new_shape=[5, 64, 5, 4])
    T120 = fd.ops.reshape(T102, new_shape=[5, 64, 5, 4])
    T136 = fd.ops.slice(T108, start_indices=[0, 0, 0, 0], end_indices=[5, 64, 5, 2], strides=[1, 1, 1, 1], manual_normalization=0)
    T152 = fd.ops.slice(T108, start_indices=[0, 0, 0, 2], end_indices=[5, 64, 5, 4], strides=[1, 1, 1, 1], manual_normalization=0)
    T153 = fd.ops.neg(T152)
    T154 = fd.ops.cat([T153, T136], dim=-1, manual_padding=0)
    T160 = fd.ops.broadcast_in_dim(T13, shape=[5, 64, 5, 4], broadcast_dims=[2, 3])
    T161 = fd.ops.mul(T108, T160)
    T167 = fd.ops.broadcast_in_dim(T23, shape=[5, 64, 5, 4], broadcast_dims=[2, 3])
    T168 = fd.ops.mul(T154, T167)
    T169 = fd.ops.add(T161, T168)
    T185 = fd.ops.slice(T114, start_indices=[0, 0, 0, 0], end_indices=[5, 64, 5, 2], strides=[1, 1, 1, 1], manual_normalization=0)
    T201 = fd.ops.slice(T114, start_indices=[0, 0, 0, 2], end_indices=[5, 64, 5, 4], strides=[1, 1, 1, 1], manual_normalization=0)
    T202 = fd.ops.neg(T201)
    T203 = fd.ops.cat([T202, T185], dim=-1, manual_padding=0)
    T204 = fd.ops.mul(T114, T160)
    T205 = fd.ops.mul(T203, T167)
    T206 = fd.ops.add(T204, T205)
    T222 = fd.ops.slice(T108, start_indices=[0, 0, 0, 0], end_indices=[5, 64, 5, 0], strides=[1, 1, 1, 1], manual_normalization=0)
    T223 = fd.ops.cat([T169, T222], dim=-1, manual_padding=0)
    T239 = fd.ops.slice(T114, start_indices=[0, 0, 0, 0], end_indices=[5, 64, 5, 0], strides=[1, 1, 1, 1], manual_normalization=0)
    T240 = fd.ops.cat([T206, T239], dim=-1, manual_padding=0)
    S241 = fd.define_scalar(0.707107, dtype=DataType.Double)
    T242 = fd.ops.mul(T223, S241)
    T243 = fd.ops.permute(T240, dims=[0, 1, 3, 2])
    S244 = fd.define_scalar(0.707107, dtype=DataType.Double)
    T245 = fd.ops.mul(T243, S244)
    S246 = fd.define_scalar(1.41421, dtype=DataType.Double)
    S247 = fd.ops.reciprocal(S246)
    T248 = fd.ops.mul(T3, S247)
    T249 = fd.ops.erf(T248)
    S250 = fd.define_scalar(0.500000, dtype=DataType.Double)
    T251 = fd.ops.mul(S250, T249)
    S252 = fd.define_scalar(0.500000, dtype=DataType.Double)
    T253 = fd.ops.add(S252, T251)
    T254 = fd.ops.mul(T3, T253)
    fd.add_output(T120)
    fd.add_output(T160)
    fd.add_output(T167)
    fd.add_output(T242)
    fd.add_output(T245)
    fd.add_output(T254)
with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)
inputs = [
    torch.testing.make_tensor((128, 4), dtype=torch.float32, device='cuda:0'),
    torch.testing.make_tensor((128, 4), dtype=torch.float32, device='cuda:0'),
    torch.testing.make_tensor((5, 5, 288), dtype=torch.float32, device='cuda:0'),
    torch.testing.make_tensor((5, 5, 1024), dtype=torch.float32, device='cuda:0'),
]
fd.execute(inputs)
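One thing that stands out when reading the repro: T222 and T239 are slices whose last dimension runs from 0 to 0, so they have extent 0 along that axis before being concatenated onto T169 and T206. Whether this is related to the assert is not established here. The sketch below only works out the expected extents in plain Python; `slice_shape` is a hypothetical helper written for illustration and is not part of nvfuser.

```python
def slice_shape(start_indices, end_indices, strides):
    """Per-dimension extent of a strided slice (hypothetical helper, not an nvfuser API)."""
    # Ceiling division of (end - start) by the stride, clamped at zero.
    return [
        max(0, -(-(end - start) // step))
        for start, end, step in zip(start_indices, end_indices, strides)
    ]


# Same indices as T222 / T239 in the repro script.
t222_shape = slice_shape([0, 0, 0, 0], [5, 64, 5, 0], [1, 1, 1, 1])

# T169 has shape [5, 64, 5, 4]; cat along dim=-1 adds the last extents.
t169_shape = [5, 64, 5, 4]
t223_shape = t169_shape[:-1] + [t169_shape[-1] + t222_shape[-1]]

print(t222_shape)
print(t223_shape)
```

If the empty slices turn out to matter, a smaller repro that keeps only one slice and one cat might be a reasonable next step for narrowing this down.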
Repro Script
Failing CI - https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=220387&view=logs&j=3f274fac-2e11-54ca-487e-194c91f3ae9f&t=244491d3-5bd5-5b27-6d81-66bb4c7264ae&l=375
CI Log - ci_log.txt