Race reported between Write access and Read access in fusion using async copy #3428
Comments
If
|
A potential explanation:
The asynchronous copy to T3 (step 1) is issued only by threads with threadIdx.y == 0, but T3 is read by all threads in the block (step 4). If the threads with threadIdx.y != 0 do not wait for the async copy to T3 to complete, they may read T3 before the data has been fully written, which is a race condition. Adding an additional |
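The access pattern described above can be sketched in a small hand-written CUDA kernel. This is only an illustration, not the kernel nvFuser generated for this issue: the names `bcastAfterAsyncCopy`, `runBcastAfterAsyncCopy`, `t0`, `t1`, `kX` and `kY` are hypothetical, and the block-wide `__syncthreads()` is one standard way to order the write and the reads, assumed here rather than taken from the issue. Only the threadIdx.y == 0 row issues the async copy into the shared buffer, each thread waits for its own copy, and the barrier then makes the data visible to every row before any of them reads it.

```cuda
#include <cuda_runtime.h>
#include <cuda_pipeline.h>
#include <vector>

constexpr int kX = 32;  // blockDim.x (hypothetical size)
constexpr int kY = 4;   // blockDim.y (hypothetical size)

// Hypothetical kernel: T3 is filled by one row of threads, read by all rows.
__global__ void bcastAfterAsyncCopy(const double* t0, const double* t1,
                                    double* out) {
  __shared__ double T3[kX];

  // Step 1: only threads with threadIdx.y == 0 issue the async copy to T3.
  if (threadIdx.y == 0) {
    __pipeline_memcpy_async(&T3[threadIdx.x], &t0[threadIdx.x],
                            sizeof(double));
  }
  __pipeline_commit();

  // Each thread waits only for the copies it issued itself. For threads with
  // threadIdx.y != 0 this returns immediately, so it does not protect them.
  __pipeline_wait_prior(0);

  // Block-wide barrier: without it, rows with threadIdx.y != 0 could read T3
  // before row 0's copy has landed. (Assumed fix, not confirmed by the issue.)
  __syncthreads();

  // Later step: every thread in the block reads T3 (broadcast along y).
  const int idx = threadIdx.y * kX + threadIdx.x;
  out[idx] = T3[threadIdx.x] + t1[idx];
}

// Host helper: launches one block and checks the result on the CPU.
bool runBcastAfterAsyncCopy() {
  std::vector<double> h0(kX), h1(kX * kY), hout(kX * kY, 0.0);
  for (int i = 0; i < kX; ++i) h0[i] = static_cast<double>(i);
  for (int i = 0; i < kX * kY; ++i) h1[i] = 0.5 * i;

  double *d0 = nullptr, *d1 = nullptr, *dout = nullptr;
  cudaMalloc(&d0, h0.size() * sizeof(double));
  cudaMalloc(&d1, h1.size() * sizeof(double));
  cudaMalloc(&dout, hout.size() * sizeof(double));
  cudaMemcpy(d0, h0.data(), h0.size() * sizeof(double), cudaMemcpyHostToDevice);
  cudaMemcpy(d1, h1.data(), h1.size() * sizeof(double), cudaMemcpyHostToDevice);

  bcastAfterAsyncCopy<<<1, dim3(kX, kY)>>>(d0, d1, dout);
  bool ok = (cudaDeviceSynchronize() == cudaSuccess);

  cudaMemcpy(hout.data(), dout, hout.size() * sizeof(double),
             cudaMemcpyDeviceToHost);
  cudaFree(d0);
  cudaFree(d1);
  cudaFree(dout);

  for (int idx = 0; idx < kX * kY; ++idx) {
    if (hout[idx] != h0[idx % kX] + h1[idx]) ok = false;
  }
  return ok;
}
```

Calling `runBcastAfterAsyncCopy()` from a `main` and running the binary under `compute-sanitizer --tool racecheck` is one way to compare the behavior with and without the `__syncthreads()` line.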
Originally found in CombinedSchedulerTest.LayerNormBackward/dtype_double_batch_216_hidden_96 using
Can be reproduced with a simple fusion:
Running this fusion on the current main branch with
NVFUSER_DUMP=cuda_to_file NVFUSER_ENABLE=kernel_debug compute-sanitizer --tool racecheck ./nvfuser_tests --gtest_filter=*FusionCpAsyncRaceBcastInlined/0
reports:
The generated kernel is
The race happens between the read and the write of T3.