[QST] Why do we only need the result of the last k-loop in cute::gemm dispatch-5?
#1629
Comments
@ccecka can you please take a look at this one? We discussed it offline, but this does seem legit.
I agree with this MR and believe it has no cost in terms of perf. Let's open it back up and approve.
I believe there is no if-condition in the final assembly once the loop is fully unrolled.
Would you mind merging PR #1618 to fix this problem? It seems I have no authority to reopen it.
The original code is as follows:
In the for-loop over dim-k (D = Ak x Bk + C), each iteration overwrites the result of the previous one, so only the result of the last calculation survives.
For example, the following code
will get
instead of 12.
This is only correct when C and D point to the same register, so that the result is accumulated properly. Is this a restriction for calling this function (cute::gemm dispatch-5)?
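To make the aliasing point concrete, here is a minimal scalar sketch in plain C++. It does not use CuTe at all: `gemm_k_naive` and `gemm_k_fixed` are hypothetical names, the inputs are made up and are not the example from this issue, and the `k == 0 ? c : d` selection is only one possible fix that may or may not match what PR #1618 does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for the k-loop described in the issue:
// every step computes d = a[k] * b[k] + c, always reading c.
// If d and c are different objects, each step overwrites d.
// If d and c alias the same object, the products accumulate.
inline void gemm_k_naive(int& d, const std::vector<int>& a,
                         const std::vector<int>& b, const int& c) {
    assert(a.size() == b.size());
    for (std::size_t k = 0; k < a.size(); ++k) {
        d = a[k] * b[k] + c;
    }
}

// One possible fix (an assumption, not necessarily the merged change):
// read c only on the first step, then keep accumulating into d.
inline void gemm_k_fixed(int& d, const std::vector<int>& a,
                         const std::vector<int>& b, const int& c) {
    assert(a.size() == b.size());
    for (std::size_t k = 0; k < a.size(); ++k) {
        d = a[k] * b[k] + (k == 0 ? c : d);
    }
}
```

With `a = {1, 2, 3}`, `b = {4, 5, 6}` and `c = 10`, the naive loop leaves `d == 28` (only `3 * 6 + 10`) when `d` and `c` are distinct, but yields 42 when the same variable is passed for both. The fixed loop yields 42 in either case. When the loop bound is a compile-time constant and the loop is fully unrolled, the `k == 0` test can be resolved at compile time, which is consistent with the comment above about no if-condition remaining in the final assembly.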