Replies: 2 comments 2 replies
-
At what granularity do you want to do this? Per single instruction? Or after the whole tile-wide GEMM, during the epilogue?
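To make the epilogue option concrete: in Strassen's algorithm a product such as M1 = (A11 + A22)(B11 + B22) is added to both C11 and C22, so the accumulator can be computed once and written out twice. Below is a minimal plain C++ reference sketch of that idea. It is not CUTLASS code; the function name `gemm_dual_accumulate` and the weights `w1` and `w2` are hypothetical, chosen only for illustration, and all matrices are assumed to be dense and row-major.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not a CUTLASS API: computes acc = A * B once and
// accumulates it into two destinations, C1 += w1 * acc and C2 += w2 * acc.
// All matrices are row-major; A is m x k, B is k x n, C1 and C2 are m x n.
void gemm_dual_accumulate(const std::vector<float>& A,
                          const std::vector<float>& B,
                          std::vector<float>& C1, float w1,
                          std::vector<float>& C2, float w2,
                          std::size_t m, std::size_t n, std::size_t k) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Main loop: the dot product is accumulated in a local variable.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            // "Epilogue": the same accumulator is added to both outputs,
            // each with its own weight (e.g. +1 or -1 for Strassen).
            C1[i * n + j] += w1 * acc;
            C2[i * n + j] += w2 * acc;
        }
    }
}
```

With `w1 = 1` and `w2 = -1`, for example, the same product is added to one block and subtracted from the other, which is the pattern Strassen's M2 term follows (added to C21, subtracted from C22). Doing this per instruction instead of in the epilogue would be a different design, which is why the granularity matters.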
-
I am trying to implement Strassen's fast matrix multiplication algorithm on top of CUTLASS, and I could not figure out how to accumulate the result of a single matrix multiplication into two destination matrices.
Could someone help me? Thank you in advance!
Two related pictures: