How does mcast_mask behave in TMA_LOAD_MULTICAST? #1321

Answered by ccecka
hyhieu asked this question in Q&A

Your concept of the parameters appears correct, but the Multicast TMAs would not be used to copy a 16x8 gmem tensor to two 8x8 smem tensors. In your example case, each copy appears to be completely independent.

Instead, the Multicast TMAs are used to copy a single 8x8 gmem tensor to two 8x8 smem tensors in a broadcast fashion, where the broadcast is performed across all CTAs whose bits are set in the mcast_mask.

This is useful in GEMMs, for example, because the A tiles can be broadcast across each "row" of CTAs and the B tiles can be broadcast across each "column" of CTAs.
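To make the row/column idea concrete, here is a rough host-side sketch of how such masks could be built for a small cluster. It is plain C++ and not CUTLASS code: `cta_rank`, `mcast_mask_a`, and `mcast_mask_b` are hypothetical helpers, and the sketch assumes a CTA at coordinate (m, n) in a cluster has rank `m + n * cluster_m` (column-major), with bit `r` of the mask selecting the CTA of rank `r`. The real rank ordering comes from the cluster layout you use, so treat the numbers as illustrative only.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for this sketch): rank of CTA (m, n)
// within the cluster, using column-major order over a cluster_m x cluster_n grid.
constexpr int cta_rank(int m, int n, int cluster_m) {
    return m + n * cluster_m;
}

// Mask for an A tile: every CTA that shares the same m coordinate
// (one "row" of the cluster) receives the same A tile.
constexpr std::uint16_t mcast_mask_a(int m, int cluster_m, int cluster_n) {
    std::uint16_t mask = 0;
    for (int n = 0; n < cluster_n; ++n) {
        mask |= static_cast<std::uint16_t>(1u << cta_rank(m, n, cluster_m));
    }
    return mask;
}

// Mask for a B tile: every CTA that shares the same n coordinate
// (one "column" of the cluster) receives the same B tile.
constexpr std::uint16_t mcast_mask_b(int n, int cluster_m, int /*cluster_n*/) {
    std::uint16_t mask = 0;
    for (int m = 0; m < cluster_m; ++m) {
        mask |= static_cast<std::uint16_t>(1u << cta_rank(m, n, cluster_m));
    }
    return mask;
}
```

Under these assumptions, in a 2x2 cluster the CTA at (0, 0) would use mask `0b0101` for A (ranks 0 and 2) and `0b0011` for B (ranks 0 and 1), so a single gmem tile lands in the smem of every CTA named in the mask.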

Answer selected by hyhieu