Bug in memory_desc_init_by_tag: Incorrect Differentiation Between Memory Tags abcd and acbd #2175
Comments
Hello @taoye9, I tried your steps to see whether the issue is reproducible on an x86_64 machine and got this result:
The weight shows acbd on x86_64, while it shows abcd on aarch64, which is incorrect. Am I understanding this correctly?
Hi @shu1chen, thanks for the reply. I'm not sure whether this benchdnn output can reveal the bug, because the function is called during the pd_t initialisation stage. Here is when the bug happens: during pd_t initialisation, the memory format is deduced from the tensor dims rather than from an explicitly passed-in variable (a bit weird). When the weight is acbd with dims [3, 1, 3, 3], the code that decides the weight memory format during pd_t init (https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/aarch64/matmul/brgemm_matmul_utils.cpp#L221) is expected to return acbd but returns abcd instead. My preliminary guess is that the logic that determines the memory tag is wrong in some cases: https://github.com/oneapi-src/oneDNN/blob/main/src/common/memory_desc.cpp#L34
I checked out commit cfe12d8 and added a printout to the equivalent x86_64 code: https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/x64/matmul/brgemm_matmul_utils.cpp#L318
It also returns abcd, the same as on aarch64, yet the test passed:
However, on the latest main branch (commit 81b366c), the test fails on x86_64 as well. Some recent commits cause very similar errors on x86_64:
I'll try to debug it.
Hi @TaoYe. This is not a bug but a feature :) When memory_desc_matches_one_of_tag is used for the particular case above [3x1x3x3], the strides would be [9, 9, 3, 1] for abcd and [9, 3, 3, 1] for acbd. However, because the b dimension is 1, the corresponding stride is ignored, so both tags do match the strides ([9, _, 3, 1]), and the first matching tag is returned. @shu1chen thanks, this is a known issue and there is an internal tracker for it. It is currently being worked on.
Summary
oneDNN deduces the memory tag from the input tensor shape using this function:
status_t status = memory_desc_init_by_tag(md_gold, md.ndims, md.dims, md.data_type, tag);
We found that when the input memory descriptor is for a 4D fp32 tensor of shape 3x1x3x3, that is, md.dims = [3, 1, 3, 3], the returned status is successful for both memory tags acbd and abcd.
This bug causes issue #2008.
Version
oneDNN v3.7.0 (commit fca8b85)
Environment
Steps to reproduce
oneDNN/build/tests/benchdnn/benchdnn --engine=cpu --matmul --wtag=acbd --dtag=abcd 1x1x2x3:3x1x3x3
Observed behavior
The command invokes
memory_desc_matches_one_of_tag(B_md, plain_tensor_layout_tag, transposed_tensor_layout_tag, acbd, adbc);
to determine the memory tag of the weight, but gets abcd instead of acbd.
Expected behavior
The returned memory tag of the weight should be acbd.