Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in memory_desc_init_by_tag: Incorrect Differentiation Between Memory Tags abcd and acbd #2175

Open
taoye9 opened this issue Oct 21, 2024 · 5 comments
Labels
platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 sighting Suspicious library behavior. Should be promoted to a bug when confirmed

Comments

@taoye9
Copy link
Contributor

taoye9 commented Oct 21, 2024

Summary

oneDNN deduces memory tag from input tensor shape using this function:

status_t status = memory_desc_init_by_tag(md_gold, md.ndims, md.dims, md.data_type, tag);

we found when the input memory description is for 4d fp32 tensor of shape 3x1x3x3, the returned status are successful for both of memory tag acbd and abcd. that is md.dims = [3, 1, 3, 3]

This bug causes issues: #2008.

Version

oneDNN v3.7.0 (commit fca8b85)

Environment

  • CPU: Neoverse-V1 C7g.4xlarge
  • OS version: Ubuntu 20.04
  • Compiler version: gcc-10
  • CMake version: 3.22.1
  • git hash: 73ae041f2b5e956b58bd58ca5d121af839af076a

Steps to reproduce

oneDNN/build/tests/benchdnn/benchdnn --engine=cpu --matmul --wtag=acbd --dtag=abcd 1x1x2x3:3x1x3x3

Observed behavior

The cmd invokes function memory_desc_matches_one_of_tag(B_md, plain_tensor_layout_tag, transposed_tensor_layout_tag, acbd, adbc); to determine memory tag of weight but get abcd instead of acbd.

Expected behavior

the returned memory tag of weight should be acbd.

@taoye9 taoye9 added the sighting Suspicious library behavior. Should be promoted to a bug when confirmed label Oct 21, 2024
@shu1chen shu1chen added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Oct 22, 2024
@shu1chen
Copy link
Contributor

shu1chen commented Oct 22, 2024

Hello @taoye9, I was trying your steps to see if it's reproducible on a x86_64 machine and got this result:

$ ONEDNN_VERBOSE=1 ./tests/benchdnn/benchdnn --engine=cpu --matmul --wtag=acbd --dtag=abcd 1x1x2x3:3x1x3x3
...
onednn_verbose,v1,primitive,exec,cpu,matmul,gemm:jit:f32,undef,src:f32:a:blocked:abcd::f0 wei:f32::blocked:acbd::f0 dst:f32::blocked:abcd::f0,,,1x1x2x3:3x1x3x3,0.791992

The weight shows acbd on x86_64, while it shows abcd on aarch64,which is incorrect. Am I understanding right?

@taoye9
Copy link
Contributor Author

taoye9 commented Oct 22, 2024

Hi, @shu1chen, thanks for reply.

Not sure whether this benchdnn output can reveal this bug as this function is called during pd_t initialisation stage. Here is when the bug happens: during initialisation of pd_t, memory format is deduced from tensor dims instead of explicit passed in variable (bit weird). we then call memory_desc_init_by_tag to deduce memory format.

here is the code which decides weight memory format during pd_t init: https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/aarch64/matmul/brgemm_matmul_utils.cpp#L221

when the weight is acbd and [3, 1, 3, 3], it's expected to return acbd but got abcd in this case.

My preliminary guess is that the logic in determine memory tag is of false for some cases. https://github.com/oneapi-src/oneDNN/blob/main/src/common/memory_desc.cpp#L34

@shu1chen
Copy link
Contributor

shu1chen commented Oct 23, 2024

I worked on git commit cfe12d8 and added output for the equivalent x86_64 code snippet https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/x64/matmul/brgemm_matmul_utils.cpp#L318

std::cout << "output:" << memory_desc_matches_one_of_tag(B_md, plain_tensor_layout_tag,
                        transposed_tensor_layout_tag, acbd, adbc) << std::endl;

It also returns abcd, same as aarch64, while the test passed:

$ ./tests/benchdnn/benchdnn --engine=cpu --matmul --wtag=acbd --dtag=abcd 1x1x2x3:3x1x3x3
wei_tag: 5
output: 5
0:PASSED __REPRO: --matmul --wtag=acbd --dtag=abcd 1x1x2x3:3x1x3x3
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.06s; fill: 0.01s (22%); compute_ref: 0.01s (15%); compare: 0.02s (31%);

While on the latest main branch (commit 81b366c), the test also failed on x86_64. Some recent commits cause very similar errors on x86_64:

$ ./tests/benchdnn/benchdnn --engine=cpu --matmul --wtag=acbd --dtag=abcd 1x1x2x3:3x1x3x3
wei_tag: 5
output:5
[   6][DST][1:0:0:0] exp_f32:        7186 exp:        7186 got:        -nan diff:     nan rdiff:     nan
[   7][DST][1:0:0:1] exp_f32:       -5487 exp:       -5487 got:        -nan diff:     nan rdiff:     nan
[   8][DST][1:0:0:2] exp_f32:       -5549 exp:       -5549 got:        -nan diff:     nan rdiff:     nan
[   9][DST][1:0:1:0] exp_f32:       -3131 exp:       -3131 got:        -nan diff:     nan rdiff:     nan
[  10][DST][1:0:1:1] exp_f32:        2511 exp:        2511 got:        -nan diff:     nan rdiff:     nan
[  11][DST][1:0:1:2] exp_f32:        -240 exp:        -240 got:        -nan diff:     nan rdiff:     nan
[  12][DST][2:0:0:0] exp_f32:       -1727 exp:       -1727 got:        -nan diff:     nan rdiff:     nan
[  13][DST][2:0:0:1] exp_f32:       -9256 exp:       -9256 got:        -nan diff:     nan rdiff:     nan
[  14][DST][2:0:0:2] exp_f32:        2826 exp:        2826 got:        -nan diff:     nan rdiff:     nan
[  15][DST][2:0:1:0] exp_f32:        1391 exp:        1391 got:        -nan diff:     nan rdiff:     nan
[COMPARE_STATS][DST]: trh=0 err_max_diff:     nan err_max_rdiff:     nan all_max_diff:     nan all_max_rdiff:     nan
0:FAILED (errors:12 total:18) __REPRO: --matmul --wtag=acbd --dtag=abcd 1x1x2x3:3x1x3x3
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.07s; fill: 0.02s (32%); compute_ref: 0.01s (13%); compare: 0.02s (27%);

I'll try to debug it.

@shu1chen
Copy link
Contributor

shu1chen commented Oct 23, 2024

The commit that caused the test failure on x86_64 is 3877097 and the previous commit 6bfb6a8 is good. Since the bug on x86_64 is unrelated to memory_desc_init_by_tag function, I'll submit a new issue.

@mgouicem
Copy link
Contributor

mgouicem commented Oct 23, 2024

Hi @TaoYe. This is not a bug but a feature :)
Internaly, memory descriptors do not have any notion of tag, just strides. To align with PyTorch semantics (and python based frameworks in general), strides for unit dimension are ignored (since they will never be applied). See here for the matching check.

Now when memory_desc_matches_one_of_tag, is used, for that particular case above [3x1x3x3], strides would be be [9,9,3,1] for abcd, and [9,3,3,1] for acbd. However, because b dimension is 1, the corresponding stride is ignored, so indeed, both tags match the strides properly ([9, _, 3, 1]), and the first matching one would be returned.

@shu1chen thanks, this is a known issue and there is an internal tracker for it. It is being worked on currently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 sighting Suspicious library behavior. Should be promoted to a bug when confirmed
Projects
None yet
Development

No branches or pull requests

3 participants