AVX512 detection failed when cpu supports AVX512 #291

Open
swhzzh opened this issue Jul 11, 2024 · 3 comments

Comments

@swhzzh

swhzzh commented Jul 11, 2024

OS: debian 9
GCC: 6.3
NASM: 2.12.01
CPU: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
ISA-L: 2.31

I have confirmed that my CPU supports AVX512 according to https://ark.intel.com/content/www/us/en/ark/products/215269/intel-xeon-silver-4314-processor-24m-cache-2-40-ghz.html. However, the AVX512 detection failed while building ISA-L. I ran the detection code myself and got the following output:
$ echo vinserti32x8 zmm0, ymm1, 1\; > tst.asm && nasm -f elf64 tst.asm && echo pass
tst.asm:1: error: invalid combination of opcode and operands.

I would like to know why the detection failed, and whether I need to change anything in the OS kernel to enable AVX512. Thanks for answering!
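
The error above is printed by nasm at assemble time, so it says nothing yet about what the kernel exposes. To check the CPU and kernel side separately, one option is to look at the feature flags Linux reports. This is a minimal sketch, assuming a Linux system where /proc/cpuinfo has a "flags" line; the helper name has_cpu_flag is hypothetical and not part of ISA-L:

```shell
# has_cpu_flag FLAG [FILE] (hypothetical helper, not part of ISA-L):
# succeeds if FLAG appears as a whole word on the first "flags" line of FILE.
# FILE defaults to /proc/cpuinfo, which is Linux-specific.
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | tr ' \t' '\n\n' | grep -qx "$1"
}
```

For example, `has_cpu_flag avx512f && echo "avx512f reported"` prints the message only when the kernel lists the avx512f flag.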

@pablodelara
Contributor

Hi @swhzzh. Your NASM version is far too old. You should install at least NASM 2.13.03.
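
The version requirement can be checked before building. The following is a rough sketch, assuming GNU `sort -V` is available and that `nasm -v` prints the version as its third word; the function names are illustrative only and not part of the ISA-L build scripts:

```shell
# Minimum NASM version quoted in this thread.
MIN_NASM="2.13.03"

# version_ge A B: succeeds when version A >= version B.
# Assumes GNU coreutils `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# nasm_version: prints the installed NASM version, fails if nasm is missing.
# Assumes `nasm -v` output of the form "NASM version X.Y.Z compiled on ...".
nasm_version() {
    command -v nasm >/dev/null 2>&1 || return 1
    nasm -v | awk '{print $3}'
}

# check_nasm: reports whether the installed NASM meets MIN_NASM.
check_nasm() {
    ver=$(nasm_version) || { echo "nasm not found"; return 1; }
    if version_ge "$ver" "$MIN_NASM"; then
        echo "NASM $ver is new enough"
    else
        echo "NASM $ver is older than $MIN_NASM"
        return 1
    fi
}
```

Running `check_nasm` with the NASM 2.12.01 from the report should take the "older than" branch, since 2.12.01 sorts before 2.13.03.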

@swhzzh
Author

swhzzh commented Jul 11, 2024

> Hi @swhzzh. Your NASM version is way too old. You should install NASM 2.13.03 at least.

I installed NASM 2.16 and the check now passes, thanks!
Also, I would like to know which xor_gen implementation is called at runtime. Is there a way to find out?

@swhzzh
Author

swhzzh commented Jul 12, 2024

@pablodelara Hi, I benchmarked xor_gen in a 10+1 configuration at different sizes. I observed the following:

When the CPU's L2 cache (1280K) can hold all data units (test length below 128K per unit), xor_gen is much faster with AVX512 enabled than without it. For larger sizes, however, xor_gen is slower with AVX512 enabled than without it.

Can you tell me why? Thanks!
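
The 128K threshold follows from the figures in the comment: 10 data units of 128K each add up to 1280K, the stated L2 size. A small sketch of that arithmetic, with hypothetical helper names, counting only the 10 data units as the comment does (the parity unit would add one more buffer):

```shell
# working_set_kib UNITS UNIT_KIB: total size in KiB of UNITS buffers of UNIT_KIB each.
working_set_kib() {
    echo $(( $1 * $2 ))
}

# fits_in_cache UNITS UNIT_KIB CACHE_KIB: succeeds if the buffers fit in the cache.
fits_in_cache() {
    [ "$(working_set_kib "$1" "$2")" -le "$3" ]
}
```

For example, `fits_in_cache 10 64 1280` succeeds, while `fits_in_cache 10 256 1280` fails because 2560K exceeds the 1280K cache.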
