The Arm Neon versions of the byte swaps (volk_*_byteswap.h) in VOLK use shifts/OR or lookup tables, somewhat like the x86 versions. However, Neon has a dedicated instruction for byte swaps, VREV, available through the intrinsics vrev16q_u8 for 8-in-16, vrev32q_u8 for 8-in-32, and vrev64q_u8 for 8-in-64. Are there performance or compatibility reasons for not using it, or was the instruction simply not known about when the code was written?
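To make the suggestion concrete, here is a rough sketch of what a VREV-based 32-bit swap might look like. It is not the existing VOLK kernel: the function name `byteswap32_sketch` and its signature are made up for illustration, and the scalar loop is only there to handle the tail and to let the snippet build on non-Neon targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not a VOLK kernel): reverse the bytes of every
 * 32-bit word in buf, in place. */
static void byteswap32_sketch(uint32_t* buf, size_t num_points)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four 32-bit words per iteration: load 16 bytes, reverse the bytes
     * inside each 32-bit lane with VREV32.8, store back. */
    for (; i + 4 <= num_points; i += 4) {
        uint8x16_t v = vld1q_u8((const uint8_t*)(buf + i));
        v = vrev32q_u8(v);
        vst1q_u8((uint8_t*)(buf + i), v);
    }
#endif

    /* Scalar path: remaining words, or everything when Neon is unavailable. */
    for (; i < num_points; i++) {
        uint32_t x = buf[i];
        buf[i] = (x >> 24) | ((x >> 8) & 0x0000FF00u) |
                 ((x << 8) & 0x00FF0000u) | (x << 24);
    }
}
```

The 16-bit and 64-bit cases would presumably follow the same pattern with vrev16q_u8 and vrev64q_u8 in place of vrev32q_u8.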
Yes, and the vtbl4/vqtbl1q implementations for 32-bit and 64-bit swaps. I can try setting up the environment on my phone and writing direct vrev versions over the weekend, and possibly run some speed comparisons as well as tests.