VREV for byte swap on Arm Neon? #479

Open
Triang3l opened this issue Jun 7, 2021 · 2 comments
Labels
Enhancement new kernel entirely or for some specific ARCH

Comments

@Triang3l

Triang3l commented Jun 7, 2021

The Arm Neon versions of the byte swaps (volk_*_byteswap.h) in VOLK use shifts/OR or lookup tables, somewhat like the x86 versions. However, Neon has a dedicated instruction for byte swaps, VREV, usable as vrev16q_u8 for 8-in-16, vrev32q_u8 for 8-in-32, and vrev64q_u8 for 8-in-64. Are there performance or compatibility reasons for not using it, or was the instruction simply not known about when the code was written?
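
To make the question concrete, a 16-bit swap built on vrev16q_u8 might look like the sketch below. The function name byteswap16_vrev_sketch is hypothetical and not part of VOLK. The scalar loop handles the leftover elements and also stands in on builds without Neon, so the snippet compiles anywhere. It has not been benchmarked against the existing kernel.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not a VOLK kernel: swaps the two bytes of every
 * 16-bit element in place. */
static inline void byteswap16_vrev_sketch(uint16_t* data, size_t num_points)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process 8 elements (16 bytes) per iteration. */
    for (; i + 8 <= num_points; i += 8) {
        uint8x16_t bytes = vreinterpretq_u8_u16(vld1q_u16(data + i));
        /* VREV16.8: reverse the bytes inside each 16-bit lane. */
        bytes = vrev16q_u8(bytes);
        vst1q_u16(data + i, vreinterpretq_u16_u8(bytes));
    }
#endif
    /* Scalar tail; this is the whole loop when Neon is unavailable. */
    for (; i < num_points; i++) {
        data[i] = (uint16_t)((data[i] >> 8) | (data[i] << 8));
    }
}
```

Whether this is actually faster than the shift-based version would need measuring on real hardware.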

@jdemel
Contributor

jdemel commented Jun 7, 2021

Could you point to the exact code that you have in mind?

Are you referring to this?

static inline void volk_16u_byteswap_neon(uint16_t* intsToSwap, unsigned int num_points)
{
    unsigned int number;
    unsigned int eighth_points = num_points / 8;
    uint16x8_t input, output;
    uint16_t* inputPtr = intsToSwap;
    for (number = 0; number < eighth_points; number++) {
        input = vld1q_u16(inputPtr);
        output = vsriq_n_u16(output, input, 8);
        output = vsliq_n_u16(output, input, 8);
        vst1q_u16(inputPtr, output);
        inputPtr += 8;
    }
    volk_16u_byteswap_generic(inputPtr, num_points - eighth_points * 8);
}

This implementation is essentially 7 years old. Are you interested in contributing an optimized version of this code?
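
For readers unfamiliar with the shift-insert pair in the quoted kernel, here is a scalar model of what one 16-bit lane goes through. The helper names are made up for illustration. It shows that the two inserts together overwrite all 16 bits of the destination, so the starting value of output does not affect the result.

```c
#include <stdint.h>

/* Scalar model of VSRI.16 #8 on one lane: keep the top 8 bits of dst and
 * insert src shifted right by 8 into the low 8 bits. */
static inline uint16_t sri8_model(uint16_t dst, uint16_t src)
{
    return (uint16_t)((dst & 0xFF00u) | (src >> 8));
}

/* Scalar model of VSLI.16 #8 on one lane: keep the low 8 bits of dst and
 * insert src shifted left by 8 into the high 8 bits. */
static inline uint16_t sli8_model(uint16_t dst, uint16_t src)
{
    return (uint16_t)((dst & 0x00FFu) | ((src << 8) & 0xFF00u));
}

/* Mirrors the two-step sequence in the kernel; initial_output plays the
 * role of the not-yet-written output register. */
static inline uint16_t swap16_shift_insert_model(uint16_t src, uint16_t initial_output)
{
    uint16_t out = initial_output;
    out = sri8_model(out, src); /* low byte  <- old high byte */
    out = sli8_model(out, src); /* high byte <- old low byte  */
    return out;
}
```

A single VREV16 would produce the same result in one instruction instead of two.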

@Triang3l
Author

Triang3l commented Jun 7, 2021

Yes, and also the vtbl4/vqtbl1q implementations for the 32-bit and 64-bit swaps. I can try setting up the environment on my phone and writing direct vrev versions, and possibly run some speed comparisons as well as tests, over the weekend.
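
As a rough idea of what the direct versions for the wider types could look like, here is a sketch using vrev32q_u8 and vrev64q_u8 in place of table lookups. The function names are hypothetical and this is not the code that was eventually written or tested. The scalar loops cover the tail and act as a fallback on non-Neon builds.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: reverse the 4 bytes of every 32-bit element in place. */
static inline void byteswap32_vrev_sketch(uint32_t* data, size_t num_points)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= num_points; i += 4) {
        uint8x16_t bytes = vreinterpretq_u8_u32(vld1q_u32(data + i));
        /* VREV32.8: reverse the bytes inside each 32-bit lane. */
        vst1q_u32(data + i, vreinterpretq_u32_u8(vrev32q_u8(bytes)));
    }
#endif
    for (; i < num_points; i++) {
        uint32_t x = data[i];
        data[i] = (x >> 24) | ((x >> 8) & 0x0000FF00u) |
                  ((x << 8) & 0x00FF0000u) | (x << 24);
    }
}

/* Hypothetical helper: reverse the 8 bytes of every 64-bit element in place. */
static inline void byteswap64_vrev_sketch(uint64_t* data, size_t num_points)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 2 <= num_points; i += 2) {
        uint8x16_t bytes = vreinterpretq_u8_u64(vld1q_u64(data + i));
        /* VREV64.8: reverse the bytes inside each 64-bit lane. */
        vst1q_u64(data + i, vreinterpretq_u64_u8(vrev64q_u8(bytes)));
    }
#endif
    for (; i < num_points; i++) {
        uint64_t x = data[i];
        uint64_t r = 0;
        /* Move bytes from the low end of x to the high end of r. */
        for (int b = 0; b < 8; b++) {
            r = (r << 8) | (x & 0xFFu);
            x >>= 8;
        }
        data[i] = r;
    }
}
```

Speed relative to the vtbl4/vqtbl1q versions is an open question until the comparisons are run.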

@jdemel jdemel added the Enhancement new kernel entirely or for some specific ARCH label Jun 14, 2021