VREV for byte swap on Arm Neon? #479

Triang3l · 2021-06-07T09:50:30Z

The Arm Neon versions of the byte swaps (volk_*_byteswap.h) in VOLK use shifts/OR or lookup tables, somewhat like the x86 versions. However, Neon has a dedicated instruction for byte swaps, VREV, usable as vrev16q_u8 for 8-in-16, vrev32q_u8 for 8-in-32, and vrev64q_u8 for 8-in-64. Are there performance or compatibility reasons for not using it, or was the instruction simply not known when the code was written?

jdemel · 2021-06-07T10:17:18Z

Could you point to the exact code that you have in mind?

Are you referring to this?

volk/kernels/volk/volk_16u_byteswap.h

Lines 223 to 239 in 237a6fc

    
    static inline void volk_16u_byteswap_neon(uint16_t* intsToSwap, unsigned int num_points)
    {
        unsigned int number;
        unsigned int eighth_points = num_points / 8;
        uint16x8_t input, output;
        uint16_t* inputPtr = intsToSwap;
        for (number = 0; number < eighth_points; number++) {
            input = vld1q_u16(inputPtr);
            output = vsriq_n_u16(output, input, 8);
            output = vsliq_n_u16(output, input, 8);
            vst1q_u16(inputPtr, output);
            inputPtr += 8;
        }
        volk_16u_byteswap_generic(inputPtr, num_points - eighth_points * 8);
    }

This implementation is essentially 7 years old. Are you interested in contributing an optimized version of this code?

Triang3l · 2021-06-07T18:17:25Z

Yes, and also the vtbl4/vqtbl1q implementations for the 32-bit and 64-bit swaps. I can try setting up the environment on my phone and writing direct vrev versions, and possibly run some speed comparisons, as well as tests, over the weekend.
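For illustration, a direct vrev version of the 16-bit and 32-bit swaps might look roughly like the sketch below. This is not code from VOLK and has not been benchmarked: the function names (`byteswap16_vrev_sketch`, `byteswap32_vrev_sketch`) and the `SKETCH_HAVE_NEON` macro are hypothetical, and a scalar loop stands in for `volk_16u_byteswap_generic` to handle the tail. The Neon path is guarded so the snippet also compiles on targets without Neon, where only the scalar loop runs.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1 /* hypothetical macro, used only in this sketch */
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical helper (not a VOLK kernel): swaps the two bytes of every
 * 16-bit element in place. */
static inline void byteswap16_vrev_sketch(uint16_t* data, size_t num_points)
{
    size_t i = 0;
#if SKETCH_HAVE_NEON
    /* 8 elements per iteration: reverse the bytes within each 16-bit lane. */
    for (; i + 8 <= num_points; i += 8) {
        uint8x16_t bytes = vreinterpretq_u8_u16(vld1q_u16(data + i));
        bytes = vrev16q_u8(bytes);
        vst1q_u16(data + i, vreinterpretq_u16_u8(bytes));
    }
#endif
    /* Scalar tail (and full fallback when Neon is unavailable). */
    for (; i < num_points; i++) {
        data[i] = (uint16_t)((data[i] >> 8) | (data[i] << 8));
    }
}

/* Hypothetical helper: reverses the four bytes of every 32-bit element
 * in place. */
static inline void byteswap32_vrev_sketch(uint32_t* data, size_t num_points)
{
    size_t i = 0;
#if SKETCH_HAVE_NEON
    /* 4 elements per iteration: reverse the bytes within each 32-bit lane. */
    for (; i + 4 <= num_points; i += 4) {
        uint8x16_t bytes = vreinterpretq_u8_u32(vld1q_u32(data + i));
        bytes = vrev32q_u8(bytes);
        vst1q_u32(data + i, vreinterpretq_u32_u8(bytes));
    }
#endif
    /* Scalar tail (and full fallback when Neon is unavailable). */
    for (; i < num_points; i++) {
        uint32_t v = data[i];
        data[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
                  ((v << 8) & 0x00FF0000u) | (v << 24);
    }
}
```

A 64-bit variant would presumably follow the same pattern with vrev64q_u8 and two elements per iteration. Whether any of this is actually faster than the existing shift-insert or table-lookup kernels would need to be measured on real hardware.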

jdemel added the Enhancement new kernel entirely or for some specific ARCH label Jun 14, 2021
