Windows: ARM64/NEON Support #769

JVital2013 · 2024-09-04T18:01:13Z

Fixes NEON support on Windows ARM64. Most of the changes deal with the somewhat odd way in which Microsoft deals with the uint8x16_t, etc types.

Additionally, MSVC on ARM/ARM64 does not support any inline assembly. Therefore __VOLK_ASM does not work, so a few workarounds have been implemented. One of these workarounds is to assume NEON is present if you're using MSVC, and building for ARM64 instead of running a check. NEON is a requirement for Windows on ARM as shown here: https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170.

Note that NEONv7 on Windows is not supported in this PR, as it would require re-wiring all the kernels into MASM (or some other format) to build on MSVC. Windows on 32-bit ARM is being deprecated anyway, so I'm thinking the neonv7 kernels can be ignored on Windows.

Signed-off-by: Jamie Vital <[email protected]>

marcusmueller · 2024-09-05T11:16:46Z

kernels/volk/volk_32f_index_max_32u.h

@@ -299,7 +299,11 @@ volk_32f_index_max_32u_neon(uint32_t* target, const float* src0, uint32_t num_po
            if (maxValuesBuffer[number] > max) {
                index = maxIndexesBuffer[number];
                max = maxValuesBuffer[number];
+#ifdef _MSC_VER
+            } else if (maxValues.n128_f32[number] == max) {


I must admit I don't quite understand that; what type is float32x4_t on MSVC/aarch64/neon?

marcusmueller · 2024-09-05T11:17:54Z

lib/CMakeLists.txt

+    if(MSVC)
+        if(CMAKE_SYSTEM_PROCESSOR STREQUAL "ARM")
+            overrule_arch(neonv8 "Compiler doesn't support neonv8")
+        endif()


jdemel

Thanks for the PR!

My comments are rather critical because the changes are specific to MSVC, which is notoriously non-standard compliant. This might affect the result for other compilers.

I'd really like to merge this PR but make sure, that we don't see a performance regression.

The whole assume NEON is present etc. reasoning is good. Maybe, you could add a comment next to the checks to make sure this knowledge doesn't get lost over time.

jdemel · 2024-09-13T02:39:42Z

lib/CMakeLists.txt

        have_neonv7_result)
-    check_c_source_compiles(
-        "#include <volk/volk_common.h>\n int main(){__VOLK_ASM(\"sub v1.4s,v1.4s,v1.4s\");}"
-        have_neonv8_result)
-
    if(NOT have_neonv7_result)
        overrule_arch(neonv7 "Compiler doesn't support neonv7")
    endif()
-
-    if(NOT have_neonv8_result)
-        overrule_arch(neonv8 "Compiler doesn't support neonv8")
-    endif()
 else(neon_compile_result)


Why would you want to remove the non-MSVC checks here? Looking at this section makes me think, it might be a good time to unify this section. Just a thought.

jdemel · 2024-09-13T02:42:25Z

kernels/volk/volk_32u_reverse_32u.h

-#if defined(__aarch64__)
+#ifdef _MSC_VER
+#define DO_RBIT                                                                 \
+    *out_ptr = _byteswap_ulong(*in_ptr);                                        \
+    *out_ptr = ((*out_ptr & 0x55555555) << 1) | ((*out_ptr & 0xAAAAAAAA) >> 1); \
+    *out_ptr = ((*out_ptr & 0x33333333) << 2) | ((*out_ptr & 0xCCCCCCCC) >> 2); \
+    *out_ptr = ((*out_ptr & 0x0F0F0F0F) << 4) | ((*out_ptr & 0xF0F0F0F0) >> 4); \
+    in_ptr++;                                                                   \
+    out_ptr++;
+#elif defined(__aarch64__)


This section indicates that we already had special treatment for some conditions. Now, it looks like we exchange them for another set of conditions. This change would warrant a comment on the specifics of different platforms. This will help anyone looking at this in the future.

jdemel · 2024-09-13T02:46:51Z

kernels/volk/volk_32u_reverse_32u.h

-    const uint8x16_t idx = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
+    uint8x16_t idx;
+    const uint8_t idx_data[] = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
+    idx = vld1q_u8(idx_data);


why? The data type implies 16 values of the same type. Does MSVC fail to perform the correct aggregate initialization? This indirection may negatively impact optimizations.
I'd like to see this unified in one line. Would that be possible?
Would it be possible to have a godbolt/compiler explorer comparison of this for GCC/Clang vs. MSVC?

jdemel · 2024-09-13T02:47:47Z

kernels/volk/volk_32fc_convert_16ic.h

-    int32x4_t toint_a = { 0, 0, 0, 0 }, toint_b = { 0, 0, 0, 0 };
+    int32x4_t toint_a = { 0 }, toint_b = { 0 };


While you're at it, could you change this to give every variable their own statement?

jdemel · 2024-09-13T02:48:46Z

kernels/volk/volk_32fc_accumulator_s32fc.h

    float32x4_t in_vec;
-    float32x4_t out_vec0 = { 0.f, 0.f, 0.f, 0.f };
-    float32x4_t out_vec1 = { 0.f, 0.f, 0.f, 0.f };
-    float32x4_t out_vec2 = { 0.f, 0.f, 0.f, 0.f };
-    float32x4_t out_vec3 = { 0.f, 0.f, 0.f, 0.f };
+    float32x4_t out_vec0 = { 0.f };
+    float32x4_t out_vec1 = { 0.f };
+    float32x4_t out_vec2 = { 0.f };
+    float32x4_t out_vec3 = { 0.f };


I fear this affects initialization. Are we sure all values are initialized correctly for all compilers?

jdemel · 2024-09-13T02:49:46Z

kernels/volk/volk_32f_index_min_32u.h

+#ifdef _MSC_VER
+        } else if (minValues.n128_f32[number] == min) {
+#else
        } else if (minValues[number] == min) {
+#endif


This seems like smth very special. Could you add a comment with some context? Maybe a link that explains the necessary change?

Windows: ARM64/NEON Support

51c2518

Signed-off-by: Jamie Vital <[email protected]>

marcusmueller reviewed Sep 5, 2024

View reviewed changes

jdemel requested changes Sep 13, 2024

View reviewed changes

JVital2013 mentioned this pull request Sep 29, 2024

Add Windows for ARM64 builds and implement RAW_IO in libs AlexandreRouma/SDRPlusPlus#1483

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Windows: ARM64/NEON Support #769

Windows: ARM64/NEON Support #769

JVital2013 commented Sep 4, 2024

marcusmueller Sep 5, 2024

marcusmueller Sep 5, 2024

jdemel left a comment •

edited

Loading

jdemel Sep 13, 2024

jdemel Sep 13, 2024

jdemel Sep 13, 2024

jdemel Sep 13, 2024

jdemel Sep 13, 2024

jdemel Sep 13, 2024

		int32x4_t toint_a = { 0, 0, 0, 0 }, toint_b = { 0, 0, 0, 0 };
		int32x4_t toint_a = { 0 }, toint_b = { 0 };

Windows: ARM64/NEON Support #769

Are you sure you want to change the base?

Windows: ARM64/NEON Support #769

Conversation

JVital2013 commented Sep 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdemel left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdemel left a comment •

edited

Loading