This code demonstrates single-node, multi-GPU cuFFT R2C:C2R functionality on a 3D dataset. The single-GPU version is used as a reference. The maximum FFT size is limited by the memory of a single GPU.
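To get a feel for the single-GPU memory limit, the following minimal sketch estimates the size of the data buffer for a 3D R2C transform. It assumes single precision and the in-place layout, where an `nx x ny x nz` real array is stored in a buffer of `nx x ny x (nz/2 + 1)` complex elements. The function names `r2c_complex_elements` and `r2c_inplace_bytes` are hypothetical helpers, not part of cuFFT or of this sample, and the estimate ignores the work areas that cuFFT allocates in addition.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper: number of complex elements produced by a 3D R2C
// transform of an nx x ny x nz real array. Only the non-redundant half of
// the last dimension is stored, so it shrinks to nz/2 + 1.
std::size_t r2c_complex_elements(std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * (nz / 2 + 1);
}

// Hypothetical helper: bytes needed for the in-place, single-precision data
// buffer (assumption: no cuFFT work area is included in this figure).
std::size_t r2c_inplace_bytes(std::size_t nx, std::size_t ny, std::size_t nz) {
    return r2c_complex_elements(nx, ny, nz) * sizeof(std::complex<float>);
}
```

For example, `r2c_inplace_bytes(64, 64, 64)` returns 1081344, roughly 1 MiB for the data alone. Comparing this figure with the free memory of one GPU gives a rough upper bound on the FFT size that the single-GPU reference run can handle.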
Note: since cuFFT 10.4.0, cufftSetStream can be used to associate a stream with a multi-GPU plan. cufftXtExecDescriptor synchronizes efficiently to the stream before and after execution. Please refer to https://docs.nvidia.com/cuda/cufft/index.html#function-cufftsetstream for more information.
By default, cuFFT executes multi-GPU plans in a synchronous manner.
The cuFFT library doesn't guarantee that single-GPU and multi-GPU cuFFT plans will perform mathematical operations in the same order, so small numerical differences between their results are possible.
All GPUs supported by the CUDA Toolkit (https://developer.nvidia.com/cuda-gpus)
Linux
Windows
x86_64
ppc64le
arm64-sbsa
- cufftXtSetGPUs API
- cufftMakePlan3d API
- cufftXtExecDescriptor API
- cufftXtMalloc API
- cufftXtMemcpy API
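The APIs listed above are typically combined in the order shown in the sketch below. This is a simplified illustration, not the sample's actual source: `round_trip_multi_gpu` and `CUFFT_CHECK` are hypothetical names, the host buffer is assumed to use the in-place padded layout with `nx * ny * (nz/2 + 1)` complex elements, and the normalization is done on the host for brevity. Building it requires the CUDA Toolkit headers and linking against cuFFT, and running it requires at least two GPUs.

```cpp
#include <complex>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

#include <cufftXt.h>

// Hypothetical helper macro: abort on any cuFFT error.
#define CUFFT_CHECK(call)                                                  \
    do {                                                                   \
        cufftResult status_ = (call);                                      \
        if (status_ != CUFFT_SUCCESS) {                                    \
            std::fprintf(stderr, "cuFFT error %d at %s:%d\n",              \
                         static_cast<int>(status_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Hypothetical function: R2C followed by C2R on several GPUs, in place.
// `data` holds nx * ny * (nz/2 + 1) complex elements (padded real input).
void round_trip_multi_gpu(std::vector<std::complex<float>> &data,
                          int nx, int ny, int nz, std::vector<int> gpus) {
    const int n_gpus = static_cast<int>(gpus.size());

    cufftHandle plan_r2c{};
    cufftHandle plan_c2r{};
    CUFFT_CHECK(cufftCreate(&plan_r2c));
    CUFFT_CHECK(cufftCreate(&plan_c2r));

    // Select the GPUs before the plans are made.
    CUFFT_CHECK(cufftXtSetGPUs(plan_r2c, n_gpus, gpus.data()));
    CUFFT_CHECK(cufftXtSetGPUs(plan_c2r, n_gpus, gpus.data()));

    // With multiple GPUs, one work-area size is returned per GPU.
    std::vector<std::size_t> work_sizes(gpus.size());
    CUFFT_CHECK(cufftMakePlan3d(plan_r2c, nx, ny, nz, CUFFT_R2C, work_sizes.data()));
    CUFFT_CHECK(cufftMakePlan3d(plan_c2r, nx, ny, nz, CUFFT_C2R, work_sizes.data()));

    // Allocate the distributed buffer and copy the host data to the GPUs.
    cudaLibXtDesc *desc = nullptr;
    CUFFT_CHECK(cufftXtMalloc(plan_r2c, &desc, CUFFT_XT_FORMAT_INPLACE));
    CUFFT_CHECK(cufftXtMemcpy(plan_r2c, desc, data.data(), CUFFT_COPY_HOST_TO_DEVICE));

    // Forward (R2C) and inverse (C2R) transforms on the same descriptor.
    CUFFT_CHECK(cufftXtExecDescriptor(plan_r2c, desc, desc, CUFFT_FORWARD));
    CUFFT_CHECK(cufftXtExecDescriptor(plan_c2r, desc, desc, CUFFT_INVERSE));

    // Copy the result back to the host.
    CUFFT_CHECK(cufftXtMemcpy(plan_c2r, data.data(), desc, CUFFT_COPY_DEVICE_TO_HOST));

    // cuFFT transforms are unnormalized, so the round trip scales the data
    // by nx * ny * nz. Assumption: scaling on the host keeps the sketch short.
    const float scale = 1.0f / (static_cast<float>(nx) * ny * nz);
    for (auto &value : data) {
        value *= scale;
    }

    CUFFT_CHECK(cufftXtFree(desc));
    CUFFT_CHECK(cufftDestroy(plan_r2c));
    CUFFT_CHECK(cufftDestroy(plan_c2r));
}
```

A call such as `round_trip_multi_gpu(data, nx, ny, nz, {0, 1})` would run the round trip on devices 0 and 1, after which the real values in `data` should match the original input up to rounding.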
- A Linux/Windows system with recent NVIDIA drivers.
- CMake version 3.18 or later
$ mkdir build
$ cd build
$ cmake ..
$ make
Make sure that CMake finds the expected CUDA Toolkit. If it does not, add the argument -DCMAKE_CUDA_COMPILER=/path/to/cuda/bin/nvcc to the cmake command.
$ ./bin/3d_mgpu_r2c_c2r_example
Sample output:
PASSED with L2 error = 0
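
The L2 error in the output compares the multi-GPU result against the single-GPU reference. One common way to define such a metric is the relative L2 error, sketched below. The function name `relative_l2_error` is hypothetical, and the exact formula and pass threshold used by the sample may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: relative L2 error ||test - ref||_2 / ||ref||_2.
// Sums are accumulated in double precision to limit rounding in the metric.
// If the reference is all zeros, the absolute L2 norm of the difference is
// returned instead to avoid dividing by zero.
double relative_l2_error(const std::vector<float> &ref,
                         const std::vector<float> &test) {
    assert(ref.size() == test.size());
    double diff_sq = 0.0;
    double ref_sq = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = static_cast<double>(ref[i]);
        const double d = static_cast<double>(test[i]) - r;
        diff_sq += d * d;
        ref_sq += r * r;
    }
    return ref_sq > 0.0 ? std::sqrt(diff_sq / ref_sq) : std::sqrt(diff_sq);
}
```

Identical inputs give an error of exactly 0. Because single-GPU and multi-GPU plans may order their operations differently, a small nonzero value is also plausible, so a comparison against a tolerance is usually more robust than a check for exact equality.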