C++ API to CUDA C Runtime
The idea behind this project is to bring the safety mechanisms and patterns inherent to C++ to CUDA, and to provide a more natural interface for those who are more comfortable with C++. If you have a C++ project and want to use CUDA, you'll probably end up implementing a lot of this boilerplate code yourself anyway.
RAII is the first C++ pattern that jumps out with a clear use case in CUDA.
For the unfamiliar, RAII stands for Resource Acquisition Is Initialization. I prefer Constructor Allocates, Destructor Releases. In CUDA memory management, as in most C libraries, for every cudaMalloc you have to call cudaFree once and only once. The pattern is simple: call cudaMalloc in an object's constructor and cudaFree in the object's destructor. This way you're guaranteed to pair frees with mallocs and never have to worry about accessing memory that's already been freed. If the object is in scope, the resource is valid.
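As a rough illustration of the pattern, here is a minimal sketch of an owning buffer. It is not this project's actual class. The names `Buffer`, `HostStandIn` and `demo_scope` are hypothetical, and the allocator uses `std::malloc`/`std::free` as stand-ins so the sketch runs without a GPU. A CUDA policy would call cudaMalloc in `allocate` and cudaFree in `release` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical stand-in allocator so the sketch runs without a GPU.
// A CUDA version would call cudaMalloc in allocate() and cudaFree in release().
struct HostStandIn {
    static inline int live = 0;  // allocations not yet released (C++17 inline variable)

    static void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p == nullptr) throw std::bad_alloc{};
        ++live;
        return p;
    }
    static void release(void* p) noexcept {
        std::free(p);
        --live;
    }
};

// Constructor allocates, destructor releases.
template <typename Allocator>
class Buffer {
public:
    explicit Buffer(std::size_t bytes)
        : ptr_(Allocator::allocate(bytes)), bytes_(bytes) {}

    ~Buffer() {
        if (ptr_ != nullptr) Allocator::release(ptr_);
    }

    // Copying would release the same pointer twice, so it is disabled.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Moving transfers ownership and leaves the source empty.
    Buffer(Buffer&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)),
          bytes_(std::exchange(other.bytes_, 0)) {}

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            if (ptr_ != nullptr) Allocator::release(ptr_);
            ptr_ = std::exchange(other.ptr_, nullptr);
            bytes_ = std::exchange(other.bytes_, 0);
        }
        return *this;
    }

    void* get() const noexcept { return ptr_; }
    std::size_t size() const noexcept { return bytes_; }

private:
    void* ptr_;
    std::size_t bytes_;
};

// Usage: the allocation lives exactly as long as the object is in scope.
inline void demo_scope() {
    {
        Buffer<HostStandIn> buf(1024);
        assert(buf.get() != nullptr);
        assert(HostStandIn::live == 1);
    }  // destructor runs here and releases the memory
    assert(HostStandIn::live == 0);
}
```

Deleting the copy operations is a deliberate choice in this sketch: a copied handle would free the same pointer twice, which is exactly the bug the pattern is meant to rule out.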
Using int for everything may be a common C pattern, but I see gratuitous use of 'int' where other types would be more appropriate, e.g. bool and unsigned. One potential reason to use int is to reserve a value, such as -1, to indicate failure or "not set." I prefer std::optional<>.
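To make the contrast concrete, here is a small sketch comparing an int sentinel with std::optional (C++17). The functions `find_device_c_style` and `find_device` and the list of device names are hypothetical, made up for this illustration, and are not part of CUDA or of this project.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// C style: int serves as both an index and a failure flag (-1 means "not found").
inline int find_device_c_style(const std::vector<std::string>& names,
                               const std::string& wanted) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == wanted) return static_cast<int>(i);
    }
    return -1;
}

// C++17 style: the index is unsigned and absence is an empty optional.
inline std::optional<unsigned> find_device(const std::vector<std::string>& names,
                                           const std::string& wanted) {
    for (unsigned i = 0; i < names.size(); ++i) {
        if (names[i] == wanted) return i;
    }
    return std::nullopt;
}

// Usage: the caller has to unwrap the optional before using the index.
inline void demo_optional() {
    const std::vector<std::string> names{"NVIDIA GeForce RTX 3060 Ti"};

    const auto hit = find_device(names, "NVIDIA GeForce RTX 3060 Ti");
    assert(hit.has_value());
    assert(*hit == 0u);

    const auto miss = find_device(names, "no such card");
    assert(!miss.has_value());
}
```

With the sentinel version nothing stops a caller from indexing with -1. With std::optional, the "not set" case is part of the return type and the index itself can be unsigned.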
The first step is to install CUDA and then CMake.
Those are the only dependencies beyond the standard C++ toolchain.
The checkout and build, including running the tests, are pretty standard:
git clone [email protected]:olivas/cudapp.git
cd cudapp
mkdir build
cd build
cmake ..
make
make test
You should see something similar to the following output:
Running tests...
Test project /home/olivas/cudapp/build
Start 1: test_device
1/3 Test #1: test_device ...................... Passed 0.35 sec
Start 2: test_device_manager
2/3 Test #2: test_device_manager .............. Passed 0.09 sec
Start 3: test_device_properties
3/3 Test #3: test_device_properties ........... Passed 0.08 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 0.53 sec
The first thing you might want to do is see how many NVIDIA cards you have installed and check their properties.
Notes: The cuQuantum installation is a little janky. You have to install libcutensor1 first; otherwise the cuQuantum installer will complain that you don't have an installable version. This gets you unstuck, but to actually run you then need to install libcutensor2; otherwise ld will complain that it can't find libcutensor.so.2.
Running hello_quda:
cuTensorNet version: 20400
===== device info ======
GPU-name:NVIDIA GeForce RTX 3060 Ti
GPU-clock:1695000
GPU-memoryClock:7001000
GPU-nSM:38
GPU-major:8
GPU-minor:6
========================
Included headers and defined data types
Bob's your uncle.