You may want to accelerate TIGRE more, and are wondering how to do so. This File gives advice on what to change in the source files of TIGRE in order to tune it to your specific computer.
Instructions are given for MATLAB, but are easily adaptable to Python.
We highly suggest you start with the first suggestion. It means 0.3s per operation, considering that a backprojection of 512^3 with projections of size 1000^2 x 500 take approximately 1s, 0.3s from that is a significant acceleration, it can be over 50% in some cases.
The other suggestions may not improve your performance, or improve it very slightly.
Current values in the source code are for GTX 1080 Ti.
All .cu files have a function called checkFreeMemory
defined in the bottom of the file.
This function checks each available GPU and queries how much free memory there is available.
The function is fundamental for the correct splitting without errors of the problem.
However, if you know how much memory (in bytes) your GPUs have, you can replace this function by just a constant number, in bytes, of the amount of free memory your GPU has. Consider giving it only 95% of the actual free memory, and consider that other processes (such as your desktop, internet browser) may be using the GPU too. Adding a number greater than the available memory will cause the code to error.
Just comment all the lines inside the function and add
*mem_GPU_global= your number in bytes;
e.g. for a GTX 1080 Ti, with 11 GB of DRAM, this value is around 11721500000. We recommend using less than 95% of this value, something like 11000000000. Lower if other processes are also using your GPUs.
Compile again, and you save up to 0.3s per GPU, per CUDA call. This can be significant (512^3 image, 512^2 projections 512 angles take around this time to backproject).
Expected acceleration is not more than 5%, likely 0%. So change it if time is really critical to you.
-
Define your expected geometry.
-
Create a sample image using
head=headPhantom(geo.nVoxel)+rand(geo.nVoxel.')*0.2
-
Time it using
tic; tv=im3DDenoise(head,'TV',150,5); toc
-
Open
Source/tvDenoise.cu
, lines 54,55 should defineMAX_BUFFER
andBLOCK_SIZE
. -
Modify
MAX_BUFFER
by steps of size 10, modifyBLOCK_SIZE
by steps of size 2.MAX_BUFFER
is expected to have a bigger influence. -
Recompile and time. Find the appropriate
MAX_BUFFER
andBLOCK_SIZE
for your GPU, System and image size
Expected acceleration is not more than 10%, likely 0%. So change it if time is really critical to you. The steps show how to improve FDK backprojection, similar steps are needed for matched backprojector.
-
Define your expected geometry.
-
Create a sample projection set, geometry and angles.`
-
Time it using
tic; bp=Atb(projection,geo,angles); toc
-
Open
Source/voxel_backprojection.cu
, lines 107,108 should definePROJ_PER_KERNEL
andVOXEL_PER_THREAD
. -
Modify
PROJ_PER_KERNEL
andVOXEL_PER_THREAD
by steps of size 1.PROJ_PER_KERNEL*PROJ_PER_KERNEL*VOXEL_PER_THREAD
must not exceed your maximum number of threads, likely 1024 -
Recompile and time again. Find the appropriate
PROJ_PER_KERNEL
andVOXEL_PER_THREAD
for your GPU, System and image size
Expected acceleration is not more than 10%, likely 0%. So change it if time is really critical to you. The steps show how to improve FDK backprojection, similar steps are needed for matched backprojector.
-
Define your expected geometry.
-
Create a sample image, geometry and angles.`
-
Time it using
tic; fp=Ax(image,geo,angles); toc
-
Open
Source/siddon_projection.cu
, lines 107,108 should definePROJ_PER_KERNEL
andVOXEL_PER_THREAD
. -
Modify
PROJ_PER_KERNEL
andVOXEL_PER_THREAD
by steps of size 1.PROJ_PER_KERNEL*PROJ_PER_KERNEL*VOXEL_PER_THREAD
must not exceed your maximum number of threads, likely 1024 -
Recompile and time again. Find the appropriate
PROJ_PER_KERNEL
andVOXEL_PER_THREAD
for your GPU, System and image size