Including <cuda/ptx> adds ~800ms of compile time on my workstation.
How to Reproduce
Compare the time to compile an empty file, a file including cuda/ptx, and a file including cuda/std/__type_traits/integral_constant.h (which is itself included from cuda/ptx).
$ echo "" > empty.cu
$ echo "#include <cuda/ptx>" > cuda_ptx.cu
$ echo "#include <cuda/std/__type_traits/integral_constant.h>" > cuda_std_integral_constant.cu
$ hyperfine --warmup 1 'nvcc -arch sm_90a -x cu -c empty.cu -o test.o' 'nvcc -arch sm_90a -x cu -c cuda_ptx.cu -o test.o' 'nvcc -arch sm_90a -x cu -c cuda_std_integral_constant.cu -o test.o'
Benchmark 1: nvcc -arch sm_90a -x cu -c empty.cu -o test.o
Time (mean ± σ): 1.434 s ± 0.012 s [User: 1.070 s, System: 0.368 s]
Range (min … max): 1.414 s … 1.455 s 10 runs
Benchmark 2: nvcc -arch sm_90a -x cu -c cuda_ptx.cu -o test.o
Time (mean ± σ): 2.299 s ± 0.022 s [User: 1.861 s, System: 0.442 s]
Range (min … max): 2.275 s … 2.339 s 10 runs
Benchmark 3: nvcc -arch sm_90a -x cu -c cuda_std_integral_constant.cu -o test.o
Time (mean ± σ): 2.131 s ± 0.023 s [User: 1.709 s, System: 0.426 s]
Range (min … max): 2.098 s … 2.167 s 10 runs
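As a rough reading of the means above (ignoring run-to-run noise of about ±0.02 s), the overhead of each include relative to the empty file works out as follows:

```math
\begin{aligned}
\Delta_{\text{cuda/ptx}} &= 2.299\,\mathrm{s} - 1.434\,\mathrm{s} = 0.865\,\mathrm{s} \\
\Delta_{\text{integral\_constant.h}} &= 2.131\,\mathrm{s} - 1.434\,\mathrm{s} = 0.697\,\mathrm{s} \\
\frac{\Delta_{\text{integral\_constant.h}}}{\Delta_{\text{cuda/ptx}}} &= \frac{0.697}{0.865} \approx 0.81
\end{aligned}
```

So roughly 80% of the cost appears to be paid already by the small type-traits header, which suggests that most of the overhead comes from what every libcu++ header pulls in transitively rather than from the contents of cuda/ptx itself.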
Expected behavior
This should not be a heavy header.
Reproduction link
No response
Operating System
Ubuntu Linux 22.04
nvidia-smi output
NA
NVCC version
The benchmark was performed using a prerelease version of nvcc, but it should be reproducible with any recent version.
Yep, it looks like the extended FP type headers are quite expensive, and since they are included as part of the CCCL config, they affect every translation unit. @miscco could we consider only defining _CCCL_HAS_NVFP16 and _CCCL_HAS_NVBF16 in the CCCL config headers and leaving it up to downstream libraries and users to include the corresponding headers themselves?
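To make the proposal concrete, here is a minimal sketch of what a downstream translation unit might look like if the config headers only defined the feature macros. It assumes that _CCCL_HAS_NVFP16 and _CCCL_HAS_NVBF16 can be checked with `defined(...)` and that including <cuda/std/version> is enough to pull in the CCCL config; the helper functions `tu_uses_fp16` and `tu_uses_bf16` are hypothetical and exist only for illustration.

```cpp
// Sketch only: assumes the CCCL config defines _CCCL_HAS_NVFP16 and
// _CCCL_HAS_NVBF16 as feature macros WITHOUT including the FP headers.
// That is the proposal in this comment, not necessarily current behavior.

// Pull in the CCCL config if libcu++ is available (assumption: this header
// is sufficient to get the feature macros defined).
#if defined(__has_include)
#  if __has_include(<cuda/std/version>)
#    include <cuda/std/version>
#  endif
#endif

// Only translation units that actually need the extended FP types pay for
// the expensive headers, by opting in explicitly.
#if defined(_CCCL_HAS_NVFP16)
#  include <cuda_fp16.h> // provides __half
#endif
#if defined(_CCCL_HAS_NVBF16)
#  include <cuda_bf16.h> // provides __nv_bfloat16
#endif

// Hypothetical helpers (not part of CCCL) that report what this TU opted into.
constexpr bool tu_uses_fp16()
{
#if defined(_CCCL_HAS_NVFP16)
  return true;
#else
  return false;
#endif
}

constexpr bool tu_uses_bf16()
{
#if defined(_CCCL_HAS_NVBF16)
  return true;
#else
  return false;
#endif
}
```

Translation units that never touch `__half` or `__nv_bfloat16` would simply omit the two guarded includes and, under this assumption, skip the cost of parsing those headers.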
Type of Bug
Performance
Component
libcu++