CCCL 2.2.0
(Note that these release notes are not yet finalized. They do not reflect any PRs that were merged to Thrust/CUB/libcudacxx before migrating to the nvidia/cccl repo.)
What's Changed
- Add axis for docker builds by @raydouglass in #1
- Docker: Add support for ICPC and NVC++, install newer CMake, and add curl by @brycelelbach in #4
- Update excludes by @raydouglass in #5
- Docker: OS and CUDA upgrades, support for additional configurations by @brycelelbach in #9
- Docker: Add Thrust/CUB documentation toolchain to Ubuntu docker images by @brycelelbach in #15
- Re-enable CentOS images. by @allisonvacanti in #16
- Add sccache to dockerfile by @msadang in #17
- Update base containers. by @allisonvacanti in #18
- Update sccache version by @ajschmidt8 in #19
- Build 11.5.1 containers by @ajschmidt8 in #20
- Add ops-bot.yaml by @jrhemstad in #80
- Monorepo workflow by @jrhemstad in #99
- Add devcontainers by @jrhemstad in #105
- Update the libcu++ submodule by @miscco in #109
- Update libcudacxx again by @miscco in #110
- Remove submodules from CI workflow by @jrhemstad in #115
- Fix CUB CI by @senior-zero in #114
- Fix async scan / counting iterator tests by @senior-zero in #118
- Make sccache work locally by @jrhemstad in #113
- Fix compilation of thrust and cub by @miscco in #120
- Fix segfault in cub::CachingDeviceAllocator by @senior-zero in #119
- Initial GH Infra Setup by @jarmak-nv in #23
- Visualize variant space coverage by @senior-zero in #125
- Fix broken issue templates by @jarmak-nv in #124
- Tune scan by key for SM90 by @senior-zero in #121
- Update PR template to more explicitly prompt for a linked issue closed by the PR by @jrhemstad in #134
- Change component section to more general "area" by @jrhemstad in #132
- Try and fix CI for old CTK by @miscco in #116
- Fix tuple_cat for std:: qualified types by @miscco in #144
- Add ccache to lit invocation by @miscco in #147
- Benchmark batched memcpy by @senior-zero in #136
- Properly query CMAKE_CUDA_COMPILER_LAUNCHER for ccache support by @miscco in #152
- Implement Three-Way Partition Tuning / Benchmark by @senior-zero in #155
- Port three-way partition to use Catch2 by @senior-zero in #156
- Add gcc-6 to the test matrix by @miscco in #160
- Tune reduce / unique by key for SM90 by @senior-zero in #163
- Remove unused folders by @miscco in #145
- Fix documentation of atomic_ref by @miscco in #164
- New iterator traits by @miscco in #158
- Improve implementation of destructible by @miscco in #157
- Build script improvements by @jrhemstad in #149
- Fix icpc / denormals by @senior-zero in #185
- Enable tests by @jrhemstad in #167
- Monorepo by @jrhemstad in #194
- Multi-benchmark tuning by @senior-zero in #208
- Fixes universal_vector test failure on CTK 11.1 & gcc-6 by @elstehle in #209
- Delete several directories for older CI infra. by @wmaxey in #218
- Memory-safe radix sort test by @senior-zero in #222
- [FEA] Implement iter_move CPO by @miscco in #197
- Build cub benchmarks in build_cub.sh by @jrhemstad in #216
- [skip-tests] Do not run tests when skip-tests is part of the latest commit message by @miscco in #224
- Factor out build job logic into a "run-as-coder" reusable workflow. by @jrhemstad in #205
- Fix instances of 'scan' copy-pasted into reduction documentation by @milesvant in #221
- Add clangd to devcontainer by @senior-zero in #225
- Add initial CODEOWNERS file by @jrhemstad in #226
- Attempt to fix codeowners by @jrhemstad in #231
- Make libcudacxx respect CMake options for CUDA archs. by @wmaxey in #235
- Optimize Three-Way Partition by @senior-zero in #228
- [BUG] Rework how we handle feature test macros by @miscco in #195
- Enable use of cudaMemcpyAsync for thrust::copy by @miscco in #211
- Enable additional arguments in build_common.sh by @wmaxey in #236
- [BUG] Properly uglify all qualifiers in product headers by @miscco in #201
- Port cub::Device{Select, Partition} tests to catch2 by @miscco in #229
- Fix CUB tests / MSVC 2022 by @senior-zero in #255
- Ensure that any CMake re-rooting doesn't break our find_file by @miscco in #257
- [BUG] Fix compilation issues with MSVC 2017 by @miscco in #196
- Implement iterator concepts by @miscco in #223
- Tune Histogram on H100 by @senior-zero in #266
- Add WarpExchangeAlgorithm customization for WarpExchange class by @pb-dseifert in #256
- [BUG]: Avoid deprecation warning for std::aligned_storage when building with c++23 by @miscco in #258
- Port cub::DeviceReduce tests to catch2 by @elstehle in #267
- Add support for nvcc-specific matrix. by @jrhemstad in #243
- Fix anchor link to cooperative groups in CUDA programming guide by @wence- in #274
- Fix BibTeX syntax in CITATION.md [skip-tests] by @wence- in #276
- Enforce C++17 for benches by @senior-zero in #275
- Project Automation: Move PR and Linked Issues to In Progress by @jarmak-nv in #170
- Update to 23.08 devcontainers and CUDA 12.2 by @jrhemstad in #270
- [skip-tests] CTK 12.2 tuning image by @senior-zero in #282
- Fix single-thread block reduction by @senior-zero in #287
- Tune Select and Partition on A100 by @senior-zero in #289
- Fix CUB tests / MSVC by @senior-zero in #292
- Allow building CUB tests without cuRand by @senior-zero in #250
- Fixup to CUB build - s/curand/cudart/ by @wmaxey in #301
- Fix OOB in cub::DeviceRunLengthEncode::NonTrivialRuns by @senior-zero in #294
- Tune RLE on A100 by @senior-zero in #295
- Tune scan on A100 by @senior-zero in #302
- Add new CCCL:: CMake targets by @allisonvacanti in #244
- Fix cudacc and nvcc mixup. by @wmaxey in #329
- [skip-tests] Use builtin for destructible concept on MSVC by @miscco in #333
- Fix merge conflict from two inflight PRs by @miscco in #338
New Contributors
- @raydouglass made their first contribution in #1
- @brycelelbach made their first contribution in #4
- @msadang made their first contribution in #17
- @wmaxey made their first contribution in #218
- @milesvant made their first contribution in #221
- @pb-dseifert made their first contribution in #256
- @wence- made their first contribution in #274
Full Changelog: https://github.com/NVIDIA/cccl/commits/v2.2.0