MSCCL++ v0.5.0
What's Changed
- Fix a typo in a name by @chhwang in #286
- Add executor to execute schedule-plan file by @Binyang2014 in #283
- Allow binding allocated memory to NVLS multicast pointer by @roshandathathri in #290
- Separate headers for GPU data types by @chhwang in #291
- Refactor NVLS interfaces by @chhwang in #293
- Include GPU data types only for kernel code by @chhwang in #292
- Ethernet support by @chhwang in #284
- Resolve multi-node test failure issue by @Binyang2014 in #295
- Move pipeline to Azure org by @Binyang2014 in #296
- Optimize the execution kernel by @Binyang2014 in #294
- Allow obtaining CUDA stream handle from PyTorch stream when launching kernel by @aashaka in #297
- v0.5.0 by @chhwang in #298
New Contributors
- @roshandathathri made their first contribution in #290
Full Changelog: v0.4.3...v0.5.0