v0.5.0 (#183)
chhwang authored Dec 16, 2023
1 parent 97cd329 commit 1762798
Showing 11 changed files with 39 additions and 33 deletions.
13 changes: 13 additions & 0 deletions .vscode/c_cpp_properties.json
@@ -0,0 +1,13 @@
{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "/usr/local/cuda/include",
                "/opt/rocm/include"
            ]
        }
    ],
    "version": 4
}
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -1,6 +1,6 @@
cff-version: 1.2.0
title: "ARK: A GPU-driven system framework for scalable AI applications"
version: 0.4.1
version: 0.5.0
message: >-
If you use this project in your research, please cite it as below.
authors:
4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -2,8 +2,8 @@
# Licensed under the MIT license.

set(ARK_MAJOR "0")
set(ARK_MINOR "4")
set(ARK_PATCH "1")
set(ARK_MINOR "5")
set(ARK_PATCH "0")

set(ARK_VERSION "${ARK_MAJOR}.${ARK_MINOR}.${ARK_PATCH}")
set(ARK_SOVERSION "${ARK_MAJOR}.${ARK_MINOR}")
19 changes: 11 additions & 8 deletions README.md
@@ -13,6 +13,7 @@ A GPU-driven system framework for scalable AI applications.
| Unit Tests (ROCm) | [![Unit Tests (ROCm)](https://github.com/microsoft/ark/actions/workflows/ut-rocm.yml/badge.svg?branch=main)](https://github.com/microsoft/ark/actions/workflows/ut-rocm.yml) |

*NOTE (Nov 2023): ROCm unit tests will be replaced into an Azure pipeline in the future.*

*NOTE (Dec 2023): ROCm unit tests are failing due to the nodes' issue. This will be fixed soon.*

See [Quick Start](docs/quickstart.md) to quickly get started.
@@ -29,18 +30,20 @@ ARK provides a set of APIs for users to express their distributed deep learning

ARK is under active development and a part of its features will be added in a future release. The following describes key features of each version.

### New in ARK v0.4 (Latest Release)
### New in ARK v0.5 (Latest Release)

* Support AMD GPUs (CDNA2, single-GPU only)
* Add high-performance AllReduce & AllGather algorithms with MSCCL++
* Fix major bugs in the scheduler
* Integrate with [MSCCL++](https://github.com/microsoft/mscclpp)
* Removed dependency on `gpudma`
* Add AMD CDNA3 architecture support
* Support communication for AMD GPUs
* Optimize OpGraph scheduling
* Add a multi-GPU Llama2 example

See details from https://github.com/microsoft/ark/issues/137.
See details from https://github.com/microsoft/ark/issues/168.

### ARK v0.5 (TBU, Dec. 2023)
### ARK v0.6 (TBU, Jan. 2024)

* Multi-GPU support for AMD GPUs
* Add multi-GPU LLM examples
* Overall performance optimization
* Improve Python unit tests & code coverage

## Contributing
2 changes: 1 addition & 1 deletion ark/gpu/gpu_mem.cc
@@ -25,7 +25,7 @@ GpuMem::GpuMem(size_t bytes) { this->init(bytes); }
GpuMem::GpuMem(const GpuMem::Info &info) { this->init(info); }

//
void GpuMem::init(size_t bytes, bool expose) {
void GpuMem::init(size_t bytes, [[maybe_unused]] bool expose) {
if (bytes == 0) {
ERR(InvalidUsageError, "Tried to allocate zero byte.");
}
4 changes: 2 additions & 2 deletions ark/include/ark.h
@@ -10,8 +10,8 @@
#include <vector>

#define ARK_MAJOR 0
#define ARK_MINOR 4
#define ARK_PATCH 1
#define ARK_MINOR 5
#define ARK_PATCH 0
#define ARK_VERSION (ARK_MAJOR * 10000 + ARK_MINOR * 100 + ARK_PATCH)

namespace ark {
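
Reviewer note on the `ARK_VERSION` macro above: it packs the three components into one integer as `major * 10000 + minor * 100 + patch`, so this commit moves the value from 401 to 500. The following is a minimal Python sketch of that arithmetic, not code from the repository. The function names `encode_version` and `decode_version` are hypothetical, and the range check on `minor` and `patch` is an assumption added for illustration; `ark.h` itself does not enforce it.

```python
from typing import Tuple


def encode_version(major: int, minor: int, patch: int) -> int:
    """Mirror the ARK_VERSION arithmetic: major * 10000 + minor * 100 + patch."""
    # Assumption for this sketch: minor and patch must stay below 100,
    # otherwise two different versions could map to the same integer.
    if not (0 <= minor < 100 and 0 <= patch < 100):
        raise ValueError("minor and patch must be in the range 0..99")
    return major * 10000 + minor * 100 + patch


def decode_version(value: int) -> Tuple[int, int, int]:
    """Invert encode_version under the same 0..99 assumption."""
    major, rest = divmod(value, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


old_version = encode_version(0, 4, 1)  # before this commit
new_version = encode_version(0, 5, 0)  # after this commit
print(old_version, new_version)
```

Because the encoding is monotonic under that assumption, downstream code can compare versions with a plain integer comparison such as `ARK_VERSION >= 500`.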
4 changes: 2 additions & 2 deletions cmake/Utils.cmake
@@ -24,10 +24,10 @@ endif()
find_program(BLACK black)
if(BLACK)
add_custom_target(pylint
COMMAND python3.8 -m black --check --config ${PROJECT_SOURCE_DIR}/pyproject.toml ${PROJECT_SOURCE_DIR}
COMMAND python3 -m black --check --config ${PROJECT_SOURCE_DIR}/pyproject.toml ${PROJECT_SOURCE_DIR}
)
add_custom_target(pylint-autofix
COMMAND python3.8 -m black --config ${PROJECT_SOURCE_DIR}/pyproject.toml ${PROJECT_SOURCE_DIR}
COMMAND python3 -m black --config ${PROJECT_SOURCE_DIR}/pyproject.toml ${PROJECT_SOURCE_DIR}
)
else()
message(STATUS "black not found.")
16 changes: 3 additions & 13 deletions docs/install.md
@@ -2,22 +2,12 @@

## Prerequisites

* Linux kernel >= 4.15.0

- If you have a lower version, you can upgrade it via:
```bash
sudo apt-get update
sudo apt-get install -y linux-image-4.15.0-13-generic linux-header-4.15.0-13-generic
```

* CMake >= 3.25.0 and Python >= 3.8

* Supported GPUs
- NVIDIA GPUs: Volta (CUDA >= 11.1) / Ampere (CUDA >= 11.1) / Hopper (CUDA >= 12.0)
- Hopper support will be added in the future.
- AMD GPUs: CDNA2 (ROCm >= 5.0) / CDNA3
- Multi-GPU execution is not yet supported for AMD GPUs and will be supported by a future release.
- CDNA3 support will be added in the future.
- AMD GPUs: CDNA2 (ROCm >= 5.7) / CDNA3 (ROCm >= 5.7)

* Mellanox OFED
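
Reviewer note on the prerequisites above: the minimums (CMake >= 3.25.0, Python >= 3.8, ROCm >= 5.7) are all dotted version numbers, so a small helper can compare a reported version string against them. The following is a minimal sketch and is not part of ARK. The names `parse_version` and `meets_minimum` are hypothetical, and the input strings are assumed to contain a dotted numeric version such as `3.25.1`.

```python
import platform
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted numeric version in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no dotted version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found: str, required: str) -> bool:
    """Return True if the version in `found` is at least `required`."""
    a, b = parse_version(found), parse_version(required)
    # Pad the shorter tuple with zeros so that "3.25" compares equal to "3.25.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Check the running interpreter against the documented Python minimum.
python_ok = meets_minimum(platform.python_version(), "3.8")
print("Python >= 3.8:", python_ok)
```

The same helper could be applied to the first line of `cmake --version` output, for example `meets_minimum("cmake version 3.25.1", "3.25.0")`.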

@@ -28,9 +18,9 @@ We currently provide only *base images* for ARK, which contain all the dependenc
You can pull a base image as follows.
```
# For NVIDIA GPUs
docker pull ghcr.io/microsoft/ark/ark:base-dev-cuda12.1
docker pull ghcr.io/microsoft/ark/ark:base-dev-cuda12.2
# For AMD GPUs
docker pull ghcr.io/microsoft/ark/ark:base-dev-rocm5.6
docker pull ghcr.io/microsoft/ark/ark:base-dev-rocm5.7
```

Check [ARK containers](https://github.com/microsoft/ark/pkgs/container/ark%2Fark) for all available Docker images.
4 changes: 2 additions & 2 deletions docs/sphinx/source/conf.py
@@ -20,8 +20,8 @@
project = "ARK"
copyright = "2023, ARK Team"
author = "ARK Team"
version = "0.4.1"
release = "0.4.1"
version = "0.5.0"
release = "0.5.0"

# -- General configuration ---------------------------------------------------

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "scikit_build_core.build"

[project]
name = "ark"
version = "0.4.1"
version = "0.5.0"

[tool.scikit-build]
cmake.minimum-version = "3.25"
2 changes: 1 addition & 1 deletion third_party/mscclpp
Submodule mscclpp updated 1 files
+11 −3 CMakeLists.txt
