docs: rewrite readme
be-marc committed Oct 17, 2023
1 parent 7b8631e commit f2b96eb
Showing 5 changed files with 139 additions and 109 deletions.
68 changes: 39 additions & 29 deletions README.Rmd
@@ -6,21 +6,45 @@ output: github_document

# rush

Rush is a package for parallel and distributed computing in R.
It parallelizes the evaluation of R functions on a cluster of workers and provides a shared storage between the workers.
*rush* is a package for parallel and distributed computing in R.
It evaluates an R expression asynchronously on a cluster of workers and provides a shared storage between the workers.
The shared storage is a [Redis](https://redis.io) data base.
Rush offers the option to define a single manager that distributes tasks to the workers.
Alternatively, the workers can create tasks themselves and communicate the results with each other via Rush.
Rush offers a centralized and decentralized network architecture.
The centralized network has a single controller (`Rush`) and multiple workers (`RushWorker`).
Tasks are created centrally and distributed to workers by the controller.
The decentralized network has no controller.
The workers sample tasks and communicate the results asynchronously with other workers.

# Features

* Parallelize arbitrary R expressions.
* Centralized and decentralized network architecture.
* Small overhead of a few milliseconds per task.
* Easy starting of workers with the [`future`](https://future.futureverse.org/) package.
* Start workers on any platform with a batch script.
* Designed to work with [`data.table`](https://cran.r-project.org/web/packages/data.table/index.html).
* Results are cached in the R session to minimize read and write operations.
* Detect and recover from worker failures.
* Start heartbeats to monitor workers on remote machines.
* Snapshot the in-memory data base to disk.
* Store [`lgr`](https://cran.r-project.org/web/packages/lgr/index.html) messages of the workers in the Redis data base.
* Light on dependencies.

![](man/figures/README-flow.png)
## Install

Single manager with multiple workers strategy.
Install the development version from GitHub.

## Install
```{r eval = FALSE}
remotes::install_github("mlr-org/rush")
```

And install [Redis](https://redis.io/docs/getting-started/installation/).

[Install Redis](https://redis.io/docs/getting-started/installation/)
## Centralized Rush Network

## Example
![](man/figures/README-flow.png)

*Centralized network with a single controller and three workers.*

```{r, include=FALSE}
future::plan("sequential")
@@ -29,13 +53,12 @@ r = redux::hiredis(config)
r$FLUSHDB()
```

Initialize the rush controller instance.
The `instance_id` identifies the instance and worker in the network.
The example below shows the evaluation of a simple function in a centralized network.
The `instance_id` identifies the instance and workers in the network.
The `config` is a list of parameters for the connection to Redis.

```{r}
library(rush)
library(redux)
config = redux::redis_config()
rush = Rush$new(instance_id = "test", config)
@@ -51,7 +74,7 @@ fun = function(x1, x2, ...) {
}
```

We start two worker with the [`future`](https://future.futureverse.org/) package.
We start two workers with the [`future`](https://future.futureverse.org/) package.

```{r}
future::plan("multisession", workers = 2)
@@ -73,21 +96,8 @@ And retrieve the results.
rush$fetch_finished_tasks()
```

## Decentralized Rush Network

## Task States

Tasks have four states: `queued`, `running`, `finished`, `failed`.

* `queued` tasks are in the wait list.
* `running` tasks are evaluated on a worker.
* `finished` tasks pushed their result to the data base.
* `failed` tasks threw an error.
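
As a rough illustration of these four states, the base R sketch below moves two toy tasks from `queued` through `running` to either `finished` or `failed`. It does not use rush or Redis and is not how the package tracks state internally; `run_task` and `tasks` are hypothetical names introduced only for this example.

```r
# Illustrative only: a toy state tracker in base R, not rush's implementation.
# `run_task` and `tasks` are hypothetical names.

# Evaluate one task and report whether it finished or failed.
run_task = function(fun, args) {
  tryCatch(
    list(state = "finished", value = do.call(fun, args)),
    error = function(e) list(state = "failed", value = conditionMessage(e))
  )
}

# A simple function that throws an error for negative input.
fun = function(x1, x2) {
  if (x1 < 0) stop("x1 must be non-negative")
  x1 + x2
}

# Every task starts in the wait list.
tasks = list(
  list(args = list(x1 = 3, x2 = 5), state = "queued"),
  list(args = list(x1 = -1, x2 = 6), state = "queued")
)

for (i in seq_along(tasks)) {
  # The task is picked up by a worker.
  tasks[[i]]$state = "running"
  outcome = run_task(fun, tasks[[i]]$args)
  # The task either pushed a result or threw an error.
  tasks[[i]]$state = outcome$state
  tasks[[i]]$value = outcome$value
}

states = vapply(tasks, function(task) task$state, character(1))
print(states)
```

The first task ends as `finished` and the second as `failed`, because its `x1` is negative and `fun` throws an error.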

## Worker States

Workers have four states: `running`, `terminated`, `killed`, `lost`.
![](man/figures/README-flow-2.png)

* `running` workers are evaluating tasks.
* `terminated` workers are stopped.
* `killed` workers were killed by the user.
* `lost` workers crashed.
*Decentralized network with four workers.*
91 changes: 47 additions & 44 deletions README.html

Large diffs are not rendered by default.

89 changes: 53 additions & 36 deletions README.md
@@ -3,31 +3,60 @@

# rush

Rush is a package for parallel and distributed computing in R. It
parallelizes the evaluation of R functions on a cluster of workers and
*rush* is a package for parallel and distributed computing in R. It
evaluates an R expression asynchronously on a cluster of workers and
provides a shared storage between the workers. The shared storage is a
[Redis](https://redis.io) data base. Rush offers the option to define a
single manager who distributes tasks to the workers. Alternatively, the
workers can create tasks themselves and communicate the results with
each other via Rush.
[Redis](https://redis.io) data base. Rush offers a centralized and
decentralized network architecture. The centralized network has a single
controller (`Rush`) and multiple workers (`RushWorker`). Tasks are
created centrally and distributed to workers by the controller. The
decentralized network has no controller. The workers sample tasks and
communicate the results asynchronously with other workers.

# Features

- Parallelize arbitrary R expressions.
- Centralized and decentralized network architecture.
- Small overhead of a few milliseconds per task.
- Easy starting of workers with the
[`future`](https://future.futureverse.org/) package.
- Start workers on any platform with a batch script.
- Designed to work with
[`data.table`](https://cran.r-project.org/web/packages/data.table/index.html).
- Results are cached in the R session to minimize read and write
operations.
- Detect and recover from worker failures.
- Start heartbeats to monitor workers on remote machines.
- Snapshot the in-memory data base to disk.
- Store
[`lgr`](https://cran.r-project.org/web/packages/lgr/index.html)
messages of the workers in the Redis data base.
- Light on dependencies.

![](man/figures/README-flow.png)
## Install

Single manager with multiple workers strategy.
Install the development version from GitHub.

## Install
``` r
remotes::install_github("mlr-org/rush")
```

[Install Redis](https://redis.io/docs/getting-started/installation/)
And install
[Redis](https://redis.io/docs/getting-started/installation/).

## Example
## Centralized Rush Network

Initialize the rush controller instance. The `instance_id` identifies
the instance and worker in the network. The `config` is a list of
parameters for the connection to Redis.
![](man/figures/README-flow.png)

*Centralized network with a single controller and three workers.*

The example below shows the evaluation of a simple function in a
centralized network. The `instance_id` identifies the instance and
workers in the network. The `config` is a list of parameters for the
connection to Redis.

``` r
library(rush)
library(redux)

config = redux::redis_config()
rush = Rush$new(instance_id = "test", config)
@@ -51,8 +80,8 @@ fun = function(x1, x2, ...) {
}
```

We start two worker with the [`future`](https://future.futureverse.org/)
package.
We start two workers with the
[`future`](https://future.futureverse.org/) package.

``` r
future::plan("multisession", workers = 2)
@@ -75,26 +104,14 @@ rush$fetch_finished_tasks()
```

## x1 x2 pid worker_id y status
## 1: 4 6 189379 8219bdc4-a7e7-485e-b747-ba9ed83b8846 10 finished
## 2: 3 5 189380 d1719b60-d066-4fd9-93be-94dffcfb08a5 8 finished
## 1: 3 5 224861 aaa9bbea-ab25-4c47-a9ef-2cde95ee7144 8 finished
## 2: 4 6 234065 858f7aa4-18bd-48e0-a69f-0f0297a9051c 10 finished
## keys
## 1: acbf50d1-cafe-4628-a05d-a3ed72317aef
## 2: 452e5a1b-8019-4ec9-98fe-ebf537878762

## Task States

Tasks have four states: `queued`, `running`, `finished`, `failed`.

- `queued` tasks are in the wait list.
- `running` tasks are evaluated on a worker.
- `finished` tasks pushed their result to the data base.
- `failed` tasks threw an error.
## 1: c37c5467-693d-4df9-a1f6-fd2a5d0aaf65
## 2: 0bd23506-eecb-42ee-beba-a37be45b51b8

## Worker States
## Decentralized Rush Network

Workers have four states: `running`, `terminated`, `killed`, `lost`.
![](man/figures/README-flow-2.png)

- `running` workers are evaluating tasks.
- `terminated` workers are stopped.
- `killed` workers were killed by the user.
- `lost` workers crashed.
*Decentralized network with four workers.*
Binary file added man/figures/README-flow-2.png
Binary file modified man/figures/README-flow.png
