Release Alpha v0.2 · CarperAI/trlx

Complete revamp of our initial release.

New features:

Hydra models: 20x faster than vanilla PPO, with minimal performance loss at large scales.
Massively revamped API with significantly less boilerplate.
Save/load callbacks.
Greatly improved orchestrator.
Better-commented RL code that makes it easier to understand what's going on.
Cool examples, including architext and simulacra.
Better extensibility and standardized styling.

Features coming soon:

Autogenerated release notes below:

What's Changed

Fix typo by @mrm8488 in #2
Create LICENSE by @LouisCastricato in #3
QOL fixes by @LouisCastricato in #5
stage ilql by @reciprocated in #6
Adds style file and reward function capabilities to ppo orchestrator by @LouisCastricato in #8
Update ppo value head + print logs by @Dahoas in #11
Make ilql respect the config & remove sin by @reciprocated in #22
Docs by @shahbuland in #31
Implemented hydra heads + adaptive kl by @Dahoas in #33
Add pre-commit with black by @cat-state in #36
[update] Improve package setup by @jon-tow in #42
Add initial issue templates by @jon-tow in #45
Some readme improvements by @thedch in #44
Add initial GitHub workflows by @jon-tow in #43
[docs] Add CONTRIBUTING.md by @jon-tow in #52
Simplify api by @reciprocated in #24

Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2