Alpha v0.2
Pre-release
Complete revamp of our initial release.
New features:
- Hydra models, 20x faster than vanilla PPO with minimal performance hits at large scales.
- Massively revamped API with significantly less boilerplate.
- Save/load callbacks.
- Greatly improved orchestrator.
- Better-commented RL code that makes it easier to understand what's going on.
- Cool examples, including architext and simulacra.
- Better extensibility and standardized styling.
Features coming soon:
- Megatron support! We're already working on this.
- More interesting examples that are relevant to production use cases of TRLX.
- Better integration of W&B, including sweeps.
- Evaluation and benchmarking.
:)
Autogenerated release notes below:
What's Changed
- Fix typo by @mrm8488 in #2
- Create LICENSE by @LouisCastricato in #3
- QOL fixes by @LouisCastricato in #5
- stage ilql by @reciprocated in #6
- Adds style file and reward function capabilities to ppo orchestrator by @LouisCastricato in #8
- Update ppo value head + print logs by @Dahoas in #11
- Make ilql respect the config & remove sin by @reciprocated in #22
- Docs by @shahbuland in #31
- Implemented hydra heads + adaptive kl by @Dahoas in #33
- Add pre-commit with black by @cat-state in #36
- [update] Improve package setup by @jon-tow in #42
- Add initial issue templates by @jon-tow in #45
- Some readme improvements by @thedch in #44
- Add initial GitHub workflows by @jon-tow in #43
- [docs] Add CONTRIBUTING.md by @jon-tow in #52
- Simplify api by @reciprocated in #24
New Contributors
- @mrm8488 made their first contribution in #2
- @LouisCastricato made their first contribution in #3
- @reciprocated made their first contribution in #6
- @Dahoas made their first contribution in #11
- @shahbuland made their first contribution in #31
- @cat-state made their first contribution in #36
- @jon-tow made their first contribution in #42
- @thedch made their first contribution in #44
Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2