parallel-trpo

Acknowledge

Citation

@article{DBLP:journals/corr/SchulmanLMJA15,
  author    = {John Schulman and
               Sergey Levine and
               Philipp Moritz and
               Michael I. Jordan and
               Pieter Abbeel},
  title     = {Trust Region Policy Optimization},
  journal   = {CoRR},
  volume    = {abs/1502.05477},
  year      = {2015},
  url       = {http://arxiv.org/abs/1502.05477},
  timestamp = {Wed, 07 Jun 2017 14:42:34 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/SchulmanLMJA15},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}

Contributions

Updated for TensorFlow 1.3.
Fixed some bugs. In main.py, added:

history["maxkl"] = []
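A minimal sketch of why a line like the one above matters, assuming `history` is a plain dict that maps a statistic name to a list of per-iteration values. Only the `"maxkl"` key comes from this README. The `new_history` helper, the `"mean_reward"` key, and the sample value are hypothetical and are not taken from main.py.

```python
def new_history(keys):
    """Return a dict with an empty list for every statistic name (hypothetical helper)."""
    return {key: [] for key in keys}


# "mean_reward" is an illustrative key; "maxkl" is the key named in this README.
history = new_history(["mean_reward", "maxkl"])

# Appending works because the list already exists under "maxkl".
history["maxkl"].append(0.008)

# Without the initialization, the same append fails with KeyError.
uninitialized = {}
try:
    uninitialized["maxkl"].append(0.008)
    append_failed = False
except KeyError:
    append_failed = True
```

If the training loop appends to `history["maxkl"]` on every iteration, the key has to hold a list before the first append, which is what the added line provides.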

About

A parallel implementation of Trust Region Policy Optimization (TRPO) on environments from OpenAI Gym.

Now includes hyperparameter adaptation as well! For more info, check Kevin Frans' post on this project.

Kevin Frans is working towards the ideas in this OpenAI research request. The code is based on this implementation.

Kevin Frans is currently working together with Danijar on writing an updated version of this preliminary paper, describing the multiple actors setup.

How to run:

# This just runs a simple training on Reacher-v1.
python main.py

# For the commands used to recreate results, check trials.txt

Parameters:

--task: what gym environment to run on
--timesteps_per_batch: how many timesteps for each policy iteration
--n_iter: number of iterations
--gamma: discount factor for future rewards
--max_kl: maximum KL divergence between new and old policy
--cg_damping: damping on the KL constraint (ratio of original gradient to use)
--num_threads: how many async threads to use
--monitor: whether to monitor progress for publishing results to gym or not
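
The flags above could be declared with Python's standard `argparse` module roughly as follows. This is a sketch, not the parser in main.py: the `build_parser` name, every default value, and the choice to treat `--monitor` as an on/off switch are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the flags listed above.

    All defaults are illustrative placeholders, not the values used in main.py.
    """
    parser = argparse.ArgumentParser(description="Parallel TRPO training (sketch)")
    parser.add_argument("--task", type=str, default="Reacher-v1",
                        help="gym environment to run on")
    parser.add_argument("--timesteps_per_batch", type=int, default=10000,
                        help="timesteps collected for each policy iteration")
    parser.add_argument("--n_iter", type=int, default=100,
                        help="number of iterations")
    parser.add_argument("--gamma", type=float, default=0.99,
                        help="discount factor for future rewards")
    parser.add_argument("--max_kl", type=float, default=0.01,
                        help="maximum KL divergence between new and old policy")
    parser.add_argument("--cg_damping", type=float, default=0.1,
                        help="damping on the KL constraint")
    parser.add_argument("--num_threads", type=int, default=4,
                        help="number of async threads")
    # Assumption: modeled as a boolean switch that is off unless passed.
    parser.add_argument("--monitor", action="store_true",
                        help="monitor progress for publishing results to gym")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With a parser like this, any flag left off the command line falls back to its default, so `python main.py --task Reacher-v1 --max_kl 0.01` would override only those two values.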

Requirements

TensorFlow >= 1.3
Python 2.7
