Implementation of a multiprocessing Proximal Policy Optimization (PPO) algorithm on the BipedalWalker OpenAI Gym environment.

MWeltevrede/PPO


PPO

Implementation of a multi-processed Proximal Policy Optimization (PPO) algorithm with some "implementation tricks" from the article What Matters In On-Policy Reinforcement Learning?

It achieves a score of around 300 (considered solved) on the BipedalWalker-v2 environment from OpenAI Gym:

[Bipedal Walker GIF]
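The defining piece of any PPO implementation is the clipped surrogate objective, which limits how far each policy update can move the probability ratio away from 1. As a minimal, illustrative sketch (the function name and NumPy formulation are mine, not this repository's API):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017).

    ratio:     pi_new(a|s) / pi_old(a|s) per sampled transition
    advantage: estimated advantage per transition
    eps:       clipping range (0.2 is the commonly used default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum makes the objective a pessimistic bound,
    # removing the incentive to push the ratio outside [1-eps, 1+eps].
    return np.minimum(unclipped, clipped)
```

In training, the negative mean of this quantity is minimized by gradient descent on the policy parameters, alongside a value-function loss and (often) an entropy bonus.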

Update July 2021

Added the following features:

  • Multi-processed workers for experience collection (resulting in 2-3x faster wall-clock time).
  • Model loading to allow warm-starting or continuing training from a checkpoint.
  • Proper seeding, resulting in fully reproducible runs.
  • Tensorboard logging.
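The first and third features interact: each worker process needs its own environment instance and its own seed derived deterministically from a base seed, otherwise parallel collection breaks reproducibility. A minimal sketch of that pattern (all names are illustrative, and the rollout contents are stand-ins for real transitions):

```python
import multiprocessing as mp
import random

def collect_rollout(args):
    # Hypothetical worker: in a real setup each process would step its own
    # copy of the environment. Seeding the worker's RNG from the base seed
    # makes the collected experience reproducible run-to-run.
    seed, n_steps = args
    rng = random.Random(seed)
    # Stand-in for a list of (obs, action, reward, done) transitions.
    return [rng.random() for _ in range(n_steps)]

def collect_parallel(base_seed, n_workers=4, n_steps=8):
    # Derive one distinct, deterministic seed per worker from the base seed.
    jobs = [(base_seed + i, n_steps) for i in range(n_workers)]
    with mp.Pool(n_workers) as pool:
        return pool.map(collect_rollout, jobs)
```

The learner then concatenates the per-worker rollouts into a single batch for the PPO update; because the worker function is stateless given its seed, repeating a run with the same base seed yields identical batches.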

