Implementation of a multiprocessing Proximal Policy Optimization (PPO) algorithm on the BipedalWalker OpenAI Gym environment.

MWeltevrede/PPO


PPO

Implementation of a multi-processed Proximal Policy Optimization (PPO) algorithm with some "implementation tricks" from the article What Matters In On-Policy Reinforcement Learning?

It achieves a score of around 300 (considered solved) on the BipedalWalker-v2 environment from OpenAI Gym:

[Bipedal Walker GIF]
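The defining piece of any PPO implementation is the clipped surrogate objective, which limits how far each policy update can move the probability ratio away from 1. As a minimal, illustrative sketch (the function name and NumPy formulation are mine, not this repository's API):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017).

    ratio:     pi_new(a|s) / pi_old(a|s) per sampled transition
    advantage: estimated advantage per transition
    eps:       clipping range (0.2 is the commonly used default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum makes the objective a pessimistic bound,
    # removing the incentive to push the ratio outside [1-eps, 1+eps].
    return np.minimum(unclipped, clipped)
```

In training, the negative mean of this quantity is minimized by gradient descent on the policy parameters, alongside a value-function loss and (often) an entropy bonus.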

Update July 2021

Added the following features:

  • Multi-processed workers for experience collection (resulting in 2-3x faster wall-clock time).
  • Model loading to allow warm-starting or continuing training from a checkpoint.
  • Proper seeding, resulting in fully reproducible runs.
  • Tensorboard logging.
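The first and third features interact: each worker process needs its own environment instance and its own seed derived deterministically from a base seed, otherwise parallel collection breaks reproducibility. A minimal sketch of that pattern (all names are illustrative, and the rollout contents are stand-ins for real transitions):

```python
import multiprocessing as mp
import random

def collect_rollout(args):
    # Hypothetical worker: in a real setup each process would step its own
    # copy of the environment. Seeding the worker's RNG from the base seed
    # makes the collected experience reproducible run-to-run.
    seed, n_steps = args
    rng = random.Random(seed)
    # Stand-in for a list of (obs, action, reward, done) transitions.
    return [rng.random() for _ in range(n_steps)]

def collect_parallel(base_seed, n_workers=4, n_steps=8):
    # Derive one distinct, deterministic seed per worker from the base seed.
    jobs = [(base_seed + i, n_steps) for i in range(n_workers)]
    with mp.Pool(n_workers) as pool:
        return pool.map(collect_rollout, jobs)
```

The learner then concatenates the per-worker rollouts into a single batch for the PPO update; because the worker function is stateless given its seed, repeating a run with the same base seed yields identical batches.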

