The goal of this project is to train an agent to navigate a square environment and collect bananas using the Deep Q-Network (DQN) algorithm.
The agent receives a reward of +1 for every yellow banana and -1 for every blue banana that it collects. The goal is to collect as many yellow bananas as possible while avoiding the blue ones.
The state space has 37 dimensions and contains the agent's velocity along with ray-based perception of objects around the agent's forward direction. Each state value ranges between 0 and 1. Given this information, the agent has to learn how to best select among four discrete actions:
- 0: move forward
- 1: move backward
- 2: turn left
- 3: turn right
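As a minimal sketch of how an agent might choose among these four actions, the snippet below uses an epsilon-greedy rule. This is an assumption: epsilon-greedy is a common choice for DQN agents, but the text above does not state which exploration strategy the project uses. The names `ACTIONS` and `epsilon_greedy`, and the example action values, are hypothetical.

```python
import random

# Action indices as listed above.
ACTIONS = {0: "move forward", 1: "move backward", 2: "turn left", 3: "turn right"}


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one.

    Hypothetical helper for illustration; not taken from dqn_agent.py.
    """
    if rng.random() < epsilon:
        # Explore: any action index, chosen uniformly.
        return rng.randrange(len(action_values))
    # Exploit: index of the largest estimated action value.
    return max(range(len(action_values)), key=lambda a: action_values[a])


# Made-up action-value estimates for a single state.
q_values = [0.1, -0.3, 0.8, 0.2]
action = epsilon_greedy(q_values, epsilon=0.0)
print(action, ACTIONS[action])
```

With `epsilon=0.0` the choice is purely greedy. A training loop would typically start with a larger epsilon and decay it over episodes.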
The task is episodic, and to solve the environment the agent must get an average score greater than 13 over 100 consecutive episodes.
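The solving criterion can be checked with a rolling window over per-episode scores. The following is a sketch only: the function name `first_solved_episode` and its parameters are hypothetical, and it assumes the scores are available as a plain list of numbers.

```python
from collections import deque


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first exceeds `target`, or None if that never happens."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Only judge once a full window of consecutive episodes is available.
        if len(recent) == window and sum(recent) / window > target:
            return episode
    return None
```

Because the criterion in the text is "greater than 13", an average of exactly 13 does not count as solved in this sketch.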
You will need to set up your Python environment.
- Create (and activate) a new environment with Python 3.6.
  - Linux or Mac:
    conda create --name drlnd python=3.6
    source activate drlnd
  - Windows:
    conda create --name drlnd python=3.6
    activate drlnd
- Perform a minimal install of OpenAI gym with:
    pip install gym
- Install the classic control environment group by following the instructions here
- Install the box2d environment group by following the instructions here
- Clone Udacity's Deep Reinforcement Learning repository and install its Python dependencies:
    git clone https://github.com/udacity/deep-reinforcement-learning.git
    cd deep-reinforcement-learning/python
    pip install .
- Create an IPython kernel for the drlnd environment:
    python -m ipykernel install --user --name drlnd --display-name "drlnd"
- Before running code in a notebook, change the kernel to the drlnd environment by using the drop-down Kernel menu.
- For this project you will need to download the pre-built environment prepared by Udacity from one of the links below. Download the file that matches your operating system:
- Download this repository within your working directory.
Follow the instructions in Navigation.ipynb to get started with training your agent!
- Navigation.ipynb: the Jupyter notebook that contains the implementation of the DQN algorithm.
- dqn_agent.py: contains two classes, Agent and ReplayBuffer. The Agent class has an act method that returns an action for a given state under the current policy, and a learn method that updates the Q-network parameters from a batch of experience tuples. The ReplayBuffer class has an add method that stores a new experience in the memory buffer, and a sample method that randomly draws a batch of experiences from memory.
- model.py: contains the Q-Network model, which maps the 37-dimensional state to 4 action values. The network has two hidden layers of 64 nodes each. A ReLU activation follows each hidden layer, and the output layer uses an identity activation.
- checkpoint.pth: contains the DQN weights of the trained agent.
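To illustrate the add and sample behaviour described for ReplayBuffer, here is a minimal standard-library sketch. It is not the code in dqn_agent.py: the constructor arguments (`buffer_size`, `batch_size`, `seed`) and the `Experience` tuple are assumptions made for this example, and the real class most likely converts the sampled batch to tensors for the learn step, which this sketch omits.

```python
import random
from collections import deque, namedtuple

# One stored transition; the field names are assumed for illustration.
Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size memory of experience tuples (illustrative sketch)."""

    def __init__(self, buffer_size, batch_size, seed=None):
        # A bounded deque silently drops the oldest experience when full.
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store a new experience in memory."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Draw batch_size distinct experiences uniformly at random."""
        return self.rng.sample(list(self.memory), k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```

In this sketch, `sample` raises a `ValueError` if fewer than `batch_size` experiences are stored, so a caller would check `len(buffer)` before sampling.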
The source code is released under an MIT license.
I would like to thank the Udacity community for the technical support and for providing coding exercises that helped me understand the implementation of this algorithm.
Andres Campos
