Reinforcement Learning for Navigation in Dynamic Maze Environments

Overview

This project implements a reinforcement learning (RL) navigation system where an agent learns to reach a goal in randomly generated maze environments with moving obstacles. The system starts with a tabular Q-learning baseline and is later upgraded to a Deep Q-Network (DQN) with stabilization techniques such as experience replay and a target network.

The objective is to study how an RL agent can learn goal-directed navigation in stochastic and dynamic environments, which is a core problem in robotics and autonomous systems. To stabilize learning and improve navigation performance, the environment uses a reduced obstacle density and slower obstacle dynamics, together with a tuned exploration schedule.

Key Features

Random maze generation every episode
Reduced environment difficulty: sparser obstacles (size // 2 per maze)
Slower dynamic obstacles that move once every 2 agent steps to stabilize learning
Extended exploration schedule (epsilon_decay=0.999, epsilon_min=0.02)
Goal-relative state representation with local obstacle sensors
Reward shaping for efficient learning
Tabular Q-learning baseline
Deep Q-Network (DQN) implementation variations
Experience replay buffer and target network stabilization (dqn_agent_v2.py)
Evaluation and visualization scripts for multiple models
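
The exploration settings listed above (epsilon_decay=0.999, epsilon_min=0.02) can be illustrated with a minimal epsilon-greedy sketch. The function names, the starting value of 1.0, and the choice to decay once per call are assumptions made for illustration; the README does not say whether the agents decay epsilon per step or per episode.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay_epsilon(epsilon, decay=0.999, minimum=0.02):
    """Multiplicative decay with a floor, using the values from the feature list."""
    return max(minimum, epsilon * decay)

# Count how many decay calls it takes to hit the floor.
# Starting from 1.0 is an assumption, not something the README states.
epsilon, steps = 1.0, 0
while epsilon > 0.02:
    epsilon = decay_epsilon(epsilon)
    steps += 1
```

Under these assumptions the floor is reached after roughly 3,900 decay calls, which is why a decay of 0.999 gives a much longer exploration phase than, say, 0.99.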

Problem Statement

The agent must learn to navigate a grid-based maze to reach a goal while avoiding obstacles.

Challenges in the environment include:

Random maze layouts
Moving obstacles
Partial observation
Stochastic dynamics

These conditions make the task significantly more difficult than classical gridworld environments.

Environment Design

Grid World

The environment is a 2D grid maze.

Example layout:

A . . X .
. . X . .
. . . . .
. X . . .
. . . . G

Legend:

A = Agent
G = Goal
X = Obstacle
. = Free cell

Random Maze Generation

Each episode generates a new maze configuration with random obstacle placement.

state = env.reset(seed=episode)

This forces the agent to learn general navigation strategies instead of memorizing paths.
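
A seeded reset of this kind could be sketched as follows. This is an illustration only: generate_maze, the corner start and goal positions, and the tuple return value are assumptions, not the actual contents of env/maze_env.py.

```python
import random

def generate_maze(size, seed, n_obstacles=None):
    """Return (agent, goal, obstacles) for a size x size grid (hypothetical helper)."""
    rng = random.Random(seed)            # private RNG: the same seed gives the same maze
    if n_obstacles is None:
        n_obstacles = size // 2          # obstacle count mentioned in this README
    agent, goal = (0, 0), (size - 1, size - 1)   # assumed corner placement
    free_cells = [(x, y) for x in range(size) for y in range(size)
                  if (x, y) not in (agent, goal)]
    obstacles = set(rng.sample(free_cells, n_obstacles))
    return agent, goal, obstacles
```

Seeding with the episode index keeps every run reproducible while still changing the layout each episode. Note that this sketch does not check that the goal remains reachable.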

Dynamic Obstacles

Obstacles move randomly during the episode. To make the learning curve manageable and realistic:

The number of obstacles scales with the grid and equals self.size // 2.
Obstacles move once every 2 agent steps (instead of every step), which gives the agent more time to react and makes the environment less chaotic.

Agent moves → (Every 2 steps: Obstacles move) → Collision check
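
The step order above can be sketched as follows. The helper names, the rule that obstacles stay put when their random move is blocked, and the step counter are assumptions for illustration; the real logic lives in env/maze_env.py.

```python
import random

MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right as (dx, dy)

def move_obstacles(obstacles, size, goal, rng):
    """Move each obstacle one random cell; it stays put if the move is blocked."""
    moved = set()
    for (x, y) in sorted(obstacles):
        dx, dy = rng.choice(MOVES)
        target = (x + dx, y + dy)
        in_bounds = 0 <= target[0] < size and 0 <= target[1] < size
        free = target != goal and target not in obstacles and target not in moved
        moved.add(target if in_bounds and free else (x, y))
    return moved

def advance(agent, obstacles, step_count, size, goal, rng, period=2):
    """Run the part of a step that follows the agent's move.

    `agent` is the agent's position after it has already moved.
    """
    step_count += 1
    if step_count % period == 0:         # obstacles move only on every 2nd step
        obstacles = move_obstacles(obstacles, size, goal, rng)
    collided = agent in obstacles        # collision check comes last
    return obstacles, step_count, collided
```

Because the collision check runs after the obstacle update, an obstacle that moves onto the agent's cell also counts as a collision in this sketch.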

State Representation

The agent observes a compact state representation consisting of:

(dx, dy, obstacle_up, obstacle_down, obstacle_left, obstacle_right)

Where:

dx = goal_x - agent_x
dy = goal_y - agent_y

Obstacle sensors:

obstacle_up
obstacle_down
obstacle_left
obstacle_right

Example state:

(3, -1, 0, 1, 0, 0)

Meaning:

Goal is 3 cells right, 1 cell up
Obstacle detected below
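
A minimal sketch of how such an observation could be computed is shown below. It assumes positions are (x, y) tuples with y increasing downward (so a negative dy means the goal is above the agent, as in the example) and that the sensors only flag obstacle cells, not grid borders; both are assumptions about the environment.

```python
def observe(agent, goal, obstacles):
    """Build (dx, dy, obstacle_up, obstacle_down, obstacle_left, obstacle_right)."""
    ax, ay = agent
    gx, gy = goal

    def blocked(x, y):
        # 1 if the neighbouring cell currently holds an obstacle, else 0
        return int((x, y) in obstacles)

    return (
        gx - ax,                 # dx
        gy - ay,                 # dy
        blocked(ax, ay - 1),     # obstacle_up
        blocked(ax, ay + 1),     # obstacle_down
        blocked(ax - 1, ay),     # obstacle_left
        blocked(ax + 1, ay),     # obstacle_right
    )
```

For instance, an agent at (1, 2) with the goal at (4, 1) and an obstacle at (1, 3) yields the example state (3, -1, 0, 1, 0, 0). Since the state does not depend on absolute position, the same entry applies in any maze layout.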

Action Space

The agent can perform four discrete actions:

0 → move up
1 → move down
2 → move left
3 → move right
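
The action indices can be mapped to grid moves as in the sketch below. The (dx, dy) convention with y increasing downward and the choice to clamp moves at the border (rather than penalize or terminate) are assumptions.

```python
# Action index -> (dx, dy); "up" decreases y under the assumed convention.
ACTION_DELTAS = {
    0: (0, -1),   # move up
    1: (0, 1),    # move down
    2: (-1, 0),   # move left
    3: (1, 0),    # move right
}

def apply_action(agent, action, size):
    """Return the agent's new position, keeping it inside the size x size grid."""
    dx, dy = ACTION_DELTAS[action]
    x = min(max(agent[0] + dx, 0), size - 1)
    y = min(max(agent[1] + dy, 0), size - 1)
    return (x, y)
```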

Reward Function

The reward strategy combines sparse terminal rewards (reaching the goal, colliding with an obstacle) with dense, distance-based reward shaping.

Event	Reward
Move closer to goal	+1
Move away from goal	-1
Step penalty	-1
Collision with obstacle	-100
Reach goal	+100
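
One way the table could be turned into code is sketched below. How the rows combine is not stated in this README, so the sketch makes explicit assumptions: terminal events override everything else, the step penalty and the shaping term are added together on non-terminal steps, and distance is measured as Manhattan distance.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def compute_reward(prev_pos, new_pos, goal, collided):
    """Reward for one transition, under the assumptions stated above."""
    if collided:
        return -100              # collision with obstacle
    if new_pos == goal:
        return 100               # reach goal
    reward = -1                  # step penalty
    before = manhattan(prev_pos, goal)
    after = manhattan(new_pos, goal)
    if after < before:
        reward += 1              # moved closer to the goal
    elif after > before:
        reward -= 1              # moved away from the goal
    return reward
```

Under these assumptions a step toward the goal nets 0, a step away nets -2, and a step that does not change the distance (for example, bumping into the border) nets -1.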

Algorithms Implemented

1. Tabular Q-Learning (`agents/q_learning.py`)

Baseline algorithm using a dictionary-based Q-table.

Update rule:

Q(s,a) ← Q(s,a) + α [ r + γ max_a' Q(s',a') − Q(s,a) ]
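
The update rule can be sketched with a dictionary-based Q-table as follows. The function names and the default values of alpha and gamma are illustrative assumptions, not the settings used in agents/q_learning.py.

```python
from collections import defaultdict

N_ACTIONS = 4

def make_q_table():
    """Q-table keyed by state tuple; unseen states start at zero for every action."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)

def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Apply one Q-learning update and return the new Q(s, a)."""
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]
```

A defaultdict built with a lambda cannot be pickled directly, so a table like this would need to be converted with dict(q) before being saved to a file such as q_table.pkl.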

2. Basic Deep Q-Network (`agents/dqn_agent.py`)

Replaces the Q-table with a neural network. The state vector is fed into an MLP with two hidden layers of 64 units that outputs one Q-value per action. The TD target is computed with the same online network, with no separate target network.
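
To make the shape of such a network concrete, here is a dependency-free sketch of the forward pass. It assumes the architecture is 6 inputs, two hidden layers of 64 ReLU units, and 4 outputs; the layer sizes, initialization, and function names are assumptions, and the actual agent most likely uses a deep learning framework rather than plain Python lists.

```python
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    scale = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, relu):
    """Compute W x + b, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

def build_q_network(state_dim=6, hidden=64, n_actions=4, seed=0):
    """Three dense layers: state_dim -> hidden -> hidden -> n_actions."""
    rng = random.Random(seed)
    sizes = [state_dim, hidden, hidden, n_actions]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def q_values(network, state):
    """Forward pass: ReLU on hidden layers, linear output of one Q-value per action."""
    x = [float(v) for v in state]
    for i, layer in enumerate(network):
        x = dense(x, layer, relu=(i < len(network) - 1))
    return x
```

Calling q_values(build_q_network(), (3, -1, 0, 1, 0, 0)) returns a list of four numbers, and the greedy action is the index of the largest one.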

3. Advanced DQN V2 (`agents/dqn_agent_v2.py`)

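Vanilla DQN training is unstable, so this version adds:

Experience Replay Buffer: stores past transitions and samples random minibatches from them, which breaks the temporal correlation between consecutive updates.
Target Network: a periodically synced copy of the Q-network that supplies the bootstrap targets, so the targets do not shift with every gradient step.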
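
Both techniques can be sketched without any framework. The class and function names, the buffer capacity, and the hard-copy sync are assumptions for illustration and may differ from agents/dqn_agent_v2.py.

```python
import copy
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity=10_000, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_targets(batch, target_q, gamma=0.99):
    """Bootstrap targets from the frozen target network.

    `target_q(state)` is assumed to return a list of Q-values, one per action.
    """
    return [r if done else r + gamma * max(target_q(s2))
            for (_, _, r, s2, done) in batch]

def sync_target(online_params):
    """Hard update: the target network becomes a deep copy of the online one."""
    return copy.deepcopy(online_params)

# Typical use inside a training loop (sync_every is a hypothetical setting):
#   if step % sync_every == 0:
#       target_params = sync_target(online_params)
```

Between syncs the targets stay fixed, so the online network is regressed toward a stationary objective rather than one that moves with every update.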

Project Structure

rl_dynamic_maze/

env/
    maze_env.py                 # Core environment simulation

agents/
    q_learning.py               # Tabular Q-Learning baseline
    dqn_agent.py                # Vanilla DQN agent
    dqn_agent_v2.py             # Advanced DQN with Experience Replay & Target Network

# Training Scripts
train.py                        # Trains Q-Learning Agent
train_dqn.py                    # Trains DQN Agent
train_dqn_v2.py                 # Trains DQN V2 Agent
train_dqn.ipynb                 # Jupyter Notebook for interactive DQN training

# Evaluation & Testing Scripts
evaluate.py                     # Evaluates Q-Learning on generalized scenarios
evaluate_dqn.py                 # Evaluates DQN on generalized scenarios
test_trained_agent.py           # Watch the trained Q-Learning agent navigate (render)
test.py                         # Environment interaction mechanics test script

# Artifacts
q_table.pkl                     # Saved Q-table (generated after train.py)
dqn_weights.pth                 # Neural network weights (generated after train_dqn.py)
README.md

How to Run

1. Train the Agent

Option A: Train the Q-Learning Agent

python train.py

Option B: Train the DQN Agent

python train_dqn.py

2. Evaluate Trained Policies

Evaluate the agents on unseen random mazes to test how well they generalize:

python evaluate.py         # For Q-Learning
python evaluate_dqn.py     # For DQN

3. Watch the Agent Navigate

Render the environment dynamically and watch the trained Q-Learning agent play:

python test_trained_agent.py

4. Basic Environment Functionality Run

Step through the environment with a fixed sequence of manual actions and print the grid, as a quick check of the environment mechanics:

python test.py

Results

We track performance using:

Success rate
Collision rate
Average episode reward
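
These three metrics could be computed from per-episode records as sketched below. The record format (a dict with an "outcome" label and a total "reward") is an assumption and is not taken from evaluate.py or evaluate_dqn.py.

```python
def summarize(episodes):
    """Aggregate evaluation episodes into the three tracked metrics.

    Each episode is assumed to be a dict such as
    {"outcome": "goal" | "collision" | "timeout", "reward": float}.
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    return {
        "success_rate": sum(e["outcome"] == "goal" for e in episodes) / n,
        "collision_rate": sum(e["outcome"] == "collision" for e in episodes) / n,
        "avg_reward": sum(e["reward"] for e in episodes) / n,
    }
```

Success and collision rates need not sum to 1 in this sketch, because an episode can also end by running out of steps.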

Adjusting the movement penalties and slowing the obstacles to a fixed rate (one move every 2 agent steps) made training noticeably more stable and raised success rates for both the Q-learning and DQN agents.

Future Improvements

Double DQN
Prioritized experience replay
Dueling Deep Q-Networks
Multi-agent obstacle environments
Vision-based state representation
