vnayakde/rl_dynamic_maze

Reinforcement Learning for Navigation in Dynamic Maze Environments

Overview

This project implements a reinforcement learning (RL) navigation system where an agent learns to reach a goal in randomly generated maze environments with moving obstacles. The system starts with a tabular Q-learning baseline and is later upgraded to a Deep Q-Network (DQN) with stabilization techniques such as experience replay and a target network.

The objective is to study how an RL agent can learn goal-directed navigation in stochastic and dynamic environments, a core problem in robotics and autonomous systems. To stabilize learning and improve navigation performance, the environment now uses a lower obstacle density and slower obstacle movement, and the agents use a longer exploration schedule.


Key Features

  • Random maze generation every episode
  • Lower environment difficulty: the number of obstacles is size // 2
  • Slower dynamic obstacles that move once every 2 agent steps to stabilize learning
  • Extended exploration schedule (epsilon_decay=0.999, epsilon_min=0.02)
  • Compact state representation combining the goal offset with local obstacle sensors
  • Reward shaping for efficient learning
  • Tabular Q-learning baseline
  • Deep Q-Network (DQN) implementation variations
  • Experience replay buffer and target network stabilization (dqn_agent_v2.py)
  • Evaluation and visualization scripts for multiple models
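
The exploration parameters listed above can be read as a multiplicative epsilon decay with a floor. The sketch below is a minimal illustration, assuming epsilon starts at 1.0 and decays once per episode; neither detail is stated in this README, and the function names are hypothetical.

```python
import random

EPSILON_DECAY = 0.999  # value from the feature list
EPSILON_MIN = 0.02     # value from the feature list


def decay_epsilon(epsilon: float) -> float:
    """Apply one multiplicative decay step, never dropping below the floor."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def episodes_until_floor(start: float = 1.0) -> int:
    """Count decay steps until epsilon reaches the floor (start is assumed)."""
    epsilon, steps = start, 0
    while epsilon > EPSILON_MIN:
        epsilon = decay_epsilon(epsilon)
        steps += 1
    return steps
```

Under these assumptions epsilon needs roughly 3,900 decay steps to fall from 1.0 to 0.02, so the agent keeps exploring for a long stretch of training.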

Problem Statement

The agent must learn to navigate a grid-based maze to reach a goal while avoiding obstacles.

Challenges in the environment include:

  • Random maze layouts
  • Moving obstacles
  • Partial observation
  • Stochastic dynamics

These conditions make the task significantly more difficult than classical gridworld environments.


Environment Design

Grid World

The environment is a 2D grid maze.

Example layout:

A . . X .
. . X . .
. . . . .
. X . . .
. . . . G

Legend:

A = Agent
G = Goal
X = Obstacle
. = Free cell

Random Maze Generation

Each episode generates a new maze configuration with random obstacle placement.

state = env.reset(seed=episode)

This forces the agent to learn general navigation strategies instead of memorizing paths.
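
As an illustration of how a seeded reset could produce a fresh layout per episode, here is a minimal sketch. It is not taken from maze_env.py: the function name `generate_maze`, the fixed corner positions for agent and goal (borrowed from the example layout), and the uniform sampling of obstacle cells are all assumptions. Only the obstacle count, size // 2, comes from this README.

```python
import random


def generate_maze(size: int, seed: int):
    """Return (agent, goal, obstacles) for a size x size grid.

    Assumptions for illustration: the agent starts top-left, the goal sits
    bottom-right, and size // 2 obstacles are sampled from the other cells.
    """
    rng = random.Random(seed)  # local RNG so each seed reproduces its maze
    agent, goal = (0, 0), (size - 1, size - 1)
    free_cells = [
        (x, y)
        for x in range(size)
        for y in range(size)
        if (x, y) not in (agent, goal)
    ]
    obstacles = set(rng.sample(free_cells, size // 2))
    return agent, goal, obstacles
```

Using the episode index as the seed gives a different maze each episode while keeping any single episode reproducible. This sketch does not check that a path to the goal exists.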

Dynamic Obstacles

Obstacles move randomly during the episode. To keep the task learnable:

  • The number of obstacles is self.size // 2.
  • Obstacles move once every 2 agent steps (instead of every step), which gives the agent more time to react and makes the environment less chaotic.

Each step follows this order:

Agent moves → (every 2 steps: obstacles move) → Collision check
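
The step order above can be sketched as follows. This is a hedged illustration rather than the code in maze_env.py: it assumes obstacles take a random one-cell step, stay put when the target cell is out of bounds, occupied, or the goal, and that the agent is clipped to the grid. All function names are hypothetical.

```python
import random

# Action index -> (dx, dy), with y increasing downward: up, down, left, right.
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]
OBSTACLE_PERIOD = 2  # obstacles move once every 2 agent steps


def move_obstacles(obstacles, size, goal, rng):
    """Shift each obstacle by one random step when the target cell is valid."""
    occupied = set(obstacles)
    for (x, y) in sorted(obstacles):
        dx, dy = rng.choice(MOVES)
        target = (x + dx, y + dy)
        in_bounds = 0 <= target[0] < size and 0 <= target[1] < size
        if in_bounds and target != goal and target not in occupied:
            occupied.remove((x, y))
            occupied.add(target)
    return occupied


def step(agent, obstacles, action, step_count, size, goal, rng):
    """Agent moves, obstacles move on every 2nd step, then check collision."""
    dx, dy = MOVES[action]
    x = min(max(agent[0] + dx, 0), size - 1)  # clip to the grid (assumed)
    y = min(max(agent[1] + dy, 0), size - 1)
    agent = (x, y)
    step_count += 1
    if step_count % OBSTACLE_PERIOD == 0:
        obstacles = move_obstacles(obstacles, size, goal, rng)
    collided = agent in obstacles
    return agent, obstacles, step_count, collided
```

Because the collision check runs after the obstacles move, an obstacle stepping onto the agent's cell counts as a collision in this sketch.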

State Representation

The agent observes a compact state representation consisting of:

(dx, dy, obstacle_up, obstacle_down, obstacle_left, obstacle_right)

Where:

dx = goal_x - agent_x
dy = goal_y - agent_y

Obstacle sensors:

obstacle_up
obstacle_down
obstacle_left
obstacle_right

Example state:

(3, -1, 0, 1, 0, 0)

Meaning:

  • Goal is 3 cells right, 1 cell up
  • Obstacle detected below
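
A minimal sketch of how this state tuple could be computed. It assumes (x, y) coordinates with y increasing downward (consistent with dy = -1 meaning "up" in the example) and that only obstacles, not grid walls, trigger the sensors; the function name `get_state` is hypothetical.

```python
def get_state(agent, goal, obstacles):
    """Build (dx, dy, obstacle_up, obstacle_down, obstacle_left, obstacle_right).

    Assumes y grows downward, so the cell "up" from (x, y) is (x, y - 1).
    """
    ax, ay = agent
    dx, dy = goal[0] - ax, goal[1] - ay
    up = int((ax, ay - 1) in obstacles)
    down = int((ax, ay + 1) in obstacles)
    left = int((ax - 1, ay) in obstacles)
    right = int((ax + 1, ay) in obstacles)
    return (dx, dy, up, down, left, right)


# Reproduces the example state: goal 3 right and 1 up, obstacle directly below.
print(get_state(agent=(1, 2), goal=(4, 1), obstacles={(1, 3)}))
# -> (3, -1, 0, 1, 0, 0)
```

Because the state stores the offset to the goal rather than absolute positions, the same tuple can occur in many different mazes, which is what lets a learned policy transfer between layouts.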

Action Space

The agent can perform four discrete actions:

0 → move up
1 → move down
2 → move left
3 → move right

Reward Function

The reward strategy combines sparse rewards with reward shaping.

Event                      Reward
Move closer to goal        +1
Move away from goal        -1
Step penalty               -1
Collision with obstacle    -100
Reach goal                 +100
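
One possible reading of the reward table as code. The README does not say how the entries combine, so this sketch assumes that distance is Manhattan distance, that the step penalty is added to the shaping term on every non-terminal step, and that collision and goal rewards replace the shaping term and end the episode. The function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def compute_reward(prev_pos, new_pos, goal, collided):
    """Return (reward, done) for one transition using the table's values."""
    if collided:
        return -100, True   # collision with obstacle (assumed terminal)
    if new_pos == goal:
        return 100, True    # reached the goal
    reward = -1             # step penalty
    before, after = manhattan(prev_pos, goal), manhattan(new_pos, goal)
    if after < before:
        reward += 1         # moved closer to the goal
    elif after > before:
        reward -= 1         # moved away from the goal
    return reward, False
```

Under these assumptions a step toward the goal nets 0, a step away nets -2, and a move that leaves the distance unchanged (for example, bumping into a wall) nets -1.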

Algorithms Implemented

1. Tabular Q-Learning (agents/q_learning.py)

Baseline algorithm using a dictionary-based Q-table.

Update rule:

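Q(s,a) ← Q(s,a) + α [ r + γ max_a' Q(s',a') − Q(s,a) ]

Here α is the learning rate, γ is the discount factor, and the max is taken over all actions a' available in the next state s'.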
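
The update rule can be sketched with a dictionary-based Q-table as below. This is an illustrative sketch, not the contents of agents/q_learning.py: the helper names and the default values alpha=0.1 and gamma=0.99 are assumptions.

```python
from collections import defaultdict

N_ACTIONS = 4  # up, down, left, right


def make_q_table():
    """Dictionary-based Q-table: unseen states default to all-zero values."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(q, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place; return the TD error."""
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error
```

State tuples such as (3, -1, 0, 1, 0, 0) are hashable, so they can be used directly as dictionary keys. A defaultdict built with a lambda cannot be pickled as-is, so converting it with dict(q) before saving is one option.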

2. Basic Deep Q-Network (agents/dqn_agent.py)

Replaces the Q-table with a neural network. The state vector is fed into an MLP with two 64-unit hidden layers that outputs one Q-value per action. The TD target is computed with the same online network, without a replay buffer or a target network.

3. Advanced DQN V2 (agents/dqn_agent_v2.py)

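To address the training instability of the basic DQN, this version adds:

  • Experience Replay Buffer: stores past transitions and samples them at random for training, which breaks the temporal correlation between consecutive updates.
  • Target Network: a periodically updated copy of the Q-network that provides stable targets for the TD update.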
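
The two techniques can be sketched without any deep learning framework, as below. This is not the code in agents/dqn_agent_v2.py: the class and function names, the buffer capacity, gamma, and the sync interval are all assumptions, and `target_q` is a plain callable standing in for the target network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int, rng=random):
        # Uniform sampling breaks the temporal order of consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma: float = 0.99):
    """Compute r + gamma * max_a' Q_target(s', a') for each transition.

    target_q maps a state to a list of action values; in a DQN it would be
    the frozen target network rather than the online network.
    """
    targets = []
    for _, _, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def should_sync(step: int, sync_every: int = 500) -> bool:
    """True on steps where online weights would be copied to the target."""
    return step > 0 and step % sync_every == 0
```

Keeping the target values fixed between syncs means the regression target does not shift with every gradient step, which is the stabilizing effect described above.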

Project Structure

rl_dynamic_maze/

env/
    maze_env.py                 # Core environment simulation

agents/
    q_learning.py               # Tabular Q-Learning baseline
    dqn_agent.py                # Vanilla DQN agent
    dqn_agent_v2.py             # Advanced DQN with Experience Replay & Target Network

# Training Scripts
train.py                        # Trains Q-Learning Agent
train_dqn.py                    # Trains DQN Agent
train_dqn.ipynb                 # Jupyter Notebook for interactive DQN training

# Evaluation & Testing Scripts
evaluate.py                     # Evaluates Q-Learning on generalized scenarios
evaluate_dqn.py                 # Evaluates DQN on generalized scenarios
test_trained_agent.py           # Watch the trained Q-Learning agent navigate (render)
test.py                         # Environment interaction mechanics test script

# Artifacts
q_table.pkl                     # Saved Q-table (generated after train.py)
dqn_weights.pth                 # Neural network weights (generated after train_dqn.py)
README.md

How to Run

1. Train the Agent

Option A: Train the Q-Learning Agent

python train.py

Option B: Train the DQN Agent

python train_dqn.py

2. Evaluate Trained Policies

Evaluate the agents on unseen random mazes to test how well the learned policies generalize:

python evaluate.py         # For Q-Learning
python evaluate_dqn.py     # For DQN

3. Watch the Agent Navigate

Render the environment and watch the trained Q-Learning agent navigate:

python test_trained_agent.py

4. Run a Basic Environment Check

Step the environment with a fixed sequence of manual actions and print the grid:

python test.py

Results

We track performance using:

  • Success rate
  • Collision rate
  • Average episode reward
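
A minimal sketch of how these three metrics could be aggregated from evaluation episodes. The record format (a dict with "outcome" and "reward" keys) and the "timeout" outcome are assumptions for illustration, not the format used by evaluate.py or evaluate_dqn.py.

```python
def summarize(episodes):
    """Aggregate per-episode results into the three tracked metrics.

    Each episode is a dict with hypothetical keys: "outcome" (one of "goal",
    "collision", "timeout") and "reward" (total episode reward).
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    return {
        "success_rate": sum(e["outcome"] == "goal" for e in episodes) / n,
        "collision_rate": sum(e["outcome"] == "collision" for e in episodes) / n,
        "avg_reward": sum(e["reward"] for e in episodes) / n,
    }
```

Success and collision rates need not sum to 1 if episodes can also end by hitting a step limit.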

Recent changes to the movement penalties and the obstacle speed have made the training curves more stable and raised success rates for both the Q-learning and DQN agents.


Future Improvements

  • Double DQN
  • Prioritized experience replay
  • Dueling Deep Q-Networks
  • Multi-agent obstacle environments
  • Vision-based state representation
