Grid World!

We all remember Wumpus World from the first AI book we read!

Here is GridWorld.

gridworld.py — DQN navigation in a simple 10×10 gridworld.

This script trains a Deep Q-Network (DQN) agent to navigate from a start square ("S") to a goal ("G") while avoiding walls ("#"), traps ("T"), and a static adversary ("A").

Environment

Grid encoding (characters in `world`):

- " " : empty (step cost -1)
- "G" : goal (terminal, +100)
- "T" : trap (non-terminal penalty -25)
- "A" : adversary (terminal, -100)
- "#" : wall (impassable; actions that would enter are illegal)
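The encoding above can be illustrated with a small sketch that maps cell characters to rewards and lists the legal moves from a position. This is a minimal sketch, not the code in gridworld.py. The names `CELL_REWARD`, `TERMINAL`, `DELTAS` and `legal_moves` are hypothetical, and three behaviours are assumptions: the start square "S" scores like an empty cell, stepping off the grid is illegal in the same way as entering a wall, and WAIT stays in place.

```python
# Hypothetical reward table mirroring the grid encoding; values are from the README.
CELL_REWARD = {
    " ": -1.0,    # empty: per-step cost
    "S": -1.0,    # assumption: the start square behaves like an empty cell
    "G": 100.0,   # goal (terminal)
    "T": -25.0,   # trap (non-terminal penalty)
    "A": -100.0,  # adversary (terminal)
}
TERMINAL = {"G", "A"}  # cells that end the episode when entered

# Row/column offsets for each move; WAIT stays in place (assumed).
DELTAS = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1), "WAIT": (0, 0)}


def legal_moves(world, row, col):
    """Return the actions from (row, col) that stay on the grid and avoid walls."""
    legal = []
    for action, (dr, dc) in DELTAS.items():
        r, c = row + dr, col + dc
        on_grid = 0 <= r < len(world) and 0 <= c < len(world[0])
        # Assumption: stepping off the grid is illegal, like entering a wall.
        if on_grid and world[r][c] != "#":
            legal.append(action)
    return legal
```

For example, on the map `["S#", "  "]` the agent at the start square can only move south or wait.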

Episode termination:

- Reaching the goal.
- Colliding with the adversary.
- Exceeding `max_steps` (default: `width * height`), in which case the step returns reward 0 and the episode ends.
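The rewards and termination rules can be combined into a single step function. This is a hypothetical sketch: `step` and its arguments are illustrative names, the action is assumed to be legal already (masking happens elsewhere), and it assumes that the step which exceeds `max_steps` returns reward 0 without moving the agent.

```python
# Hypothetical single-step transition; rewards and termination follow the README.
REWARD = {" ": -1.0, "S": -1.0, "G": 100.0, "T": -25.0, "A": -100.0}  # "S" assumed empty
DELTAS = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1), "WAIT": (0, 0)}


def step(world, pos, action, steps, max_steps=None):
    """Apply an (already legal) action and return (new_pos, reward, done, steps)."""
    if max_steps is None:
        max_steps = len(world[0]) * len(world)  # default budget: width * height
    steps += 1
    if steps > max_steps:
        # Assumption: the over-budget step does not move the agent.
        return pos, 0.0, True, steps
    dr, dc = DELTAS[action]
    new_pos = (pos[0] + dr, pos[1] + dc)
    cell = world[new_pos[0]][new_pos[1]]
    done = cell in ("G", "A")  # goal or adversary ends the episode; traps do not
    return new_pos, REWARD[cell], done, steps
```

Note that a trap only costs -25 and the episode continues, whereas the goal and the adversary both set `done`.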

State Representation (current code)

The agent encodes state as a 43-dimensional float vector:

- Local 3×3 observation patch around the agent (9 cells): each cell becomes 4 binary bits (A, T, G, #), giving 9 × 4 = 36 dims.
- Position relative to the start (row_delta_norm, col_delta_norm): 2 dims, normalised by grid height and width so values lie in [-1, 1].
- One-hot encoding of the previous action over the current Action enum (N/E/S/W/WAIT): 5 dims.

Total: 36 + 2 + 5 = 43.
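The 43-dimensional layout can be reproduced with a short NumPy sketch. The Action enum order (N, E, S, W, WAIT), the order of the four bits per cell, treating off-grid cells as walls, and the function name `encode_state` are all assumptions; gridworld.py may differ in these details.

```python
import numpy as np

ACTIONS = ["N", "E", "S", "W", "WAIT"]  # assumed ordering of the Action enum
CHANNELS = ["A", "T", "G", "#"]          # one binary bit per cell type (assumed order)


def encode_state(world, pos, start, prev_action):
    """Build the 43-dim vector: 36 patch bits + 2 relative deltas + 5 one-hot."""
    height, width = len(world), len(world[0])
    r0, c0 = pos
    patch = np.zeros((3, 3, len(CHANNELS)), dtype=np.float32)
    for i, dr in enumerate((-1, 0, 1)):
        for j, dc in enumerate((-1, 0, 1)):
            r, c = r0 + dr, c0 + dc
            if 0 <= r < height and 0 <= c < width:
                ch = world[r][c]
            else:
                ch = "#"  # assumption: off-grid cells are encoded as walls
            if ch in CHANNELS:  # empty and start cells leave all 4 bits at 0
                patch[i, j, CHANNELS.index(ch)] = 1.0
    # Offset from the start, normalised by grid height and width.
    rel = np.array([(r0 - start[0]) / height, (c0 - start[1]) / width], dtype=np.float32)
    prev = np.zeros(len(ACTIONS), dtype=np.float32)
    if prev_action is not None:  # no previous action at the start of an episode
        prev[ACTIONS.index(prev_action)] = 1.0
    return np.concatenate([patch.ravel(), rel, prev])
```

Flattening the (3, 3, 4) patch keeps each cell's four bits contiguous, so bit `k` of patch cell `(i, j)` sits at index `(3 * i + j) * 4 + k`.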

Action Masking

Illegal moves (into walls) are handled in two places:

1) During action selection: only Q-values for currently available actions are considered.
2) During bootstrap target computation: next-state Q-values are masked before the max().
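Both masking steps come down to replacing illegal entries with negative infinity before taking an argmax or max. A minimal NumPy sketch, in which `masked_argmax`, `td_targets`, the boolean mask convention (True means legal) and `gamma = 0.99` are assumptions rather than the repository's own names and values:

```python
import numpy as np


def masked_argmax(q_values, legal_mask):
    """Greedy action restricted to legal actions (mask is True where legal)."""
    masked = np.where(legal_mask, q_values, -np.inf)
    return int(np.argmax(masked))


def td_targets(rewards, dones, next_q, next_legal_mask, gamma=0.99):
    """Bootstrap targets r + gamma * max over legal next actions (no bootstrap if done)."""
    masked_next = np.where(next_legal_mask, next_q, -np.inf)
    best_next = masked_next.max(axis=1)
    # Terminal transitions do not bootstrap; rows with no legal action would give -inf.
    best_next = np.where(dones | ~np.isfinite(best_next), 0.0, best_next)
    return rewards + gamma * best_next
```

The guard against rows with no legal action is defensive and only matters if a state can have every action masked.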

Training Setup

- Online network: MLP(43 → 64 → 64 → |Action|)
- Target network: periodically synced with the online network (every 100 environment steps)
- Replay buffer: deque of transitions with uniform random mini-batch sampling
- Loss: mean-squared TD error (DQN / Bellman regression)
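The components listed above can be sketched with NumPy as follows. This is an illustrative sketch, not the repository's implementation: the ReLU activations, the weight initialisation, the buffer capacity, `gamma = 0.99` and all function and class names are assumptions. The gradient update that actually minimises the loss, and the action masking in the target, are omitted to keep it short.

```python
import copy
import random
from collections import deque

import numpy as np

N_ACTIONS = 5      # N/E/S/W/WAIT
STATE_DIM = 43
SYNC_EVERY = 100   # environment steps between target-network syncs


def init_mlp(sizes, rng):
    """Weights for MLP(43 -> 64 -> 64 -> |Action|) as a list of (W, b) pairs."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """ReLU hidden layers (assumed), linear output: one Q-value per action."""
    for W, b in params[:-1]:
        x = np.maximum(0.0, x @ W + b)
    W, b = params[-1]
    return x @ W + b


class ReplayBuffer:
    """Bounded deque of transitions with uniform random mini-batch sampling."""

    def __init__(self, capacity=10_000):  # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)  # uniform, without replacement
        s, a, r, s2, d = map(np.array, zip(*batch))
        return s, a, r, s2, d

    def __len__(self):
        return len(self.buffer)


def td_loss(online, target, batch, gamma=0.99):
    """Mean-squared TD error, bootstrapping from the frozen target network."""
    s, a, r, s2, d = batch
    q_sa = forward(online, s)[np.arange(len(a)), a]
    y = r + gamma * (1.0 - d.astype(np.float64)) * forward(target, s2).max(axis=1)
    return float(np.mean((q_sa - y) ** 2))


def maybe_sync(step, online, target):
    """Copy the online weights into the target network every SYNC_EVERY steps."""
    if step % SYNC_EVERY == 0:
        return copy.deepcopy(online)
    return target
```

Calling `maybe_sync` once per environment step with a global step counter gives the step-based schedule described under Notes, independent of episode boundaries.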

Notes / Known Simplifications

- The target update schedule is step-based rather than episode-based.
- Traps are non-terminal (-25) in this implementation.

Run

Running this file (`python gridworld.py`) trains for 5000 episodes and saves:

- rewards.png
- loss.png

To do

- Make the adversary non-static, possibly with its own DQN policy network.
- Test generalisation on new maps.
