This project implements a Reinforcement Learning agent that learns to navigate a 6×6 grid. The agent must find the optimal path from a starting position to a goal while avoiding static obstacles.
The world is a coordinate-based grid where:
- Grid Size: 6×6
- Start Position (S): (5, 0) (Bottom-Left)
- Goal Position (G): (0, 5) (Top-Right)
- Obstacles (X): Static blocks located at specific coordinates that penalize the agent.
The agent uses the Q-Learning algorithm to populate a 3D Q-Table of shape (6, 6, 4), which stores one value for each of the 4 possible actions (Up, Right, Down, Left) in every grid cell.
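As a minimal sketch of that table layout, the snippet below builds a 6×6×4 structure of zeros. It uses plain nested lists to stay dependency-free; the project itself may use a different container. The names `make_q_table` and `ACTION_DELTAS` are illustrative, and the action index order (Up = 0, Right = 1, Down = 2, Left = 3) and the (row, col) convention with row 0 at the top are assumptions inferred from the start and goal coordinates.

```python
# Grid dimensions and action names taken from the description above.
GRID_SIZE = 6
ACTIONS = ["Up", "Right", "Down", "Left"]

# Assumed (row, col) offsets: row 0 is the top row, so "Up" decreases the row.
ACTION_DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def make_q_table(size=GRID_SIZE, n_actions=len(ACTIONS)):
    """Return a size x size x n_actions table of zeros as nested lists."""
    # Each cell gets its own list so that updating one cell never affects another.
    return [[[0.0] * n_actions for _ in range(size)] for _ in range(size)]


q_table = make_q_table()
```

With this layout, `q_table[row][col][action]` holds the current estimate for taking `action` in cell `(row, col)`.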
- Alpha (α): 0.3 (Learning Rate)
- Gamma (γ): 0.95 (Discount Factor)
- Epsilon (ε): Starts at 0.9 and decays over time to balance exploration and exploitation.
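These values plug into the standard Q-Learning update, Q(s, a) ← Q(s, a) + α · [r + γ · max Q(s', a') − Q(s, a)]. The sketch below shows one way to write it. The function names, the multiplicative decay factor, and the epsilon floor are assumptions for illustration; the README only states that epsilon starts at 0.9 and decays.

```python
ALPHA = 0.3            # learning rate (from the list above)
GAMMA = 0.95           # discount factor (from the list above)
EPSILON_START = 0.9    # initial exploration rate (from the list above)
EPSILON_MIN = 0.05     # assumed floor, not stated in the README
EPSILON_DECAY = 0.9995 # assumed per-episode multiplicative decay


def q_update(q_table, state, action, reward, next_state, done):
    """Apply one Q-Learning update in place and return the new Q-value."""
    row, col = state
    next_row, next_col = next_state
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q_table[next_row][next_col])
    td_target = reward + GAMMA * best_next
    q_table[row][col][action] += ALPHA * (td_target - q_table[row][col][action])
    return q_table[row][col][action]


def decay_epsilon(epsilon):
    """Shrink epsilon geometrically, never going below the assumed floor."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```

For example, starting from an all-zero table, a single update for the move into the goal with reward +100 yields 0.3 × 100 = 30.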
- Goal Reach: +100
- Obstacle/Boundary Hit: -10
- Each Step: -0.5 (Encourages the shortest path)
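The reward scheme above could be expressed as a small function like the one below. This is a sketch under assumptions: the obstacle coordinates are hypothetical placeholders (the README does not list them), `get_reward` is an illustrative name, and the three rewards are treated as mutually exclusive.

```python
GRID_SIZE = 6
GOAL = (0, 5)
# Hypothetical obstacle positions, used only for illustration.
OBSTACLES = {(1, 1), (2, 3), (4, 2)}

GOAL_REWARD = 100.0
PENALTY = -10.0      # obstacle or boundary hit
STEP_REWARD = -0.5   # cost of an ordinary move


def get_reward(next_state):
    """Return the reward for attempting to move into next_state."""
    row, col = next_state
    out_of_bounds = not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE)
    if out_of_bounds or next_state in OBSTACLES:
        return PENALTY
    if next_state == GOAL:
        return GOAL_REWARD
    return STEP_REWARD
```

Under these assumptions, and if no obstacle forces a detour, the shortest route from (5, 0) to (0, 5) takes 10 moves: nine ordinary steps and one goal step, for an undiscounted return of 9 × (-0.5) + 100 = 95.5.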
- Initialize the Q-Table with zeros.
- Run the `train_agent()` function for 20,000 episodes.
- Use `visualize_best_grid(q_table)` to view the learned policy in the console.
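The steps above can be sketched as a compact training loop. This is not the project's actual `train_agent()`; it is a self-contained approximation that assumes hypothetical obstacle positions, a per-episode step cap, a multiplicative epsilon decay with a floor, and that a blocked move leaves the agent in place. The `episodes`, `max_steps`, and `seed` parameters are illustrative.

```python
import random

GRID_SIZE = 6
START, GOAL = (5, 0), (0, 5)
OBSTACLES = {(1, 1), (2, 3), (4, 2)}           # hypothetical positions
DELTAS = [(-1, 0), (0, 1), (1, 0), (0, -1)]    # assumed order: Up, Right, Down, Left
ALPHA, GAMMA = 0.3, 0.95


def step(state, action):
    """Apply an action; blocked moves keep the agent in place (assumption)."""
    row = state[0] + DELTAS[action][0]
    col = state[1] + DELTAS[action][1]
    inside = 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE
    if not inside or (row, col) in OBSTACLES:
        return state, -10.0, False
    if (row, col) == GOAL:
        return (row, col), 100.0, True
    return (row, col), -0.5, False


def train_agent(episodes=20_000, max_steps=200, seed=0):
    """Run tabular Q-Learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[[0.0] * 4 for _ in range(GRID_SIZE)] for _ in range(GRID_SIZE)]
    epsilon = 0.9
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            cell = q[state[0]][state[1]]
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = cell.index(max(cell))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state[0]][next_state[1]])
            cell[action] += ALPHA * (reward + GAMMA * best_next - cell[action])
            state = next_state
            if done:
                break
        # Assumed decay schedule: geometric with a floor of 0.05.
        epsilon = max(0.05, epsilon * 0.9995)
    return q
```

Passing a fixed `seed` keeps runs reproducible, which makes it easier to compare the effect of changing a single hyperparameter.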
Below is the visual representation of the agent's optimal policy after training. Arrows indicate the action with the highest Q-value for each state.
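An arrow grid of this kind could be printed in the console along the following lines. This is a minimal sketch, not the project's `visualize_best_grid`: `render_policy` is an illustrative name, ASCII characters stand in for arrows, the obstacle positions are hypothetical, and the action index order is assumed to be Up, Right, Down, Left.

```python
ARROWS = ["^", ">", "v", "<"]          # assumed index order: Up, Right, Down, Left
GOAL = (0, 5)
OBSTACLES = {(1, 1), (2, 3), (4, 2)}   # hypothetical positions


def render_policy(q_table):
    """Return one text line per grid row showing the greedy action per cell."""
    lines = []
    for row, cells in enumerate(q_table):
        symbols = []
        for col, values in enumerate(cells):
            if (row, col) == GOAL:
                symbols.append("G")
            elif (row, col) in OBSTACLES:
                symbols.append("X")
            else:
                # Pick the action with the highest Q-value (first one on ties).
                symbols.append(ARROWS[values.index(max(values))])
        lines.append(" ".join(symbols))
    return lines
```

Calling `print("\n".join(render_policy(q_table)))` on a trained table prints the grid row by row, with row 0 at the top.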
- Epsilon-Greedy Policy: Ensures the agent explores the grid thoroughly before settling on a path.
- Boundary Protection: The `is_valid_state` function prevents the agent from leaving the 6×6 area.
- Detailed Visualization: Formatted console output using `:7.2f` for aligned and readable Q-values.
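To make the last two features concrete, here is a hedged sketch of a bounds check and of the `:7.2f` format specifier. The signature of `is_valid_state` is an assumption (the README names the function but not its parameters), and `format_q_values` is a hypothetical helper.

```python
GRID_SIZE = 6


def is_valid_state(row, col, size=GRID_SIZE):
    """Return True when (row, col) lies inside the size x size grid."""
    return 0 <= row < size and 0 <= col < size


def format_q_values(values):
    """Format Q-values in 7-character columns with 2 decimal places."""
    return " ".join(f"{v:7.2f}" for v in values)
```

The fixed width of 7 leaves room for a sign, up to three integer digits, the decimal point, and two decimals, so values such as `-10.00` and `100.00` line up in columns.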
