Itay Shaul, Lior Lotan, Ben Kapon, Ori Cohen
- Introduction
- The Game
- Genetic Algorithm (GA)
- Code Overview
- Important Functions
- How to Run
- Framing the Problem
- Experiments
- Conclusion
This project trains AI agents to play the classic Snake game using Genetic Algorithms (GA) combined with Neural Networks. The goal is to evolve neural network weights through a process inspired by natural selection, where the fittest agents (those that eat the most apples and survive longest) pass their "genes" (network weights) to the next generation.
The Snake game presents an interesting challenge for AI: the agent must learn to navigate toward food while avoiding collisions with walls, its own body, and (in advanced modes) obstacles. Unlike supervised learning where we provide correct answers, the genetic algorithm discovers effective strategies through evolutionary pressure alone.
Our project explores four key dimensions that affect learning performance:
- Population size and generation count, and their impact on convergence speed
- Environmental complexity (obstacle modes) and transfer learning
- Fitness function design and agent behaviors
- Training diversity and generalization
The Snake game is played on a grid where the player controls a snake that moves in one of four directions (UP, RIGHT, DOWN, LEFT). The objective is to eat apples that appear randomly on the grid. Each apple eaten causes the snake to grow longer. The game ends when the snake collides with:
- The wall (boundary of the grid)
- Its own body
- An obstacle (in twist mode)
We implement two primary modes:
| Mode | Description |
|---|---|
| Baseline | Classic Snake without obstacles - the control condition for experiments |
| Twist | Snake with obstacles that add environmental complexity |
The twist mode supports three obstacle policies: Static, Rotating, and Aggregating (detailed in the Code Overview and experiment sections below).
The following GIFs demonstrate the evolution of agent performance across training generations:
Generation 1 - Initial Random Behavior (0 apples):
The untrained agent moves randomly with no learned behavior, quickly colliding with a wall after just 12 steps.
Generation 50 - Intermediate Learning (10 apples):
After 50 generations of evolution, the agent has learned basic food-seeking behavior. However, it gets stuck in loops, circling without making progress toward food.
Generation 250 - Final Trained Agent (141 apples - Victory!):
The fully trained agent demonstrates sophisticated navigation, efficiently collecting apples while avoiding collisions. In this run, the agent achieved victory by eating all apples and filling the entire board!
The Genetic Algorithm is a method for solving optimization problems based on natural selection, the process that drives biological evolution. The core idea is that "the strong survive" - by maintaining a population of candidate solutions and iteratively selecting the best performers, we can evolve increasingly effective solutions over generations.
```mermaid
flowchart TD
    A[Initialize Random Population] --> B[Evaluate Fitness]
    B --> C[Select Elite Individuals]
    C --> D[Tournament Selection]
    D --> E[Crossover - Combine Parents]
    E --> F[Mutation - Add Variation]
    F --> G{Max Generations?}
    G -->|No| B
    G -->|Yes| H[Return Best Genome]
```
- Genome Representation: Each agent's neural network weights are encoded as a flat array (genome). The network has:
  - Input layer: 11 features (danger sensors, heading, food direction)
  - Hidden layer: Configurable size (default 16 neurons)
  - Output layer: 3 actions (turn left, go straight, turn right)
- Fitness Evaluation: Each genome is tested over multiple episodes. Fitness is computed based on:
  - Apples eaten (primary objective)
  - Steps survived (secondary objective)
  - Custom weighting allows different agent behaviors
- Selection: Tournament selection chooses parents - k random individuals compete, and the fittest wins the right to reproduce.
- Crossover: Uniform crossover combines two parent genomes - each gene is randomly taken from either parent.
- Mutation: Gaussian mutation adds random noise to genes with a configurable probability and magnitude.
- Elitism: The top-performing individuals are preserved unchanged to prevent losing good solutions.
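A minimal NumPy sketch of the loop and operators described above is shown below. It is an illustration, not the project's implementation: the fitness function is a placeholder, and the hyperparameter constants are assumed values standing in for the YAML configuration.

```python
import numpy as np

# Assumed hyperparameters; the real values come from the YAML config (ga section).
POP_SIZE, GENERATIONS, ELITE_COUNT = 50, 200, 2
TOURNAMENT_K, MUT_RATE, MUT_SIGMA = 5, 0.1, 0.2
GENOME_LEN = 11 * 16 + 16 + 16 * 3 + 3   # 243 weights/biases for the 11-16-3 network

rng = np.random.default_rng(123)

def evaluate(genome: np.ndarray) -> float:
    """Placeholder fitness; the project instead plays Snake episodes and scores them."""
    return -float(np.sum(genome ** 2))

def tournament_select(pop: list, fitness: np.ndarray) -> np.ndarray:
    """k random individuals compete; the fittest becomes a parent."""
    idx = rng.choice(len(pop), size=TOURNAMENT_K, replace=False)
    return pop[idx[np.argmax(fitness[idx])]]

def uniform_crossover(p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Each gene is copied from either parent with equal probability."""
    mask = rng.random(p1.shape) < 0.5
    return np.where(mask, p1, p2)

def gaussian_mutate(genome: np.ndarray) -> np.ndarray:
    """Add Gaussian noise to a random subset of genes."""
    hit = rng.random(genome.shape) < MUT_RATE
    return genome + hit * rng.normal(0.0, MUT_SIGMA, genome.shape)

population = [rng.normal(scale=0.5, size=GENOME_LEN) for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    fitness = np.array([evaluate(g) for g in population])
    order = np.argsort(fitness)[::-1]                             # best first
    elites = [population[i].copy() for i in order[:ELITE_COUNT]]  # elitism
    children = []
    while len(children) < POP_SIZE - ELITE_COUNT:
        child = uniform_crossover(tournament_select(population, fitness),
                                  tournament_select(population, fitness))
        children.append(gaussian_mutate(child))
    population = elites + children

best_genome = max(population, key=evaluate)
```

In the project itself, the operators live in `ga/operators.py` and the loop is orchestrated by the `GeneticAlgorithm` class in `ga/ga.py`.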
```
SnakeAgentPlayer/
├── env/ # Snake game environment
│ ├── snake_env.py # Core Snake environment with O(1) collision detection
│ ├── obstacles.py # Obstacle spawning policies (Static, Rotating, Aggregating)
│ └── rewards.py # Reward shaping functions
│
├── agent/ # Neural network agent
│ ├── encoder.py # State encoder - converts game state to feature vector (11-dim)
│ ├── nn_policy.py # Neural network policy backed by genome weights
│ └── policy.py # Policy interface and baseline policies (Random, Greedy)
│
├── ga/ # Genetic algorithm
│ ├── config.py # GA hyperparameters configuration
│ ├── evaluator.py # Fitness evaluators (Collector, Survivor, Hungry strategies)
│ ├── operators.py # Genetic operators (selection, crossover, mutation)
│ └── ga.py # Main GA class orchestrating the evolution loop
│
├── rendering/ # Visualization
│ └── visual_renderer.py # Pygame-based renderer with modern graphics
│
├── scripts/ # Training and testing utilities
│ ├── train.py # Training entry point
│ ├── play.py # Agent playback tool
│ ├── automated_training.py # Full training pipeline with checkpointing
│ ├── test_genome.py # Genome evaluation and visual playback
│
├── experiments/ # Experiment utilities
│ ├── config_loader.py # YAML configuration loading
│ ├── logging.py # Training progress logging
│ └── plotting.py # Results visualization
│
├── images/ # Experiment result images and GIFs
├── setup.py # Package installation configuration
├── Makefile # Command shortcuts
└── requirements.txt # Python dependencies
```
Genetic Algorithm Functions:
- `GeneticAlgorithm.run()`: Execute the evolutionary loop and return the best genome found.
- `tournament_select()`: Select a parent genome via tournament selection.
- `uniform_crossover()`: Create offspring genomes by mixing two parents (uniform crossover).
- `gaussian_mutate()`: Mutate a genome by adding Gaussian noise to individual genes.
Evaluation and Fitness Functions:
- `SnakeEvaluator.evaluate_genome()`: Run a genome for multiple episodes and compute its fitness plus performance metrics.
- `BalancedFitnessEvaluator`: Fitness variant that emphasizes survival (used for the "Survivor" strategy).
- `HungryFitnessEvaluator`: Fitness variant that emphasizes step efficiency (used for the "Hungry" strategy).
- `TwistModeEvaluator`: Twist-mode evaluator that averages performance across more episodes for obstacle environments.
Environment Functions:
- `SnakeEnv.reset()`: Reset the environment to a new episode (optionally with a deterministic seed).
- `SnakeEnv.step()`: Advance the game by one action and return `(state, reward, done, info)`.
- `AggregatingObstaclePolicy.on_food_eaten()`: Add new obstacles after each apple (obstacles accumulate).
- `RotatingObstaclePolicy.on_food_eaten()`: Add new obstacles while keeping a maximum number (oldest rotates out).
- `StaticObstaclePolicy.on_reset()`: Spawn a fixed set of obstacles at the start of an episode.
- `DefaultRewardFn.reward()`: Default reward shaping (+apple reward, optional step reward, death penalty).
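To show how the environment is consumed, below is a hedged sketch of one evaluation episode using the `reset()`/`step()` interface above together with the agent's `encode()`/`act()` methods described in the next group. The `seed` keyword and the contents of `info` are assumptions; see `env/snake_env.py` for the real signatures.

```python
# Minimal episode loop; a sketch of the intended usage, not the project's exact API.
def run_episode(env, policy, encoder, seed=None):
    state = env.reset(seed=seed)              # new episode, optionally deterministic
    total_reward, steps, done, info = 0.0, 0, False, {}
    while not done:
        features = encoder.encode(state)      # game state -> 11-dim feature vector
        action = policy.act(features)         # turn left / go straight / turn right
        state, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward, info          # info may include apples eaten and death reason
```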
Agent and Policy Functions:
- `Encoder.encode()`: Convert an `EnvState` into an 11-dimensional feature vector for the neural network.
- `NeuralNetPolicy.act()`: Choose an action (left/straight/right) from encoded features using a feedforward network.
- `genome_size()`: Compute the parameter count required for a given `NetworkArch`.
- `random_genome()`: Initialize a random genome with scaled (Xavier-like) weights.
- `unpack_genome()`: Unpack a flat genome into `(W1, b1, W2, b2)` tensors.
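For reference, here is a small sketch of how a flat genome could map onto the 11→16→3 network described earlier and produce an action. It illustrates the idea behind `genome_size()`, `unpack_genome()`, and `NeuralNetPolicy.act()`, not their actual code; the tanh/argmax choices and the weight layout are assumptions.

```python
import numpy as np

IN_DIM, HIDDEN_DIM, OUT_DIM = 11, 16, 3   # 11 features -> 16 hidden -> 3 actions

def genome_length() -> int:
    # Total parameters: W1 + b1 + W2 + b2
    return IN_DIM * HIDDEN_DIM + HIDDEN_DIM + HIDDEN_DIM * OUT_DIM + OUT_DIM

def unpack(genome: np.ndarray):
    """Slice a flat genome into (W1, b1, W2, b2)."""
    i = 0
    W1 = genome[i:i + IN_DIM * HIDDEN_DIM].reshape(IN_DIM, HIDDEN_DIM); i += IN_DIM * HIDDEN_DIM
    b1 = genome[i:i + HIDDEN_DIM]; i += HIDDEN_DIM
    W2 = genome[i:i + HIDDEN_DIM * OUT_DIM].reshape(HIDDEN_DIM, OUT_DIM); i += HIDDEN_DIM * OUT_DIM
    b2 = genome[i:i + OUT_DIM]
    return W1, b1, W2, b2

def act(genome: np.ndarray, features: np.ndarray) -> int:
    """Feedforward pass; returns 0 (left), 1 (straight), or 2 (right)."""
    W1, b1, W2, b2 = unpack(genome)
    hidden = np.tanh(features @ W1 + b1)
    return int(np.argmax(hidden @ W2 + b2))

genome = np.random.default_rng(0).normal(size=genome_length())
print(act(genome, np.zeros(IN_DIM)))   # action for an all-zero feature vector
```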
Rendering and Visualization Functions:
- `SnakeRenderer.render()`: Render the current game state with pygame.
- `test_genome()`: Load a genome + config and run evaluation episodes (optionally with visualization).
Training and CLI Entry Points:
- `AutomatedTrainingRunner.run()`: End-to-end training pipeline with checkpointing, metrics, and best-genome saving.
- `scripts/train.py:main()`: CLI wrapper that runs `AutomatedTrainingRunner` for a given YAML config.
- `scripts/play.py:main()`: CLI wrapper that loads a saved genome and calls `test_genome()` for playback.
Data Files:
- `configs/*.yaml`: Training configurations (environment + GA hyperparameters).
- `runs/<run_name>/*.npy`: Saved genomes (checkpoints like `gen_<N>.npy` and `best_genome.npy`).
- `runs/<run_name>/metrics.csv`: Per-generation training metrics.
- `runs/<run_name>/checkpoints_metadata.json`: Metadata for saved checkpoints (fitness, apples, steps, death reasons).
- `runs/<run_name>/config_used.yaml`: Exact configuration snapshot used for that run.
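As a quick way to inspect training outputs outside the provided scripts, the saved `.npy` genomes and `metrics.csv` can be read with standard tools; the run name below is a placeholder.

```python
import csv
import numpy as np

run = "runs/<run_name>"                       # placeholder: substitute a real run directory
genome = np.load(f"{run}/best_genome.npy")    # flat weight vector evolved by the GA
print("genome parameters:", genome.shape)

with open(f"{run}/metrics.csv") as f:         # per-generation training metrics
    rows = list(csv.DictReader(f))
print("generations logged:", len(rows))
```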
- Python 3.11 or 3.12 (required)
- Python 3.13+ may have compatibility issues with pygame on Windows
Mac/Linux:

```bash
git clone <repository-url>
cd SnakeAgentPlayer
python3 -m venv venv
source venv/bin/activate
make install
```

Windows:

```bash
git clone <repository-url>
cd SnakeAgentPlayer
python -m venv venv
venv\Scripts\activate
make install
```

After installation, run `make help` to see all available commands.
```bash
make train CONFIG=configs/winner.yaml
```

The training will output:

- `runs/<experiment_name>/best_genome.npy` - Trained agent weights
- `runs/<experiment_name>/config_used.yaml` - Configuration used
- `runs/<experiment_name>/metrics.csv` - Training statistics
- `runs/<experiment_name>/training_progress.png` - Performance plot
```bash
# Watch the latest trained agent
make play

# Watch a specific experiment
make play RUN=winner

# Customize playback
make play RUN=winner EPISODES=10 FPS=20
```

| Command | Description |
|---|---|
| `make help` | Show all available commands and options |
| `make install` | Install package and dependencies |
| `make train CONFIG=<file>` | Train with specific config file |
| `make play [OPTIONS]` | Watch trained agent |
| `make clean` | Remove training runs and cache |
Play Options (all optional, smart defaults):
| Option | Description |
|---|---|
| `RUN=<name>` | Experiment name (default: latest run) |
| `GENOME=<file>` | Genome to play (default: `best_genome.npy` from RUN) |
| `CONFIG=<file>` | Environment config (default: `config_used.yaml` from RUN) |
| `TWIST=<mode>` | Twist mode (overrides config if set) |
| | `aggregating` - Obstacles accumulate over time |
| | `rotating` - Max 3 obstacles, oldest rotates out |
| | `static` - 3 random fixed obstacles |
| `EPISODES=<n>` | Number of episodes (default: 3) |
| `FPS=<n>` | Playback speed (default: 10) |
| `VISUAL=0` | Disable visualization for faster testing (default: enabled) |

Run `make help` to see usage examples.
```yaml
env:
  width: 12                # Grid width (cells)
  height: 12               # Grid height (cells)
  twist: false             # Enable twist mode with obstacles (boolean)
  max_steps: 2000          # Optional: max steps per episode (timeout)
  starvation_steps: 200    # Optional: steps without food before starvation

  # Used only when twist: true (defaults shown below)
  obstacle_policy:
    mode: aggregating      # aggregating | rotating | static
    min_distance_from_head: 3
    obstacles_per_food: 1  # aggregating-only
    max_obstacles: 3       # rotating-only
    num_obstacles: 3       # static-only

agent:
  network:
    hidden_dim: 16         # Neural network hidden layer size

ga:
  pop_size: 50             # Population size
  generations: 200         # Number of generations
  episodes_per_genome: 5   # Episodes to evaluate each genome (averaged)
  seed: 123                # Random seed (reproducibility)

  # Fitness function type (optional; defaults to "collector")
  # Supported: collector (aka greedy), survivor (aka balanced), hungry
  fitness_type: collector

  # GA operator hyperparameters (all optional; defaults shown below)
  elite_frac: 0.05
  tournament_k: 5
  crossover_rate: 0.7
  mutation_rate: 0.1
  mutation_sigma: 0.2

logging:
  output_dir: runs         # Where training runs are saved
```

Our research investigates how different parameters and design choices affect the learning performance of genetic algorithms in the Snake game domain. We frame this as four distinct experimental questions:
- Population Size and Generation Count: What is the optimal combination of population size and number of generations for balancing exploration, convergence speed, and final performance?
- Environmental Complexity and Transfer Learning: How does training environment complexity (obstacle modes) affect robustness, seed sensitivity, and transfer learning across different configurations?
- Reward Shaping and Emergent Behavior: How do different fitness functions shape the behavior and performance of evolved Snake agents?
- Overfitting vs. Generalization: How does the diversity of training environments (number of training seeds) affect the agent's ability to generalize to unseen environments?
Each experiment is designed with:
- Independent variable: The parameter being tested
- Dependent variable: Performance metrics (apples eaten, survival time)
- Controlled variables: All other parameters held constant
What is the optimal balance between population size and generation count for maximizing learning speed and final performance?
We hypothesize that:
- Larger populations will learn faster due to increased genetic diversity
- More generations allow for continued improvement, but with diminishing returns
- There exists an optimal combination where adding more generations or population provides no benefit
- Smaller populations might need more generations to converge
| Variable | Values |
|---|---|
| Independent | Population size: 20, 50, 100, 150 |
| Independent | Generation count: 50, 150, 250, 325 |
| Dependent | Mean apples eaten by best agent |
| Controlled | All other GA parameters (seed, mutation rate, etc.) |
We test all combinations of population sizes and generation counts to find the optimal balance.
Total: 16 experiments
Note: Each configuration was tested across multiple random seeds, and results were averaged to ensure statistical reliability and reduce variance from lucky/unlucky initial conditions.
| Experiment | Final Apples | Final Fitness |
|---|---|---|
| pop20_gen50 | 3.3 | 347 |
| pop20_gen150 | 26.7 | 2,745 |
| pop20_gen250 | 38.3 | 3,960 |
| pop20_gen325 | 56.0 | 5,754 |
Observation: Learning is slow. Still improving at 325 generations - no plateau reached.
| Experiment | Final Apples | Final Fitness |
|---|---|---|
| pop50_gen50 | 1.7 | 175 |
| pop50_gen150 | 27.7 | 2,851 |
| pop50_gen250 | 56.7 | 5,830 |
| pop50_gen325 | 66.3 | 6,822 |
Observation: Performs worse than Pop20 at gen50, but catches up later. Still no plateau at 325 generations.
| Experiment | Final Apples | Final Fitness |
|---|---|---|
| pop100_gen50 | 31 | 3,206 |
| pop100_gen150 | 74 | 7,665 |
| pop100_gen250 | 82 | 8,478 |
| pop100_gen325 | 82 | 8,478 |
Observation: Learns dramatically faster. Plateaus at ~82 apples around generation 250.
| Experiment | Final Apples | Final Fitness |
|---|---|---|
| pop150_gen50 | 28.3 | 2,929 |
| pop150_gen150 | 29.7 | 3,041 |
| pop150_gen250 | 37.0 | 3,807 |
| pop150_gen325 | 39.3 | 4,042 |
Observation: Surprisingly, Pop150 shows a large early spike at generation 11 (~27 apples) but then improves very slowly. Even at 325 generations it reaches only 39 apples - far worse than Pop100's 82.
| Generations | Pop20 | Pop50 | Pop100 | Pop150 | Winner |
|---|---|---|---|---|---|
| 50 | 3.3 | 1.7 | 31 | 28.3 | Pop100 |
| 150 | 26.7 | 27.7 | 74 | 29.7 | Pop100 |
| 250 | 38.3 | 56.7 | 82 | 37.0 | Pop100 |
| 325 | 56.0 | 66.3 | 82 | 39.3 | Pop100 |
- Pop100 is the sweet spot for population size: neither too small (slow learning) nor too large (premature convergence).
- 250 generations is sufficient for Pop100: performance plateaus at ~82 apples around generation 250; additional generations (325) provide no improvement.
- Too large a population can hurt: Pop150 improves very slowly after an early spike, reaching only 39 apples at 325 generations - far worse than Pop100's 82. This demonstrates a well-known GA phenomenon, loss of selection pressure: with too many individuals, each genome has less relative impact, weakening the "survival of the fittest" mechanism, so the population drifts rather than climbs toward optimal solutions.
- Small populations need more generations: Pop20 and Pop50 keep improving at 325 generations but never catch up to Pop100, suggesting they would need significantly more generations to converge.
- Diminishing returns on both axes: beyond the optimal point, adding more population or more generations wastes computational resources without improving performance.
Bottom line: The optimal configuration is Population 100 with ~250 generations - this achieves peak performance (82 apples) with minimal computational cost.
Does training in more complex environments (with obstacles) help agents perform better when tested in simpler environments, and vice versa?
We trained agents in four different environment types - from classic Snake (no obstacles) to increasingly challenging obstacle modes. Then we tested each trained agent across ALL environment types to measure how well skills transfer between different conditions.
We hypothesize that:
- Asymmetric Transfer: Agents trained with obstacles will adapt better to obstacle-free environments than the reverse (complex to simple transfers better than simple to complex).
- Dynamic vs Static: Moving obstacles (Rotating, Aggregating) will produce more robust agents than fixed obstacles, since agents must learn general avoidance rather than memorizing specific positions.
- Seed Sensitivity: Static obstacle positions may cause high variance between training runs, as some random seeds create easier/harder configurations.
| Mode | Description |
|---|---|
| Baseline Control | Classic Snake without obstacles - control condition |
| Static | 3 fixed obstacles placed at the start of each episode |
| Rotating | A rolling window of 3 obstacles (oldest removed when new spawns) |
| Aggregating | Obstacles accumulate dynamically (1 per apple eaten) |
See The Game section for visualizations of each mode.
| Variable | Values |
|---|---|
| Independent | Training environment: Baseline, Static, Rotating, Aggregating |
| Dependent | Mean apples eaten, transfer performance across environments |
| Controlled | Population size (100), generations (250), network architecture |
Training Protocol:
For each of the 4 environment modes, we trained 12 separate agents using different random seeds. This ensures our results reflect the true capability of each training mode rather than lucky/unlucky seed effects. Results shown are averages across all 12 runs per mode.
| Parameter | Value |
|---|---|
| Training Runs | 12 independent seeds per environment mode (48 total) |
| Population Size | 100 genomes per generation |
| Training Duration | 250 generations per experiment |
| Evaluation | 5 episodes per genome during training |
| Environment | 12×12 grid |
| Fitness Function | Balanced: Fitness = (Apples × 50) + (Steps × 1.0) |
Testing Protocol:
Each of the 48 trained agents was tested on ALL 4 environment modes (including modes different from training), creating a 4×4 transfer matrix. Total: 38,400 test episodes across 192 train/test combinations.
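For clarity, here is a small sketch of how such a 4×4 transfer matrix can be assembled from per-agent evaluations. `evaluate_agent` is a hypothetical placeholder for running the 200 test episodes; the real pipeline uses the evaluators in `ga/evaluator.py`.

```python
import numpy as np

MODES = ["baseline", "static", "rotating", "aggregating"]
rng = np.random.default_rng(0)

def evaluate_agent(agent, test_mode: str, episodes: int = 200) -> float:
    """Placeholder for the real evaluation: play `episodes` games in `test_mode`
    with the trained agent and return the mean apples eaten."""
    return float(rng.uniform(0, 40))   # dummy value so the sketch runs end to end

def transfer_matrix(agents_by_mode: dict) -> np.ndarray:
    """agents_by_mode maps each training mode to its 12 trained agents (one per seed)."""
    matrix = np.zeros((len(MODES), len(MODES)))
    for i, train_mode in enumerate(MODES):
        for j, test_mode in enumerate(MODES):
            scores = [evaluate_agent(a, test_mode) for a in agents_by_mode[train_mode]]
            matrix[i, j] = float(np.mean(scores))   # mean apples across the 12 seeds
    return matrix

dummy_agents = {mode: [None] * 12 for mode in MODES}   # stand-ins for trained genomes
print(np.round(transfer_matrix(dummy_agents), 1))
```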
Mean ± Standard Deviation across 12 experiments (Performance measured in apples per episode):
| Training Mode | Baseline Control | Static | Rotating | Aggregating | Avg. Performance | Seed CV (%) |
|---|---|---|---|---|---|---|
| Baseline Control | 37.1 ±7.4 | 4.4 ±1.3 | 5.1 ±0.9 | 4.4 ±0.5 | 12.75 | 16.2% |
| Static | 21.5 ±10.8 | 4.2 ±1.9 | 4.9 ±2.2 | 4.1 ±1.8 | 8.67 | 46.1% |
| Rotating | 22.9 ±4.9 | 5.0 ±1.6 | 5.9 ±1.2 | 4.7 ±0.8 | 9.62 | 19.2% |
| Aggregating | 16.7 ±7.4 | 4.5 ±2.2 | 5.0 ±1.6 | 4.2 ±1.0 | 7.63 | 37.8% |
CV (Coefficient of Variation) measures training stability across seeds. Lower is more reliable.
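For reference, the CV for a training mode is the standard deviation of its final scores across the 12 training seeds divided by their mean, expressed as a percentage:

$$\mathrm{CV} = \frac{\sigma_{\text{seeds}}}{\mu_{\text{seeds}}} \times 100\%$$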
Percentage of death causes when agents play in their trained environment:
| Training Mode | Body Collision | Wall Collision | Obstacle Collision | Starvation | Timeout |
|---|---|---|---|---|---|
| Baseline Control | 73.8% | 12.0% | 0.0% | 2.0% | 12.2% |
| Static | 12.1% | 9.5% | 13.2% | 65.2% | 0.0% |
| Rotating | 14.8% | 7.5% | 14.9% | 62.9% | 0.0% |
| Aggregating | 16.4% | 9.7% | 16.5% | 57.4% | 0.0% |
Baseline agents fail by self-collision (greedy behavior), while obstacle-trained agents fail by starvation (cautious behavior).
- Positive Transfer (Complexity → Simplicity): Agents trained on any obstacle mode retained 45–62% of baseline performance when tested on clean maps.
- Negative Transfer (Simplicity → Complexity): Baseline agents suffered catastrophic failure, retaining only 12–14% of their performance when obstacles were introduced.
- Diagonal Dominance: Baseline Control dominates its own environment (37.1 apples) but fails catastrophically on any obstacle mode (~4.4 apples, an 88% drop).
- Cross-Obstacle Transfer is Weak: All obstacle-trained agents perform similarly poorly (~4-6 apples) on modes different from their training.
- Static Mode Instability: CV of 46.1% indicates extreme seed variance - some seeds produce near-failure, others succeed.
- Rotating Mode Reliability: Best stability among obstacle modes (19.2% CV) with consistent mid-tier generalization.
- Death Pattern Divergence: Baseline agents fail by self-collision (73.8%), indicating greedy behavior. Obstacle-trained agents fail by starvation (57-65%), indicating overly cautious navigation.
- Static Obstacles Create Deceptive Fitness Landscapes: Static mode's extreme instability (46.1% CV) occurs because fixed obstacle positions allow the GA to discover "lucky" paths that work for specific coordinates rather than learning obstacle avoidance as a skill.
- Dynamic Environments Force True Learning: Rotating obstacles achieved the most reliable training (19.2% CV) because the environment changes within each episode. Agents must develop a functional understanding of spatial danger rather than memorizing paths.
- Asymmetric Transfer Reveals Strategy Differences: The 45–62% positive transfer (Obstacle → Baseline) versus 12–14% negative transfer (Baseline → Obstacle) demonstrates that obstacle training develops genuine spatial reasoning, while baseline training produces pure optimization shortcuts.
- Cross-Obstacle Transfer Weakness Indicates Mode-Specific Adaptation: Agents don't learn a universal "avoid objects" behavior. They develop mode-specific heuristics: Static agents navigate fixed patterns, Rotating agents time movements between spawns, Aggregating agents learn escalating caution.
How do different fitness functions shape the behavior and performance of evolved Snake agents?
Fitness Strategies:
| Strategy | Formula | Goal |
|---|---|---|
| Collector | `apples × 100 + steps × 0.1` | Maximize apple collection |
| Survivor | `apples × 50 + steps × 1.0` | Balance survival and apples |
| Hungry | `apples × 100 - steps × 2.0` | Maximize eating efficiency |
We hypothesize that:
- High Apple Reward Strategy (Collector) will maximize total apples eaten while disregarding survival time and efficiency metrics
- Balanced Reward Strategy (Survivor) will create cautious, long-surviving agents that balance apple collection with extended survival time
- Time Penalty Strategy (Hungry) will create aggressive, efficiency-focused agents that minimize steps per apple
| Variable | Values |
|---|---|
| Independent | Fitness function: Collector, Survivor, Hungry |
| Dependent | Apples eaten, steps survived, efficiency (steps/apple) |
| Controlled | Population size (150), generations (200), network architecture |
Training Protocol:
To ensure statistically robust results, we structured our experiment as follows:
- Three training seed groups: We selected 3 different random seeds (1264, 4242, 7777) to create different training conditions
- Three fitness types per seed: For each training seed, we trained 3 separate agents - one for each fitness strategy (Collector, Survivor, Hungry)
- Nine total configurations: This gave us 9 training runs total (3 seeds × 3 fitness types)
Evaluation Protocol:
After training, each of the 9 trained agents was evaluated on 200 unseen test episodes (using test seeds 10000-10199) to measure real-world performance. We then averaged results across the 3 runs for each fitness type to obtain the final performance metrics.
Performance Summary:
| Strategy | Apples Collected | Steps Survived | Steps/Apple |
|---|---|---|---|
| Collector | 47.96 ± 17.20 | 1643.8 ± 588.0 | 34.3 |
| Survivor | 46.65 ± 30.80 | 1752.8 ± 1095.5 | 37.6 |
| Hungry | 35.66 ± 11.35 | 921.3 ± 294.0 | 25.8 |
- Fitness Engineering Works: Each strategy successfully optimized for its target metric - Collector maximized apples (47.96), Survivor maximized survival time (1752.8 steps), Hungry maximized efficiency (25.8 steps/apple).
- No Free Lunch: Optimizing for one metric comes at a cost. Hungry's efficiency (25.8 steps/apple) sacrifices total performance (35.66 apples vs. 47.96).
- Time Penalties Create Aggression and Consistency: The harsh step penalty (×2.0) in the Hungry strategy created agents that eat fast (25.8 steps/apple) and die young (921 steps), yet are surprisingly the most consistent (±11.35 apples).
- Surprising Result - Balanced Rewards ≠ Consistency: We hypothesized that balanced rewards would create stable agents, but Survivor shows the highest game-to-game variability (±30.80 apples); the harsh time penalty in Hungry actually produced the most consistent behavior.
- Multi-Objective Trade-offs: No strategy wins on all metrics - choose based on your objective.
How does the diversity of training environments (number of training seeds) affect the agent's ability to generalize to unseen environments?
We hypothesize that:
- Low Diversity will result in Overfitting: The agent will achieve high scores on the training seed but will achieve lower scores on new seeds.
- High Diversity will result in Generalization: The agent will perform consistently well on both training and unseen test seeds.
| Variable | Values |
|---|---|
| Independent | Number of training seeds: 1, 5, 15 |
| Dependent | Train fitness, test fitness (generalization) |
| Controlled | Population size (80), generations (120), network architecture |
Procedure: We trained 3 separate agents for each category (1, 5, and 15 seeds). The results below represent the average performance across these runs.
- Train Fitness: Evaluated on the specific seeds used during training.
- Test Fitness: Evaluated on 100 new random seeds.
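A hedged sketch of how these two metrics can be computed is shown below; `play_episode` is a hypothetical stand-in for running one seeded Snake episode with the trained agent and returning its fitness, and the example seed values are purely illustrative.

```python
import numpy as np

def play_episode(agent, seed: int) -> float:
    """Hypothetical stand-in: run one Snake episode on `seed` and return the fitness."""
    return float(np.random.default_rng(seed).uniform(0, 40))   # dummy value

def train_test_fitness(agent, train_seeds, test_seeds):
    """Average fitness on the training seeds vs. on unseen test seeds."""
    train = float(np.mean([play_episode(agent, s) for s in train_seeds]))
    test = float(np.mean([play_episode(agent, s) for s in test_seeds]))
    return train, test   # a large train-test gap indicates overfitting

# Example: a single training seed vs. 100 seeds never used during training
print(train_test_fitness(agent=None, train_seeds=[42], test_seeds=range(5000, 5100)))
```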
| Training Diversity | Train Fitness | Test Fitness |
|---|---|---|
| 1 Seed | 37.3 | 18.2 |
| 5 Seeds | 29.5 | 23.5 |
| 15 Seeds | 28.4 | 26.9 |
- Results Match Expectations: The data confirms our hypothesis. There is a clear trade-off between peak performance on known environments and stability on unseen environments.
- Overfitting: The agent trained on a single seed exhibited severe overfitting. It achieved high performance on its training environment (37.3) but failed to transfer this performance to new environments, resulting in significantly lower test fitness (18.2).
- Robustness and Convergence: Training on a larger set of seeds (15) encouraged the emergence of environment-agnostic behaviors rather than seed-specific strategies. Consequently, the generalization gap largely disappeared, with train and test performance converging (28.4 vs. 26.9), indicating a more stable and robust policy.
This project demonstrates that Genetic Algorithms can effectively evolve Neural Network controllers for Snake without labeled data, producing agents that learn food-seeking behavior and survival through selection pressure alone.
Across experiments, we show that training outcomes are strongly shaped by core design choices: compute budget (population size and generations), environment difficulty (obstacle modes), and whether training transfers to more complex settings.
Fitness engineering meaningfully changes behavior: reward structures can push agents toward maximizing apples, surviving longer, or improving efficiency, but the results highlight consistent trade-offs rather than a single universally best objective.
Finally, training diversity is critical for real-world reliability: single-seed training can overfit and fail to generalize, while increasing the number of training seeds produces more robust, environment-agnostic policies on unseen episodes.













