Implement fitness metrics, an evolution loop, and parameter-tuning infrastructure so the agent can automatically improve its navigation strategy through headless evaluation runs.
- Add compute_fitness() to PokemonAgent returning structured metrics
- Add --output-json CLI flag for programmatic fitness collection
- Make Navigator thresholds configurable (stuck_threshold, skip_distance)
- Read EVOLVE_PARAMS from the environment to override navigator defaults
- Add observe_session_inline() for programmatic observation access
- Create evolve.py: evolution harness with LLM variant proposal, subprocess isolation, composite scoring, and observer integration
- Create run_10_agents.py: parallel multi-agent evaluation runner
- 367 tests, 100% coverage maintained

Closes #8
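A minimal sketch of how EVOLVE_PARAMS could override the Navigator defaults, assuming the variable holds a flat JSON object. The `NavigatorConfig` dataclass, its default values, and `load_navigator_config()` are hypothetical; only the parameter names `stuck_threshold` and `skip_distance` come from this PR. Keys meant for other components (for example `door_cooldown` or the `bt_*` parameters) are simply ignored here.

```python
import json
import os
from dataclasses import dataclass, fields, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class NavigatorConfig:
    # Hypothetical defaults; the PR does not list the real values.
    stuck_threshold: int = 8
    skip_distance: int = 3


def load_navigator_config(env: Optional[Mapping[str, str]] = None) -> NavigatorConfig:
    """Return Navigator settings, overridden by EVOLVE_PARAMS if present."""
    env = os.environ if env is None else env
    raw = env.get("EVOLVE_PARAMS")
    if not raw:
        return NavigatorConfig()
    overrides = json.loads(raw)  # assumption: a flat JSON object
    known = {f.name for f in fields(NavigatorConfig)}
    # Keep only keys the Navigator understands; other params belong elsewhere.
    valid = {k: int(v) for k, v in overrides.items() if k in known}
    return replace(NavigatorConfig(), **valid)
```

Passing an explicit mapping instead of reading `os.environ` keeps the function easy to test; in a subprocess-isolated run, the harness would set the variable in the child's environment.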
Agent was stuck at (7,5) in Oak's Lab after picking Charmander. Added phased lab exit: A-mash dialogue (30 turns), move left to center column, then south to door with NPC interaction. All 10 parallel agents now beat the rival (battles_won=1, party_size=1). Winner: dc2 (door_cooldown=2, score=-65.0)
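The phased exit can be pictured as a small state machine. This is a sketch, not the agent's code: the 30-turn A-mash comes from the commit message, while `LabExitPlanner`, the action strings, `CENTER_X`, `DOOR_Y`, and the rule "press A when the position did not change" (standing in for NPC interaction) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

A_MASH_TURNS = 30  # dialogue phase length, per the commit message
CENTER_X = 5       # hypothetical center column of the lab
DOOR_Y = 11        # hypothetical row of the lab door


@dataclass
class LabExitPlanner:
    phase: str = "dialogue"
    turns: int = 0
    last_pos: Optional[Tuple[int, int]] = None

    def next_action(self, pos: Tuple[int, int]) -> Optional[str]:
        x, y = pos
        blocked = pos == self.last_pos
        self.last_pos = pos
        if self.phase == "dialogue":
            # Phase 1: mash A to clear post-pickup dialogue.
            self.turns += 1
            if self.turns >= A_MASH_TURNS:
                self.phase = "center"
            return "a"
        if self.phase == "center":
            # Phase 2: walk left until the center column is reached.
            if x > CENTER_X:
                return "left"
            self.phase = "south"
        if self.phase == "south":
            # Phase 3: head south to the door.
            if y >= DOOR_Y:
                self.phase = "done"
                return None
            # Assumption: an unchanged position means an NPC or a dialogue
            # box is in the way, so press A before stepping again.
            return "a" if blocked else "down"
        return None
```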
bdougie added a commit that referenced this pull request Mar 10, 2026
The backtrack guard checked `map_id == 40 AND party_count == 0`, but party_count changes to 1 the moment the agent picks up Charmander. This allowed backtracking to fire immediately after the pickup, wiping out progress. Change guard to `map_id == 40` (entire lab is protected). Also revert Oak trigger to PR #10's proven brute-force approach (4 rounds of mash_a + wait) instead of script-state-aware gating that read 0xD5F1 while still on Pallet Town map where the address is meaningless. ROM test confirms: agent picks Charmander, wins rival battle, exits lab.
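The difference between the two guards, written as predicates. Map ID 40 as Oak's Lab comes from the commit message; the function names are hypothetical.

```python
OAKS_LAB_MAP_ID = 40  # Oak's Lab, per the commit message


def backtrack_allowed_old(map_id: int, party_count: int) -> bool:
    # Old guard: the lab is protected only while the party is empty, so
    # protection disappears the moment Charmander joins (party_count == 1).
    return not (map_id == OAKS_LAB_MAP_ID and party_count == 0)


def backtrack_allowed(map_id: int) -> bool:
    # New guard: never backtrack anywhere inside the lab.
    return map_id != OAKS_LAB_MAP_ID
```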
bdougie added a commit that referenced this pull request Mar 10, 2026
* Add FLE-style backtracking with AlphaEvolve integration
  BacktrackManager saves/restores game state via PyBoy save_state/load_state to escape stuck navigation on Route 1. Snapshots on map change and periodically; restores when stuck_turns exceeds threshold. Four new evolvable params (bt_max_snapshots, bt_restore_threshold, bt_max_attempts, bt_snapshot_interval) flow through evolve.py and run_10_agents.py with two new variants: aggressive_bt and no_bt.
* Remove unused field import and deduplicate score()
  - Remove unused `field` import from dataclasses in agent.py
  - Import `score()` from evolve.py in run_10_agents.py instead of duplicating it
* Fix backtrack restore: reset script-gate flags, skip duplicate snapshots
  - Reset _oak_wait_done, _pallet_diag_done, _house_diag_done, _lab_phase, _lab_turns, _lab_exit_turns on backtrack restore so one-time game sequences (Oak encounter, lab phases) can re-trigger after restore
  - Skip periodic snapshots when position matches the last snapshot to avoid poisoning the pool with stuck-adjacent positions
* Fix backtrack guard in Oak's Lab to prevent undoing Charmander pickup
  The backtrack guard checked `map_id == 40 AND party_count == 0`, but party_count changes to 1 the moment the agent picks up Charmander. This allowed backtracking to fire immediately after the pickup, wiping out progress. Change guard to `map_id == 40` (entire lab is protected). Also revert Oak trigger to PR #10's proven brute-force approach (4 rounds of mash_a + wait) instead of script-state-aware gating that read 0xD5F1 while still on Pallet Town map where the address is meaningless. ROM test confirms: agent picks Charmander, wins rival battle, exits lab.
* Add FLE backtracking section to README with paper reference
  Documents the Factorio Learning Environment-inspired backtracking system: snapshot/restore mechanics, evolvable parameters, and Oak's Lab guard. Adds FLE paper to references list.
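A rough sketch of the snapshot/restore loop these commits describe. The four `bt_*` parameter names, the map-change and periodic snapshot triggers, the duplicate-position skip, the Oak's Lab guard, and the flag reset on restore come from the commit messages. Everything else is assumed: the class layout, which snapshot gets restored (here the newest, popped so a repeated restore walks further back), and `StubEmulator`, a hypothetical stand-in for PyBoy's `save_state`/`load_state` (file-object based) so the example runs without a ROM.

```python
import io
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional, Tuple

OAKS_LAB_MAP_ID = 40  # backtracking is disabled anywhere in the lab


class StubEmulator:
    """Stand-in for PyBoy so the sketch runs without a ROM (hypothetical)."""

    def __init__(self) -> None:
        self.state = b"start"

    def save_state(self, f) -> None:
        f.write(self.state)

    def load_state(self, f) -> None:
        self.state = f.read()


@dataclass
class Snapshot:
    pos: Tuple[int, int]
    data: bytes


class BacktrackManager:
    def __init__(self, emulator, bt_max_snapshots: int = 5,
                 bt_restore_threshold: int = 20, bt_max_attempts: int = 3,
                 bt_snapshot_interval: int = 50,
                 on_restore: Optional[Callable[[], None]] = None) -> None:
        self.emu = emulator
        # Bounded pool: appending beyond maxlen evicts the oldest snapshot.
        self.snapshots: Deque[Snapshot] = deque(maxlen=bt_max_snapshots)
        self.restore_threshold = bt_restore_threshold
        self.max_attempts = bt_max_attempts
        self.interval = bt_snapshot_interval
        self.attempts = 0
        self.on_restore = on_restore  # e.g. reset _lab_phase, _oak_wait_done

    def _save(self, pos: Tuple[int, int]) -> None:
        buf = io.BytesIO()
        self.emu.save_state(buf)
        self.snapshots.append(Snapshot(pos, buf.getvalue()))

    def step(self, turn: int, pos: Tuple[int, int], map_changed: bool,
             stuck_turns: int, map_id: int) -> bool:
        """Take snapshots as needed; return True if a restore happened."""
        if map_changed:
            self._save(pos)
        elif turn % self.interval == 0:
            # Skip periodic snapshots at the last snapshot's position so
            # stuck spots do not poison the pool.
            if not self.snapshots or self.snapshots[-1].pos != pos:
                self._save(pos)
        if (stuck_turns > self.restore_threshold
                and map_id != OAKS_LAB_MAP_ID
                and self.attempts < self.max_attempts
                and self.snapshots):
            return self._restore()
        return False

    def _restore(self) -> bool:
        # Assumption: restore the newest snapshot and drop it, so a second
        # restore walks further back instead of replaying the same state.
        snap = self.snapshots.pop()
        self.emu.load_state(io.BytesIO(snap.data))
        self.attempts += 1
        if self.on_restore is not None:
            self.on_restore()  # let one-time script sequences re-trigger
        return True
```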
Summary
10-Agent Results
All agents:
battles_won=1, maps_visited=4, party_size=1, final_map=0 (Pallet Town after lab exit)
Winner: dc2 (door_cooldown=2). The shortest door cooldown yields the fastest completion.
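Because every agent tied on battles_won, maps_visited, and party_size, the ranking has to come from something else, and the summary attributes dc2's win to faster completion. The sketch below illustrates that kind of composite ranking with a turn-count penalty. The weights, the `turns` key, and both functions are hypothetical; the real `score()` in evolve.py is not shown in this PR and may use a different formula or sign convention (dc2's reported score was -65.0).

```python
from typing import Dict, Mapping

Metrics = Mapping[str, float]


def composite_score(m: Metrics) -> float:
    # Hypothetical weights: milestones dominate, turn count breaks ties.
    return (1000 * m["battles_won"] + 100 * m["maps_visited"]
            + 50 * m["party_size"] - m["turns"])


def pick_winner(results: Dict[str, Metrics]) -> str:
    """Return the variant name with the highest composite score."""
    return max(results, key=lambda name: composite_score(results[name]))
```

With identical milestone metrics, a variant wins here only by finishing in fewer turns.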
Test plan
- `uv run pytest tests/`: all tests pass, 100% coverage maintained
- battles_won=1)