Get Charmander + beat rival by bdougie · Pull Request #10 · papercomputeco/pokemon

bdougie · 2026-03-10T05:27:40Z

Summary

Fix Oak's Lab exit navigation so agents advance past Pokemon selection through the rival battle
Phased approach: A-mash dialogue (30 turns), move left to center column, walk south to door with NPC interaction
All 10 parallel agents beat the rival and exit the lab
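
The phased exit listed above can be sketched as a small state machine. This is a minimal illustration, not the code in this PR: `LabExitPlanner`, `CENTER_X`, and the button names are hypothetical. Only the 30-turn A-mash, the move left to a center column, and the walk south come from the summary, and the NPC interaction on the way to the door is left out.

```python
from dataclasses import dataclass

A_MASH_TURNS = 30  # from the PR summary
CENTER_X = 5       # hypothetical x coordinate of the center column; not from the PR


@dataclass
class LabExitPlanner:
    """Chooses one button per turn while leaving Oak's Lab (illustrative only)."""

    turns_in_dialogue: int = 0

    def next_button(self, x: int, y: int) -> str:
        # Phase 1: mash A for a fixed number of turns to clear dialogue.
        if self.turns_in_dialogue < A_MASH_TURNS:
            self.turns_in_dialogue += 1
            return "a"
        # Phase 2: move left until aligned with the center column.
        if x > CENTER_X:
            return "left"
        # Phase 3: walk south toward the door.
        return "down"


planner = LabExitPlanner()
# The agent starts at (7, 5) and does not move while mashing A.
dialogue_buttons = [planner.next_button(7, 5) for _ in range(A_MASH_TURNS)]
```

After the dialogue phase, `planner.next_button(7, 5)` returns `"left"` and `planner.next_button(5, 5)` returns `"down"`. Keeping the phase logic in one pure method makes it easy to unit test without a ROM.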

10-Agent Results

Rank	Label	Score	Party	Turns	Time
1	dc2	-65.0	1	650	3.5s
2	baseline_4dc	-72.3	1	723	3.7s
3	low_stuck_dc4	-73.4	1	734	3.5s
4	narrow_dc4	-74.1	1	741	3.5s
5	aggressive	-76.6	1	766	3.6s
6	x_axis_dc4	-79.2	1	792	3.7s
7	wide_skip_dc4	-80.5	1	805	3.8s
8	moderate	-84.3	1	843	3.9s
9	high_stuck_dc4	-85.1	1	851	4.0s
10	original	-91.7	1	917	4.2s

All agents: battles_won=1, maps_visited=4, party_size=1, final_map=0 (Pallet Town after lab exit)

Winner: dc2 (door_cooldown=2). The shortest door cooldown finished in the fewest turns (650).
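
In the results table, every score equals the negative turn count divided by 10 (650 turns gives -65.0). The sketch below reproduces the ranking under that assumption. The PR does not show the real `score()` implementation, which is described elsewhere as a composite score, so `score_from_turns` is a hypothetical stand-in that only holds when the other metrics are equal across agents, as they are here.

```python
# Turn counts copied from the 10-agent results table.
turns_by_label = {
    "dc2": 650,
    "baseline_4dc": 723,
    "low_stuck_dc4": 734,
    "narrow_dc4": 741,
    "aggressive": 766,
    "x_axis_dc4": 792,
    "wide_skip_dc4": 805,
    "moderate": 843,
    "high_stuck_dc4": 851,
    "original": 917,
}


def score_from_turns(turns: int) -> float:
    """Hypothetical score: fewer turns is better, scaled to match the table."""
    return -turns / 10


# Highest score first, which is the same as fewest turns first.
ranked = sorted(
    turns_by_label,
    key=lambda label: score_from_turns(turns_by_label[label]),
    reverse=True,
)
```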

Test plan

uv run pytest tests/ — all tests pass, 100% coverage maintained
10-agent parallel run completes successfully
All agents beat rival battle (battles_won=1)

Implement fitness metrics, evolution loop, and parameter tuning infrastructure so the agent can automatically improve its navigation strategy through headless evaluation runs.

- Add compute_fitness() to PokemonAgent returning structured metrics
- Add --output-json CLI flag for programmatic fitness collection
- Make Navigator thresholds configurable (stuck_threshold, skip_distance)
- Read EVOLVE_PARAMS from environment to override navigator defaults
- Add observe_session_inline() for programmatic observation access
- Create evolve.py: evolution harness with LLM variant proposal, subprocess isolation, composite scoring, and observer integration
- Create run_10_agents.py: parallel multi-agent evaluation runner
- 367 tests, 100% coverage maintained

Closes #8
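
One way the EVOLVE_PARAMS override could work is sketched below. It assumes the variable holds a JSON object, which the commit message does not state. `NavigatorParams`, `load_params`, and the default values are hypothetical; the field names stuck_threshold, skip_distance, and door_cooldown come from this PR.

```python
import json
import os
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NavigatorParams:
    # Default values are placeholders, not the project's real defaults.
    stuck_threshold: int = 8
    skip_distance: int = 3
    door_cooldown: int = 4


def load_params(env=None) -> NavigatorParams:
    """Return defaults, overridden by a JSON object in EVOLVE_PARAMS (assumed format)."""
    env = os.environ if env is None else env
    raw = env.get("EVOLVE_PARAMS")
    if not raw:
        return NavigatorParams()
    overrides = json.loads(raw)
    # Ignore keys that are not known parameters rather than failing the run.
    known = {
        key: value
        for key, value in overrides.items()
        if key in NavigatorParams.__dataclass_fields__
    }
    return replace(NavigatorParams(), **known)


dc2 = load_params({"EVOLVE_PARAMS": '{"door_cooldown": 2}'})
```

Passing the environment as an argument keeps the function testable, and each subprocess in a parallel run can receive its own variant through its environment.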

Agent was stuck at (7,5) in Oak's Lab after picking Charmander. Added phased lab exit: A-mash dialogue (30 turns), move left to center column, then south to door with NPC interaction. All 10 parallel agents now beat the rival (battles_won=1, party_size=1). Winner: dc2 (door_cooldown=2, score=-65.0)

The backtrack guard checked `map_id == 40 AND party_count == 0`, but party_count changes to 1 the moment the agent picks up Charmander. This allowed backtracking to fire immediately after the pickup, wiping out progress. Change guard to `map_id == 40` (entire lab is protected). Also revert Oak trigger to PR #10's proven brute-force approach (4 rounds of mash_a + wait) instead of script-state-aware gating that read 0xD5F1 while still on Pallet Town map where the address is meaningless. ROM test confirms: agent picks Charmander, wins rival battle, exits lab.
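
The guard change can be illustrated with two pure functions. The function names are hypothetical; the map id 40 and both conditions are taken from the commit message above.

```python
OAKS_LAB_MAP_ID = 40  # map id named in the commit message


def guard_old(map_id: int, party_count: int) -> bool:
    """Original guard: blocks backtracking in the lab only while the party is empty."""
    return map_id == OAKS_LAB_MAP_ID and party_count == 0


def guard_new(map_id: int, party_count: int) -> bool:
    """Fixed guard: blocks backtracking anywhere in the lab, whatever the party size."""
    return map_id == OAKS_LAB_MAP_ID


# Right after the pickup, party_count is 1 while the agent is still in the lab.
old_blocks = guard_old(OAKS_LAB_MAP_ID, 1)
new_blocks = guard_new(OAKS_LAB_MAP_ID, 1)
```

Here `old_blocks` is `False`, so a restore could fire and undo the pickup, while `new_blocks` is `True`.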

* Add FLE-style backtracking with AlphaEvolve integration

  BacktrackManager saves/restores game state via PyBoy save_state/load_state to escape stuck navigation on Route 1. Snapshots on map change and periodically; restores when stuck_turns exceeds threshold. Four new evolvable params (bt_max_snapshots, bt_restore_threshold, bt_max_attempts, bt_snapshot_interval) flow through evolve.py and run_10_agents.py with two new variants: aggressive_bt and no_bt.

* Remove unused field import and deduplicate score()

  - Remove unused `field` import from dataclasses in agent.py
  - Import `score()` from evolve.py in run_10_agents.py instead of duplicating it

* Fix backtrack restore: reset script-gate flags, skip duplicate snapshots

  - Reset _oak_wait_done, _pallet_diag_done, _house_diag_done, _lab_phase, _lab_turns, _lab_exit_turns on backtrack restore so one-time game sequences (Oak encounter, lab phases) can re-trigger after restore
  - Skip periodic snapshots when position matches the last snapshot to avoid poisoning the pool with stuck-adjacent positions

* Fix backtrack guard in Oak's Lab to prevent undoing Charmander pickup

  The backtrack guard checked `map_id == 40 AND party_count == 0`, but party_count changes to 1 the moment the agent picks up Charmander. This allowed backtracking to fire immediately after the pickup, wiping out progress. Change guard to `map_id == 40` (entire lab is protected).

  Also revert Oak trigger to PR #10's proven brute-force approach (4 rounds of mash_a + wait) instead of script-state-aware gating that read 0xD5F1 while still on Pallet Town map where the address is meaningless.

  ROM test confirms: agent picks Charmander, wins rival battle, exits lab.

* Add FLE backtracking section to README with paper reference

  Documents the Factorio Learning Environment-inspired backtracking system: snapshot/restore mechanics, evolvable parameters, and Oak's Lab guard. Adds FLE paper to references list.
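
The snapshot and restore mechanics described above might look roughly like the sketch below. It is not the project's BacktrackManager: `SnapshotPool`, `FakeEmulator`, and the default values are hypothetical. The emulator only needs `save_state` and `load_state` methods that take a file-like object, which is how PyBoy exposes them, so a stand-in object keeps the example runnable without a ROM.

```python
import io
from collections import deque


class SnapshotPool:
    """Bounded pool of emulator states with restore-on-stuck (illustrative only)."""

    def __init__(self, emulator, max_snapshots=5, restore_threshold=20, max_attempts=3):
        self.emulator = emulator
        self.snapshots = deque(maxlen=max_snapshots)  # (position, state bytes)
        self.restore_threshold = restore_threshold
        self.max_attempts = max_attempts
        self.attempts = 0

    def snapshot(self, position) -> bool:
        # Skip when the position matches the last snapshot, so the pool
        # does not fill up with states taken at the same stuck spot.
        if self.snapshots and self.snapshots[-1][0] == position:
            return False
        buffer = io.BytesIO()
        self.emulator.save_state(buffer)
        self.snapshots.append((position, buffer.getvalue()))
        return True

    def maybe_restore(self, stuck_turns: int) -> bool:
        # Restore only when stuck_turns exceeds the threshold and attempts remain.
        if stuck_turns <= self.restore_threshold:
            return False
        if not self.snapshots or self.attempts >= self.max_attempts:
            return False
        _, state = self.snapshots.pop()
        self.emulator.load_state(io.BytesIO(state))
        self.attempts += 1
        return True


class FakeEmulator:
    """Stand-in emulator whose entire state is one integer."""

    def __init__(self):
        self.value = 0

    def save_state(self, file_like):
        file_like.write(str(self.value).encode())

    def load_state(self, file_like):
        self.value = int(file_like.read().decode())


emulator = FakeEmulator()
pool = SnapshotPool(emulator, restore_threshold=2)
emulator.value = 7
first_saved = pool.snapshot((1, 1))
duplicate_saved = pool.snapshot((1, 1))
emulator.value = 99
restored = pool.maybe_restore(stuck_turns=3)
```

Popping the restored snapshot is a design choice in this sketch: a second restore falls back to an older state instead of reloading the same one.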

bdougie added 4 commits March 9, 2026 22:13

Merge main (alpha-evolve harness) into beat-rival

eb66338

Document AlphaEvolve evolution loop and long-session mode in README

ee0b4ce

bdougie merged commit dfab7e6 into main Mar 10, 2026
1 check passed

bdougie mentioned this pull request Mar 10, 2026

Add FLE-style backtracking with AlphaEvolve integration #13

Merged

4 tasks

bdougie merged 4 commits into main from feat/beat-rival
