Get Charmander + beat rival#10

Merged
bdougie merged 4 commits into main from feat/beat-rival on Mar 10, 2026

bdougie commented on Mar 10, 2026

Summary

  • Fix Oak's Lab exit navigation so agents advance past Pokemon selection through the rival battle
  • Phased approach: A-mash dialogue (30 turns), move left to center column, walk south to door with NPC interaction
  • All 10 parallel agents beat the rival and exit the lab
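
The three phases above can be sketched as a small state machine. This is only an illustration, not the code in this PR: `LabPhase`, `lab_exit_step`, the `center_x` argument, and the "press A when blocked" rule for the NPC interaction are all assumptions. Only the 30-turn dialogue phase and the left-then-south order come from the summary.

```python
from enum import Enum


class LabPhase(Enum):
    DIALOGUE = 0  # mash A through the post-selection dialogue
    CENTER = 1    # walk left to the center column
    EXIT = 2      # walk south to the door


DIALOGUE_TURNS = 30  # from the PR summary


def lab_exit_step(phase, turns_in_phase, x, center_x, blocked):
    """Return (button, next_phase) for one turn of the lab exit.

    center_x and blocked are hypothetical inputs: the column to walk
    down, and whether an NPC currently stands in the way.
    """
    if phase is LabPhase.DIALOGUE:
        done = turns_in_phase + 1 >= DIALOGUE_TURNS
        return "a", (LabPhase.CENTER if done else LabPhase.DIALOGUE)
    if phase is LabPhase.CENTER:
        if x > center_x:
            return "left", LabPhase.CENTER
        return "down", LabPhase.EXIT
    # EXIT: talk to whatever blocks the path (assumed), otherwise keep walking.
    return ("a" if blocked else "down"), LabPhase.EXIT
```

Keeping each phase a pure function of the current state makes it easy to unit test without a ROM.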

10-Agent Results

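| Rank | Label | Score | Map | Party | Stuck | Turns | Time |
|------|-------|-------|-----|-------|-------|-------|------|
| 1 | dc2 | -65.0 | 0 | 1 | 0 | 650 | 3.5s |
| 2 | baseline_4dc | -72.3 | 0 | 1 | 0 | 723 | 3.7s |
| 3 | low_stuck_dc4 | -73.4 | 0 | 1 | 0 | 734 | 3.5s |
| 4 | narrow_dc4 | -74.1 | 0 | 1 | 0 | 741 | 3.5s |
| 5 | aggressive | -76.6 | 0 | 1 | 0 | 766 | 3.6s |
| 6 | x_axis_dc4 | -79.2 | 0 | 1 | 0 | 792 | 3.7s |
| 7 | wide_skip_dc4 | -80.5 | 0 | 1 | 0 | 805 | 3.8s |
| 8 | moderate | -84.3 | 0 | 1 | 0 | 843 | 3.9s |
| 9 | high_stuck_dc4 | -85.1 | 0 | 1 | 0 | 851 | 4.0s |
| 10 | original | -91.7 | 0 | 1 | 0 | 917 | 4.2s |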

All agents: battles_won=1, maps_visited=4, party_size=1, final_map=0 (Pallet Town after lab exit)

Winner: dc2 (door_cooldown=2) — shortest door cooldown yields fastest completion.
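
Because Map, Party, and Stuck are identical for every agent, the ranking here comes down to turn count. The check below uses only the numbers from the results table and shows that, in these ten rows, each score equals `-turns / 10`. That is an observation about this table, not the definition of `score()` in evolve.py, which is described as a composite.

```python
# (label, score, turns) copied from the 10-agent results table
RESULTS = [
    ("dc2", -65.0, 650),
    ("baseline_4dc", -72.3, 723),
    ("low_stuck_dc4", -73.4, 734),
    ("narrow_dc4", -74.1, 741),
    ("aggressive", -76.6, 766),
    ("x_axis_dc4", -79.2, 792),
    ("wide_skip_dc4", -80.5, 805),
    ("moderate", -84.3, 843),
    ("high_stuck_dc4", -85.1, 851),
    ("original", -91.7, 917),
]

# Higher (less negative) score ranks first.
ranked = sorted(RESULTS, key=lambda row: row[1], reverse=True)
winner = ranked[0][0]

# In this run the score is fully explained by the turn count.
turns_explain_score = all(
    abs(score + turns / 10) < 1e-9 for _, score, turns in RESULTS
)
```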

Test plan

  • uv run pytest tests/ — all tests pass, 100% coverage maintained
  • 10-agent parallel run completes successfully
  • All agents beat rival battle (battles_won=1)

bdougie added 4 commits March 9, 2026 22:13
Implement fitness metrics, evolution loop, and parameter tuning
infrastructure so the agent can automatically improve its navigation
strategy through headless evaluation runs.

- Add compute_fitness() to PokemonAgent returning structured metrics
- Add --output-json CLI flag for programmatic fitness collection
- Make Navigator thresholds configurable (stuck_threshold, skip_distance)
- Read EVOLVE_PARAMS from environment to override navigator defaults
- Add observe_session_inline() for programmatic observation access
- Create evolve.py: evolution harness with LLM variant proposal,
  subprocess isolation, composite scoring, and observer integration
- Create run_10_agents.py: parallel multi-agent evaluation runner
- 367 tests, 100% coverage maintained
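
A minimal sketch of how an `EVOLVE_PARAMS` override could work, assuming the variable holds a JSON object. The JSON encoding, the `NavigatorParams` dataclass, `load_params`, and every default value are assumptions for illustration; only the parameter names `stuck_threshold`, `skip_distance`, and `door_cooldown` appear in this PR.

```python
import json
import os
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class NavigatorParams:
    stuck_threshold: int = 8  # hypothetical default
    skip_distance: int = 3    # hypothetical default
    door_cooldown: int = 4    # hypothetical default


def load_params(environ=os.environ):
    """Return defaults overridden by EVOLVE_PARAMS (assumed to be JSON)."""
    raw = environ.get("EVOLVE_PARAMS")
    if not raw:
        return NavigatorParams()
    overrides = json.loads(raw)
    known = {f.name for f in fields(NavigatorParams)}
    # Ignore unknown keys so one variant file can target several versions.
    accepted = {k: v for k, v in overrides.items() if k in known}
    return replace(NavigatorParams(), **accepted)
```

Passing the environment as an argument keeps the function testable without mutating `os.environ`.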

Closes #8

Agent was stuck at (7,5) in Oak's Lab after picking Charmander.
Added phased lab exit: A-mash dialogue (30 turns), move left to
center column, then south to door with NPC interaction. All 10
parallel agents now beat the rival (battles_won=1, party_size=1).

Winner: dc2 (door_cooldown=2, score=-65.0)

bdougie merged commit dfab7e6 into main on Mar 10, 2026
1 check passed
bdougie added a commit that referenced this pull request Mar 10, 2026
The backtrack guard checked `map_id == 40 AND party_count == 0`, but
party_count changes to 1 the moment the agent picks up Charmander.
This allowed backtracking to fire immediately after the pickup, wiping
out progress.  Change guard to `map_id == 40` (entire lab is protected).
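
The difference between the two guards can be shown with a pair of predicates. The function names are hypothetical; the conditions are the ones quoted in the commit message.

```python
OAKS_LAB_MAP_ID = 40  # map id quoted in the commit message


def backtrack_allowed_old(map_id, party_count):
    """Old guard: protected only while in the lab with an empty party."""
    return not (map_id == OAKS_LAB_MAP_ID and party_count == 0)


def backtrack_allowed_new(map_id, party_count):
    """New guard: the whole lab is protected, whatever the party size."""
    return map_id != OAKS_LAB_MAP_ID


# Right after picking Charmander: still in the lab, party_count is now 1.
old_fires = backtrack_allowed_old(40, 1)  # the bug: backtracking can fire
new_fires = backtrack_allowed_new(40, 1)  # the fix: backtracking is blocked
```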

Also revert the Oak trigger to PR #10's proven brute-force approach (4 rounds
of mash_a + wait) instead of script-state-aware gating, which read 0xD5F1
while still on the Pallet Town map, where the address is meaningless.

ROM test confirms: agent picks Charmander, wins rival battle, exits lab.
bdougie added a commit that referenced this pull request Mar 10, 2026
* Add FLE-style backtracking with AlphaEvolve integration

BacktrackManager saves/restores game state via PyBoy save_state/load_state
to escape stuck navigation on Route 1. Snapshots on map change and
periodically; restores when stuck_turns exceeds threshold.

Four new evolvable params (bt_max_snapshots, bt_restore_threshold,
bt_max_attempts, bt_snapshot_interval) flow through evolve.py and
run_10_agents.py with two new variants: aggressive_bt and no_bt.
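
A rough sketch of the snapshot/restore loop described above. It is not the BacktrackManager from this change: the method names, the default values, and the choice to restore the newest snapshot first are assumptions. `FakeEmulator` stands in for PyBoy, whose `save_state` and `load_state` also take file-like objects.

```python
import io
from collections import deque


class BacktrackManager:
    def __init__(self, emulator, bt_max_snapshots=5, bt_restore_threshold=20,
                 bt_max_attempts=3, bt_snapshot_interval=50):
        # All default values here are hypothetical.
        self.emulator = emulator
        self.snapshots = deque(maxlen=bt_max_snapshots)  # oldest dropped first
        self.restore_threshold = bt_restore_threshold
        self.max_attempts = bt_max_attempts
        self.snapshot_interval = bt_snapshot_interval
        self.attempts = 0
        self.last_map_id = None

    def maybe_snapshot(self, turn, map_id):
        """Snapshot on map change, and every bt_snapshot_interval turns."""
        map_changed = map_id != self.last_map_id
        periodic = turn > 0 and turn % self.snapshot_interval == 0
        self.last_map_id = map_id
        if not (map_changed or periodic):
            return False
        buf = io.BytesIO()
        self.emulator.save_state(buf)
        self.snapshots.append(buf.getvalue())
        return True

    def maybe_restore(self, stuck_turns):
        """Restore the newest snapshot once stuck_turns exceeds the threshold."""
        if stuck_turns <= self.restore_threshold or not self.snapshots:
            return False
        if self.attempts >= self.max_attempts:
            return False
        self.attempts += 1
        self.emulator.load_state(io.BytesIO(self.snapshots.pop()))
        return True


class FakeEmulator:
    """Stand-in exposing save_state/load_state on file-like objects."""

    def __init__(self):
        self.state = b"start"

    def save_state(self, file_like):
        file_like.write(self.state)

    def load_state(self, file_like):
        self.state = file_like.read()
```

Storing snapshots as bytes rather than open buffers avoids having to rewind a shared `BytesIO` before each restore.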

* Remove unused field import and deduplicate score()

- Remove unused `field` import from dataclasses in agent.py
- Import `score()` from evolve.py in run_10_agents.py instead of duplicating it

* Fix backtrack restore: reset script-gate flags, skip duplicate snapshots

- Reset _oak_wait_done, _pallet_diag_done, _house_diag_done, _lab_phase,
  _lab_turns, _lab_exit_turns on backtrack restore so one-time game
  sequences (Oak encounter, lab phases) can re-trigger after restore
- Skip periodic snapshots when position matches the last snapshot to
  avoid poisoning the pool with stuck-adjacent positions
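
Both fixes are small enough to sketch as helpers. The flag names come from the commit message; the helper names, the split into booleans and counters, and the reset values (`False` and `0`) are assumptions.

```python
from types import SimpleNamespace

# Flag names from the commit message; their types are assumed.
BOOL_GATES = ("_oak_wait_done", "_pallet_diag_done", "_house_diag_done")
COUNTERS = ("_lab_phase", "_lab_turns", "_lab_exit_turns")


def reset_script_gates(agent):
    """Clear one-time sequence flags so they can re-trigger after a restore."""
    for name in BOOL_GATES:
        setattr(agent, name, False)
    for name in COUNTERS:
        setattr(agent, name, 0)


def should_take_periodic_snapshot(position, last_snapshot_position):
    """Skip a periodic snapshot when the agent has not moved since the last one."""
    return position != last_snapshot_position


# Usage with a stand-in agent object.
agent = SimpleNamespace(_oak_wait_done=True, _pallet_diag_done=True,
                        _house_diag_done=True, _lab_phase=2,
                        _lab_turns=31, _lab_exit_turns=12)
reset_script_gates(agent)
```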

* Fix backtrack guard in Oak's Lab to prevent undoing Charmander pickup

The backtrack guard checked `map_id == 40 AND party_count == 0`, but
party_count changes to 1 the moment the agent picks up Charmander.
This allowed backtracking to fire immediately after the pickup, wiping
out progress.  Change guard to `map_id == 40` (entire lab is protected).

Also revert the Oak trigger to PR #10's proven brute-force approach (4 rounds
of mash_a + wait) instead of script-state-aware gating, which read 0xD5F1
while still on the Pallet Town map, where the address is meaningless.

ROM test confirms: agent picks Charmander, wins rival battle, exits lab.

* Add FLE backtracking section to README with paper reference

Documents the Factorio Learning Environment-inspired backtracking system:
snapshot/restore mechanics, evolvable parameters, and Oak's Lab guard.
Adds FLE paper to references list.