This repository is a curated code and results snapshot for the CV project "World-Model Inputs for Atari Policies."
The project tested whether an Atari policy performs better when, alongside raw game observations, it also receives detached summary predictions from a world model trained jointly with the policy. The main experiment covers 26 Atari100K games with 2 seeds each, for 52 tasks in total.
Latest synchronized local report: 2026-05-06 14:05:46 +08:00.
- Main clean100k run: 52 / 52 tasks completed.
- Mean HNS over 26 games: 1.875, compared with 1.818 for the EAWM Table 1 reference.
- Median HNS over 26 games: 0.959, compared with 0.773 for the EAWM Table 1 reference.
- Mean per-game raw-score change versus the reference: -0.01%.
- Interpretation: the average result is close to the baseline, but the effect is highly game-dependent. Some games improve substantially, while others degrade substantially, so this snapshot should not be read as a stable positive result.
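
For readers checking the aggregates above, here is a minimal sketch of how such numbers are commonly computed. It assumes HNS is the usual human-normalized score, `(agent - random) / (human - random)`, and that the per-game raw-score change is a relative percentage against the reference score. The function names, game names, and all numbers are hypothetical placeholders, not values from the report or code from `spartan_tools/`.

```python
from statistics import mean, median


def human_normalized_score(agent, random_score, human_score):
    """Return (agent - random) / (human - random) for one game."""
    if human_score == random_score:
        raise ValueError("human and random reference scores must differ")
    return (agent - random_score) / (human_score - random_score)


def relative_change_pct(new_score, reference_score):
    """Percentage change of new_score relative to reference_score."""
    if reference_score == 0:
        raise ValueError("reference score must be non-zero")
    return 100.0 * (new_score - reference_score) / abs(reference_score)


# Hypothetical placeholder scores (already averaged over seeds).
# These are NOT taken from the report in this repository.
games = {
    "game_a": {"agent": 50.0, "random": 10.0, "human": 110.0, "reference": 40.0},
    "game_b": {"agent": 30.0, "random": 0.0, "human": 20.0, "reference": 40.0},
    "game_c": {"agent": -5.0, "random": -20.0, "human": 10.0, "reference": -5.0},
}

hns = {
    name: human_normalized_score(g["agent"], g["random"], g["human"])
    for name, g in games.items()
}
changes = {
    name: relative_change_pct(g["agent"], g["reference"])
    for name, g in games.items()
}

mean_hns = mean(hns.values())
median_hns = median(hns.values())
mean_change = mean(changes.values())

print(f"mean HNS: {mean_hns:.3f}, median HNS: {median_hns:.3f}")
print(f"mean per-game raw-score change: {mean_change:+.2f}%")
```

In this toy data one game improves by 25% and another degrades by 25%, so the mean change is zero even though no game-level effect is neutral. A near-zero mean change can hide large per-game swings in the same way, which is why the per-game comparison table matters more than the averages.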
Primary report:
- third_party/EAWM/: curated copy of the EAWM code paths used for the experiments.
- runs/: local and Spartan launch/config utilities.
- manifests/: experiment manifests for clean100k, pooling, gated, and concat-gated variants.
- spartan_tools/: scripts used to scan Spartan outputs and build the Focus82 report.
- results/focus82/: synchronized report files, task table, and per-game comparison table.
- docs/CV_PROJECT_SUMMARY.md: short project summary aligned with the CV description.
- docs/SPARTAN_SYNC_STATUS.md: provenance and sync status for the included result snapshot.
The implementation builds on the EAWM codebase and local Spartan patches used for the Atari100K world-model-summary experiments. The included results are lightweight reports derived from Spartan output scans, not full checkpoints or raw training logs.
Large artifacts such as checkpoints, W&B media, raw output directories, and visualization GIFs are intentionally excluded from this curated public snapshot.