Skip to content

Lab polish: CQL alpha-trajectory + Dreamer policy cleanup#19

Merged
ChatGPU merged 1 commit into
mainfrom
claude/epic-ritchie-A7YtN
May 27, 2026
Merged

Lab polish: CQL alpha-trajectory + Dreamer policy cleanup#19
ChatGPU merged 1 commit into
mainfrom
claude/epic-ritchie-A7YtN

Conversation

@ChatGPU
Copy link
Copy Markdown
Owner

@ChatGPU ChatGPU commented May 27, 2026

Closing touch on the CQL lab and a Dreamer policy polish.

labs/rl_decision/lab_cql_offline_minigrid/:

  • Added assets/ablation_alpha_traj.png showing the dual variable α adapting (starts at 1.0, slides to 0.6 because the empirical CQL gap ~1.3 is below the target=5).
  • Final notebook execution: 78 s total (Cell 1 dataset 1.3 s, Cell 2 BC+DQN 25.1 s, Cell 3 plot, Cell 4 CQL 14.8 s, Cell 5 eval, Cell 6 ablation 29.9 s).
  • Empirical story confirmed in q_overestimation.png: DQN's Q_OOD climbs above the optimal return while Q_seen stays put — the textbook offline-RL pathology. CQL's gap (Q_OOD − Q_seen) stays at zero and grows negative, demonstrating the lower-bound property.

labs/world_models/lab_dreamer_cartpole_pixels/: small policy cleanup for the imagination-rollout reward bookkeeping.

https://claude.ai/code/session_017Ez7KNKDCGRRLjEnJi9TW7


Generated by Claude Code

Final follow-up commit for the two long-program reproduction labs
spawned in Wave E:

- labs/rl_decision/lab_cql_offline_minigrid/: added the alpha-tuning
  trajectory plot (ablation_alpha_traj.png) showing how the dual
  variable adapts; notebook now writes it from the auto-tune branch.
- labs/world_models/lab_dreamer_cartpole_pixels/: policy.py polish
  (cleanup of imagination-rollout reward bookkeeping that the trainer
  agent left mid-edit before the conversation handed off).

Both labs remain end-to-end runnable.

https://claude.ai/code/session_017Ez7KNKDCGRRLjEnJi9TW7
@ChatGPU ChatGPU merged commit 373fe29 into main May 27, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants