Lab polish: CQL alpha-trajectory + Dreamer policy cleanup by ChatGPU · Pull Request #19 · ChatGPU/Autonomous-Driving-Learning-Atlas

ChatGPU · 2026-05-27T17:33:57Z

Closing touch on the CQL lab and a Dreamer policy polish.

labs/rl_decision/lab_cql_offline_minigrid/:

Added assets/ablation_alpha_traj.png showing the dual variable α adapting: it starts at 1.0 and slides to 0.6 because the empirical CQL gap (~1.3) is below the target of 5.
Final notebook execution: 78 s total (Cell 1 dataset 1.3 s, Cell 2 BC+DQN 25.1 s, Cell 3 plot, Cell 4 CQL 14.8 s, Cell 5 eval, Cell 6 ablation 29.9 s).
Empirical story confirmed in q_overestimation.png: DQN's Q_OOD climbs above the optimal return while Q_seen stays put, the textbook offline-RL pathology. CQL's gap (Q_OOD − Q_seen) stays at or below zero and grows more negative over training, consistent with the lower-bound property.
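
For readers who want the intuition behind the α trajectory, here is a minimal sketch of a Lagrangian dual update, not the notebook's actual code. It assumes α is parameterized as exp(log_alpha) and updated by gradient ascent on α · (gap − target). The function names, learning rate, step count, and the constant gap are all hypothetical; in a real run the gap changes as the Q-network trains.

```python
import math


def dual_alpha_step(log_alpha, cql_gap, target_gap, lr):
    """One gradient-ascent step on alpha * (cql_gap - target_gap) w.r.t. log_alpha.

    Hypothetical helper: alpha = exp(log_alpha) keeps alpha positive.
    """
    alpha = math.exp(log_alpha)
    # d/d(log_alpha) of alpha * (gap - target) is alpha * (gap - target)
    grad = alpha * (cql_gap - target_gap)
    return log_alpha + lr * grad


def simulate(alpha0=1.0, cql_gap=1.3, target_gap=5.0, lr=1e-3, steps=200):
    """Trace alpha under a gap held constant (a simplification for illustration)."""
    log_alpha = math.log(alpha0)
    traj = [alpha0]
    for _ in range(steps):
        log_alpha = dual_alpha_step(log_alpha, cql_gap, target_gap, lr)
        traj.append(math.exp(log_alpha))
    return traj


if __name__ == "__main__":
    traj = simulate()
    print(f"alpha: start={traj[0]:.3f}, end={traj[-1]:.3f}")
```

With the gap below the target, the gradient is negative at every step, so α shrinks monotonically; with the gap above the target it would grow instead. The rate of change depends entirely on the assumed learning rate, so this sketch does not reproduce the 0.6 endpoint reported above.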

labs/world_models/lab_dreamer_cartpole_pixels/: small policy cleanup for the imagination-rollout reward bookkeeping.

https://claude.ai/code/session_017Ez7KNKDCGRRLjEnJi9TW7

Generated by Claude Code

Final follow-up commit for the two long-program reproduction labs spawned in Wave E:

- labs/rl_decision/lab_cql_offline_minigrid/: added the alpha-tuning trajectory plot (ablation_alpha_traj.png) showing how the dual variable adapts; notebook now writes it from the auto-tune branch.
- labs/world_models/lab_dreamer_cartpole_pixels/: policy.py polish (cleanup of imagination-rollout reward bookkeeping that the trainer agent left mid-edit before the conversation handed off).

Both labs remain end-to-end runnable.

https://claude.ai/code/session_017Ez7KNKDCGRRLjEnJi9TW7

ChatGPU merged commit 373fe29 into main May 27, 2026
1 check passed

ChatGPU merged 1 commit into main from claude/epic-ritchie-A7YtN

2 participants
