Skip to content

docs(mcts): explain unconditional model call on done lanes#18

Merged
NorbertRop merged 1 commit into
mainfrom
docs/mcts-terminal-lane-rationale
May 25, 2026
Merged

docs(mcts): explain unconditional model call on done lanes#18
NorbertRop merged 1 commit into
mainfrom
docs/mcts-terminal-lane-rationale

Conversation

@NorbertRop
Copy link
Copy Markdown
Contributor

Summary

Closes the open promise from #15 ("MCTS terminal-state model eval ... deferred to a focused PR") — with an honest "we looked and it doesn't need fixing" rather than a code change.

The recurrent_fn passed to mctx evaluates the model on every batch lane, including ones where the env has terminated. On closer inspection:

  • mctx's search loop calls recurrent_fn with a fixed-shape batch. Under jit + vmap, the model forward is a single XLA op that runs on the whole batch; per-lane skipping would require dynamic shapes, which jit forbids.
  • Done-lane output is algorithmically inert: value is masked to 0 on the next line and discount is also 0, so terminal lanes contribute nothing to mctx's backup.
  • The "NaN risk" is theoretical for this repo — pgx terminal observations are well-defined final board states, not garbage memory.

So the unconditional model call is structurally correct, not a bug. Adding defensive zeros-substitution would violate "don't add validation for scenarios that can't happen" and add code with no algorithmic effect.

This PR adds a comment block at the top of recurrent_fn capturing the rationale so future readers (and future-me) don't re-litigate the question.

Test plan

  • No behavior change; existing tests untouched.
  • Diff is comments only.

🤖 Generated with Claude Code

The recurrent_fn passed to mctx evaluates the model on every batch
lane, even where the env has terminated. This was flagged in #15 as
potential wasted compute and a NaN risk, but on inspection the
behavior is structurally correct:

- mctx's search loop calls recurrent_fn with a fixed-shape batch.
- Under jit + vmap, the model forward is a single XLA op that runs
  on the whole batch; per-lane skipping would need dynamic shapes.
- Done-lane output is algorithmically inert: value is masked to 0
  on the next line and discount is also 0, so terminal lanes
  contribute nothing to mctx's backup.

Replace the open "deferred to a focused PR" promise with a comment
that captures the rationale so future readers don't re-litigate
this dead end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@NorbertRop NorbertRop merged commit 67d3036 into main May 25, 2026
1 check passed
@NorbertRop NorbertRop deleted the docs/mcts-terminal-lane-rationale branch May 26, 2026 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant