docs(mcts): explain unconditional model call on done lanes by NorbertRop · Pull Request #18 · bards-ai/Jaxpot

NorbertRop · 2026-05-25T21:48:31Z

Summary

Closes the open promise from #15 ("MCTS terminal-state model eval ... deferred to a focused PR") — with an honest "we looked and it doesn't need fixing" rather than a code change.

The recurrent_fn passed to mctx evaluates the model on every batch lane, including ones where the env has terminated. On closer inspection:

mctx's search loop calls recurrent_fn with a fixed-shape batch. Under jit + vmap, the model forward is a single XLA op that runs on the whole batch; per-lane skipping would require dynamic shapes, which jit forbids.
Done-lane output is algorithmically inert: value is masked to 0 on the next line and discount is also 0, so terminal lanes contribute nothing to mctx's backup.
The "NaN risk" is theoretical for this repo — pgx terminal observations are well-defined final board states, not garbage memory.

So the unconditional model call is structurally correct, not a bug. Adding defensive zeros-substitution would violate "don't add validation for scenarios that can't happen" and add code with no algorithmic effect.

This PR adds a comment block at the top of recurrent_fn capturing the rationale so future readers (and future-me) don't re-litigate the question.

Test plan

No behavior change; existing tests untouched.
Diff is comments only.

🤖 Generated with Claude Code

The recurrent_fn passed to mctx evaluates the model on every batch lane, even where the env has terminated. This was flagged in #15 as potential wasted compute and a NaN risk, but on inspection the behavior is structurally correct: - mctx's search loop calls recurrent_fn with a fixed-shape batch. - Under jit + vmap, the model forward is a single XLA op that runs on the whole batch; per-lane skipping would need dynamic shapes. - Done-lane output is algorithmically inert: value is masked to 0 on the next line and discount is also 0, so terminal lanes contribute nothing to mctx's backup. Replace the open "deferred to a focused PR" promise with a comment that captures the rationale so future readers don't re-litigate this dead end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

NorbertRop merged commit 67d3036 into main May 25, 2026
1 check passed

NorbertRop deleted the docs/mcts-terminal-lane-rationale branch May 26, 2026 12:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(mcts): explain unconditional model call on done lanes#18

docs(mcts): explain unconditional model call on done lanes#18
NorbertRop merged 1 commit into
mainfrom
docs/mcts-terminal-lane-rationale

NorbertRop commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NorbertRop commented May 25, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant