Fix atlas state-token precision stranding approved seeds by jpr5 · Pull Request #104 · CopilotKit/pathfinder

jpr5 · 2026-06-11T02:00:53Z

Summary

getAtlasStateToken returns MAX(updated_at) from the atlas tables, but the value round-trips through a JS Date, which truncates PostgreSQL's microsecond precision to milliseconds (e.g. …28.78683 → …28.786Z). The incremental acquire path then bounds with updated_at > $lastToken AND updated_at <= $newToken at full microsecond precision, so the row that produced the token falls into the sub-millisecond gap. It fails updated_at <= token in the run that generated the token, and every later run bounds with > token AND <= token, so the row is permanently stranded. Symptom: approved seeds are never indexed by incremental reindex, and the failure is silent ("Indexing complete" with 0 items).
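The gap described above can be reproduced without a database. The sketch below is only a model, not the project's code: it represents updated_at as microseconds since the Unix epoch, derives the token through a JS Date as the summary describes, and applies the unfixed bounds. The names SeedRow, tokenFromMax and acquireUnfixed are hypothetical.

```typescript
// Model only: updated_at is held as microseconds since the Unix epoch,
// mirroring PostgreSQL's timestamptz resolution. All names are hypothetical.
type SeedRow = { id: string; updatedAtMicros: number };

// 2026-06-11T01:46:28.786830Z expressed in microseconds (month index 5 = June).
const seedMicros = Date.UTC(2026, 5, 11, 1, 46, 28, 786) * 1000 + 830;
const seedRows: SeedRow[] = [{ id: "micro", updatedAtMicros: seedMicros }];

// A JS Date stores whole milliseconds, so the trailing 830 microseconds are lost.
function tokenFromMax(rows: SeedRow[]): string {
  const maxMicros = rows.reduce((m, r) => Math.max(m, r.updatedAtMicros), 0);
  return new Date(Math.floor(maxMicros / 1000)).toISOString();
}

// Unfixed bounds: full-precision column compared against ms-precision tokens.
function acquireUnfixed(
  rows: SeedRow[],
  lastToken: string | null,
  newToken: string,
): string[] {
  const hi = Date.parse(newToken) * 1000;
  const lo = lastToken === null ? null : Date.parse(lastToken) * 1000;
  return rows
    .filter((r) => (lo === null || r.updatedAtMicros > lo) && r.updatedAtMicros <= hi)
    .map((r) => r.id);
}

const token = tokenFromMax(seedRows);
// Run that generated the token: the row fails updated_at <= token.
const firstRun = acquireUnfixed(seedRows, null, token);
// Later run with an unchanged token: the window (token, token] is empty.
const laterRun = acquireUnfixed(seedRows, token, token);
console.log(token, firstRun, laterRun);
```

In this model both runs return an empty list, which matches the "Indexing complete with 0 items" symptom.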

Fix: compare with date_trunc('milliseconds', updated_at) on BOTH bound clauses in addUpdatedAtClauses, so comparison precision matches token precision. This is the minimal correct surface: truncating inside MAX() in getAtlasStateToken would be redundant (the JS Date round-trip already truncates the token) and would not by itself close the comparison gap, because the mismatch only matters where the bounds are evaluated.
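As a rough illustration of the fix, a clause builder that truncates on both bounds might look like the sketch below. The real signature of addUpdatedAtClauses is not shown in this PR, so the function name, the UpdatedAtBounds shape and the changedAfter field are assumptions; only changedOnOrBefore and the date_trunc expression come from the description.

```typescript
// Hypothetical shape: the actual addUpdatedAtClauses signature is not in this PR.
interface UpdatedAtBounds {
  changedAfter?: string; // previous state token, exclusive lower bound (assumed name)
  changedOnOrBefore?: string; // new state token, inclusive upper bound
}

function addUpdatedAtClausesSketch(
  clauses: string[],
  params: unknown[],
  bounds: UpdatedAtBounds,
): void {
  // Truncate the column to the token's precision on BOTH comparisons.
  const truncated = "date_trunc('milliseconds', updated_at)";
  if (bounds.changedAfter !== undefined) {
    params.push(bounds.changedAfter);
    clauses.push(`${truncated} > $${params.length}`);
  }
  if (bounds.changedOnOrBefore !== undefined) {
    params.push(bounds.changedOnOrBefore);
    clauses.push(`${truncated} <= $${params.length}`);
  }
}

const clauses: string[] = [];
const params: unknown[] = [];
addUpdatedAtClausesSketch(clauses, params, {
  changedAfter: "2026-06-11T01:40:00.000Z",
  changedOnOrBefore: "2026-06-11T01:46:28.786Z",
});
console.log(clauses.join(" AND "), params);
```

Passing only changedOnOrBefore yields a single upper-bound clause, which is the shape the description attributes to a full acquire.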

Found by the atlas sandbox live run; hot-patch validated there end-to-end before this PR (seed approved at 01:46:28.78683, token 2026-06-11T01:46:28.786Z, 0 chunks indexed before the patch; seed indexed with 1 chunk and a real search hit after).

Test plan

New red-green regression test in src/__tests__/atlas-db.test.ts: approved seed with updated_at = '2026-06-11T01:46:28.786830+00', token captured, incremental bounds applied. Red against unfixed code: AssertionError: expected [] to deeply equal [ 'micro' ]. Green after the fix.
Second-cycle assertion pins the LOWER bound: querying (token, token] must return []. Red-verified against an un-truncated updated_at > token (boundary row re-emits forever) before restoring the fix.
Recovery-path assertion: a fullAcquire-shaped query (changedOnOrBefore only) includes the previously-stranded row — pins the deploy-step recovery mechanism in code.
resolveAtlasStateToken unit tests: throws on an unparseable non-null MAX(updated_at) (fail loud instead of silently shrinking the window), null on empty tables, max-of-maxes as ms ISO string.
Full suite: 324 files, 5933/5933 passing
tsc --noEmit on both tsconfigs (tsconfig.json, tsconfig.scripts.json)
npm run build clean
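The resolveAtlasStateToken semantics listed in the test plan above (throw on an unparseable non-null value, null on empty tables, max-of-maxes as a millisecond ISO string) could be sketched as follows. The input shape and the function body are assumptions made for illustration; the PR names the function but does not show it.

```typescript
// Hypothetical input: one MAX(updated_at) result per atlas table.
type RawMax = Date | string | null;

function resolveAtlasStateTokenSketch(maxes: RawMax[]): string | null {
  let best: number | null = null;
  for (const raw of maxes) {
    if (raw === null) continue; // an empty table contributes nothing
    const ms = raw instanceof Date ? raw.getTime() : Date.parse(raw);
    if (isNaN(ms)) {
      // Fail loud: silently skipping this value would shrink the window.
      throw new Error("Unparseable MAX(updated_at): " + String(raw));
    }
    if (best === null || ms > best) best = ms;
  }
  // toISOString always emits millisecond precision.
  return best === null ? null : new Date(best).toISOString();
}

const emptyTables = resolveAtlasStateTokenSketch([null, null]);
const maxOfMaxes = resolveAtlasStateTokenSketch([
  "2026-06-10T00:00:00.000Z",
  null,
  "2026-06-11T01:46:28.786Z",
]);
console.log(emptyTables, maxOfMaxes);
```

In this sketch a mixed input such as [null, "garbage"] throws rather than resolving to null, in line with the fail-loud intent stated in the test plan.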

Deploy step (one-time, REQUIRED)

The bound fix is forward-only: a prod row stranded by the pre-fix bug has date_trunc('milliseconds', updated_at) exactly equal to the persisted token, and the fixed lower bound is strict (> token), so the already-stranded row stays excluded forever with all signals green — deploying the code alone does not heal it. Clearing the persisted state token forces one full re-acquire (idempotent re-index), which bounds only with changedOnOrBefore and picks the stranded rows back up.
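To make the forward-only behaviour concrete, here is a small model of the fixed bounds applied to an already-stranded row. It is a sketch under the same assumptions as the description (timestamps as microseconds, token as a millisecond ISO string); StrandedRow, truncMs and acquireFixed are hypothetical names, not the project's code.

```typescript
// Model only: names are hypothetical.
type StrandedRow = { id: string; updatedAtMicros: number };

// Stand-in for date_trunc('milliseconds', updated_at): microseconds to whole ms.
const truncMs = (micros: number): number => Math.floor(micros / 1000);

// Fixed bounds: both comparisons use the truncated column value.
// A null lastToken models the cleared state (full re-acquire, upper bound only).
function acquireFixed(
  rows: StrandedRow[],
  lastToken: string | null,
  newToken: string,
): string[] {
  const hi = Date.parse(newToken);
  const lo = lastToken === null ? null : Date.parse(lastToken);
  return rows
    .filter((r) => {
      const t = truncMs(r.updatedAtMicros);
      return (lo === null || t > lo) && t <= hi;
    })
    .map((r) => r.id);
}

const stranded: StrandedRow[] = [
  { id: "micro", updatedAtMicros: Date.UTC(2026, 5, 11, 1, 46, 28, 786) * 1000 + 830 },
];
const persistedToken = "2026-06-11T01:46:28.786Z";

// Code deployed, token left in place: the strict lower bound excludes the row.
const withoutClear = acquireFixed(stranded, persistedToken, persistedToken);
// Token cleared: only the upper bound applies and the row is picked up.
const afterClear = acquireFixed(stranded, null, persistedToken);
console.log(withoutClear, afterClear);
```

The first call returns nothing and the second returns the stranded row, which is why the token clear is required in addition to the code change.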

Note the wall-clock time, then run (idle-gated: a run in flight reads state before and writes the token after this UPDATE, which would silently cancel the clear):

UPDATE index_state
SET last_commit_sha = NULL
WHERE source_type = 'atlas'
  AND status <> 'indexing'
RETURNING source_key, status;

Every atlas row must come back in RETURNING. If any row was skipped (a run was in flight), wait for it to finish and re-run the UPDATE until all atlas rows are returned.

Confirm the clear took effect before the next orchestrator run:

SELECT source_key, last_commit_sha, status
FROM index_state WHERE source_type = 'atlas';
-- expect: last_commit_sha IS NULL on every row

After the next orchestrator run, verify recovery actually ran:

SELECT source_key, last_commit_sha, status, last_indexed_at
FROM index_state WHERE source_type = 'atlas';
-- expect: repopulated last_commit_sha, status = 'idle', AND
-- last_indexed_at LATER than the time you ran the UPDATE above
-- (proves the full re-acquire ran after the clear, not pre-deploy
-- state surviving a clobber).

This step is part of this PR's definition of done — whoever merges deploys and runs it in the same motion.

jpr5 added 4 commits June 10, 2026 19:00

Fix atlas state-token precision stranding approved seeds

e2028c1

Harden the state-token fix: recovery path, bound pinning, fail-loud token

99d74fc

Pin resolver semantics in the testing-surface comment

489e5ec

Pin the mixed null-plus-garbage state-token shape

c031857

jpr5 merged commit 8b94c8d into main Jun 11, 2026
6 checks passed

jpr5 deleted the fix/atlas-state-token-precision branch June 11, 2026 20:59
