Fix Atlas incremental-indexing precision (microsecond high-water binds) + null-state-token window by jpr5 · Pull Request #99 · CopilotKit/pathfinder

jpr5 · 2026-06-09T17:23:28Z

Summary

Follow-up to #98 — fixes two correctness bugs in Atlas incremental indexing that a dry-run/review surfaced.

Microsecond high-water precision (the silent-drop fix). getAtlasStateToken previously round-tripped the high-water mark through a millisecond JS Date, so the updated_at <= token bound dropped the newest row whenever its timestamp had sub-millisecond digits — permanently, until something else bumped it. Now the token is the raw microsecond text (to_char(... 'US')) and acquisition windows bind as $N::timestamptz text (never a JS Date), so Postgres compares at full microsecond precision: the high-water row is included exactly once and never double-fetched or dropped. (Removed the interim ceilStateTokenToMillis rounding entirely.)
Null-state-token window. incrementalAcquire no longer builds a guaranteed-empty > T AND <= T window when the high-water read returns null — it warns loudly and skips, carrying the (validated) checkpoint forward.
Fail-loud checkpoint guard. parseLowerBound validates lastStateToken against the exact emitted µs format and throws a context-bearing error on garbage, before the null-token early return (so a corrupt checkpoint can't silently pass through on an empty source).

Testing

New PGlite-independent bind-path test asserts the window binds as ::timestamptz text (no Date truncation), plus a two-run integration test proving include-once / no-double-fetch / no-drop at adjacent microseconds — the race is reproduced and proven gone.
Fail-loud guard tests (unparseable, JS-parseable-but-wrong-format, corrupt-token-on-empty-source); spy tests carry a positive-control assertion so ESM interception can't pass vacuously.
Full suite green (4503 tests); typecheck/build/prettier clean.

Review

Implementation + two unbiased 7-agent CR rounds → zero must-fix findings; the precision machinery and guards are confirmed correct.

Deferred — recommended deeper follow-up (NOT introduced here; pre-existing)

Reviewers repeatedly (and correctly) flag that the high-water token read and the content/removed reads are not a single transactional snapshot, and the token is a combined seed+cache high-water mark. Under concurrent writes or an exact same-microsecond cross-partition tie, a row can be skipped for a cycle. This is self-healed by the nightly unconditional full reindex and is not a regression from this change (microsecond precision makes ties strictly rarer). Closing it properly wants a transactional/repeatable-read snapshot around token+content reads, or per-source tokens — a worthwhile dedicated follow-up. Also pre-existing and out of scope here: repo-filtered orphan-cache-page removal, and clearAtlasCachePageStale({content:""}) split-state.

The state token is the microsecond high-water text of MAX(updated_at) (to_char(... 'US')), but the acquire queries bound results by turning that token into a JS Date, which only holds milliseconds. The bind truncated the sub-millisecond digits, so a row whose true updated_at landed past the truncated bound was either re-fetched or — once a ceiled token became the next run's strict-greater lower bound — permanently skipped. Return the raw microsecond high-water text verbatim (no millisecond ceil) and bind changedAfter/changedOnOrBefore as $N::timestamptz text params instead of JS Dates. Postgres parses the full microsecond text, so <= includes the high-water row exactly and the next run's > excludes only that exact row — no rounding, no double-fetch, no drop. AtlasContentQuery's changedAfter/ changedOnOrBefore become string (microsecond text); ceilStateTokenToMillis and its __testing export are removed. Tests prove the bind path carries the microsecond text as a ::timestamptz cast (not a truncating Date), that the token round-trips at microsecond precision, and that a boundary row is included in run 1 and neither dropped nor re-fetched in run 2.

…fail loud on garbage Carry the microsecond high-water text straight into the SQL bounds instead of wrapping it in new Date(), which would truncate the microseconds the db layer now preserves. The epoch fallback for an empty source is microsecond text too. Drop the dead "currentStateToken ? ... : undefined" ternary — currentStateToken is proven non-null by the null-token early return above. Guard a malformed lastStateToken before it can reach the $N::timestamptz bind: a non-empty token must match the EXACT microsecond shape getAtlasStateToken emits (YYYY-MM-DDTHH:MM:SS.ffffffZ), or it throws a context-bearing error (source name + offending token). A bare new Date() probe was too loose — "2026" or "Jan 5 2026" parse in JS but bind with different / locale-dependent semantics than Postgres ::timestamptz, defeating the fail-loud intent. An empty lastStateToken stays the legitimate first-run signal (no lower bound). Run that validation BEFORE the null-token early return so a corrupt checkpoint fails loud even when the source is empty, rather than silently passing through and re-persisting on every run. Tests cover the fail-loud guard (including JS-parseable-but-non-microsecond tokens and a corrupt token against an empty source) and the empty-first-run lower bound; the null-token loud-skip test is retained and gains a positive control asserting the state-token read actually fired so a broken ESM-binding interception can no longer pass the negative assertions vacuously.

jpr5 added 2 commits June 9, 2026 10:17

jpr5 merged commit 1db109d into main Jun 9, 2026
8 checks passed

jpr5 deleted the follow-up/atlas-acquisition-precision branch June 9, 2026 18:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Atlas incremental-indexing precision (microsecond high-water binds) + null-state-token window#99

Fix Atlas incremental-indexing precision (microsecond high-water binds) + null-state-token window#99
jpr5 merged 2 commits into
mainfrom
follow-up/atlas-acquisition-precision

jpr5 commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jpr5 commented Jun 9, 2026

Summary

Testing

Review

Deferred — recommended deeper follow-up (NOT introduced here; pre-existing)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant