Skip to content

Fix Atlas incremental-indexing precision (microsecond high-water binds) + null-state-token window#99

Merged
jpr5 merged 2 commits into
mainfrom
follow-up/atlas-acquisition-precision
Jun 9, 2026
Merged

Fix Atlas incremental-indexing precision (microsecond high-water binds) + null-state-token window#99
jpr5 merged 2 commits into
mainfrom
follow-up/atlas-acquisition-precision

Conversation

@jpr5

@jpr5 jpr5 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Follow-up to #98 — fixes two correctness bugs in Atlas incremental indexing that a dry-run/review surfaced.

  • Microsecond high-water precision (the silent-drop fix). getAtlasStateToken previously round-tripped the high-water mark through a millisecond JS Date, so the updated_at <= token bound dropped the newest row whenever its timestamp had sub-millisecond digits — permanently, until something else bumped it. Now the token is the raw microsecond text (to_char(... 'US')) and acquisition windows bind as $N::timestamptz text (never a JS Date), so Postgres compares at full microsecond precision: the high-water row is included exactly once and never double-fetched or dropped. (Removed the interim ceilStateTokenToMillis rounding entirely.)
  • Null-state-token window. incrementalAcquire no longer builds a guaranteed-empty > T AND <= T window when the high-water read returns null — it warns loudly and skips, carrying the (validated) checkpoint forward.
  • Fail-loud checkpoint guard. parseLowerBound validates lastStateToken against the exact emitted µs format and throws a context-bearing error on garbage, before the null-token early return (so a corrupt checkpoint can't silently pass through on an empty source).

Testing

  • New PGlite-independent bind-path test asserts the window binds as ::timestamptz text (no Date truncation), plus a two-run integration test proving include-once / no-double-fetch / no-drop at adjacent microseconds — the race is reproduced and proven gone.
  • Fail-loud guard tests (unparseable, JS-parseable-but-wrong-format, corrupt-token-on-empty-source); spy tests carry a positive-control assertion so ESM interception can't pass vacuously.
  • Full suite green (4503 tests); typecheck/build/prettier clean.

Review

  • Implementation + two unbiased 7-agent CR rounds → zero must-fix findings; the precision machinery and guards are confirmed correct.

Deferred — recommended deeper follow-up (NOT introduced here; pre-existing)

Reviewers repeatedly (and correctly) flag that the high-water token read and the content/removed reads are not a single transactional snapshot, and the token is a combined seed+cache high-water mark. Under concurrent writes or an exact same-microsecond cross-partition tie, a row can be skipped for a cycle. This is self-healed by the nightly unconditional full reindex and is not a regression from this change (microsecond precision makes ties strictly rarer). Closing it properly wants a transactional/repeatable-read snapshot around token+content reads, or per-source tokens — a worthwhile dedicated follow-up. Also pre-existing and out of scope here: repo-filtered orphan-cache-page removal, and clearAtlasCachePageStale({content:""}) split-state.

jpr5 added 2 commits June 9, 2026 10:17
The state token is the microsecond high-water text of MAX(updated_at)
(to_char(... 'US')), but the acquire queries bound results by turning that
token into a JS Date, which only holds milliseconds. The bind truncated the
sub-millisecond digits, so a row whose true updated_at landed past the
truncated bound was either re-fetched or — once a ceiled token became the next
run's strict-greater lower bound — permanently skipped.

Return the raw microsecond high-water text verbatim (no millisecond ceil) and
bind changedAfter/changedOnOrBefore as $N::timestamptz text params instead of
JS Dates. Postgres parses the full microsecond text, so <= includes the
high-water row exactly and the next run's > excludes only that exact row — no
rounding, no double-fetch, no drop. AtlasContentQuery's changedAfter/
changedOnOrBefore become string (microsecond text); ceilStateTokenToMillis and
its __testing export are removed.

Tests prove the bind path carries the microsecond text as a ::timestamptz cast
(not a truncating Date), that the token round-trips at microsecond precision,
and that a boundary row is included in run 1 and neither dropped nor re-fetched
in run 2.
…fail loud on garbage

Carry the microsecond high-water text straight into the SQL bounds instead of
wrapping it in new Date(), which would truncate the microseconds the db layer
now preserves. The epoch fallback for an empty source is microsecond text too.

Drop the dead "currentStateToken ? ... : undefined" ternary — currentStateToken
is proven non-null by the null-token early return above. Guard a malformed
lastStateToken before it can reach the $N::timestamptz bind: a non-empty token
must match the EXACT microsecond shape getAtlasStateToken emits
(YYYY-MM-DDTHH:MM:SS.ffffffZ), or it throws a context-bearing error (source name
+ offending token). A bare new Date() probe was too loose — "2026" or
"Jan 5 2026" parse in JS but bind with different / locale-dependent semantics
than Postgres ::timestamptz, defeating the fail-loud intent. An empty
lastStateToken stays the legitimate first-run signal (no lower bound).

Run that validation BEFORE the null-token early return so a corrupt checkpoint
fails loud even when the source is empty, rather than silently passing through
and re-persisting on every run.

Tests cover the fail-loud guard (including JS-parseable-but-non-microsecond
tokens and a corrupt token against an empty source) and the empty-first-run
lower bound; the null-token loud-skip test is retained and gains a positive
control asserting the state-token read actually fired so a broken
ESM-binding interception can no longer pass the negative assertions vacuously.
@jpr5 jpr5 merged commit 1db109d into main Jun 9, 2026
8 checks passed
@jpr5 jpr5 deleted the follow-up/atlas-acquisition-precision branch June 9, 2026 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant