Fix Atlas incremental-indexing precision (microsecond high-water binds) + null-state-token window#99
Merged
Merged
Conversation
The state token is the microsecond high-water text of MAX(updated_at) (to_char(... 'US')), but the acquire queries bound results by turning that token into a JS Date, which only holds milliseconds. The bind truncated the sub-millisecond digits, so a row whose true updated_at landed past the truncated bound was either re-fetched or — once a ceiled token became the next run's strict-greater lower bound — permanently skipped. Return the raw microsecond high-water text verbatim (no millisecond ceil) and bind changedAfter/changedOnOrBefore as $N::timestamptz text params instead of JS Dates. Postgres parses the full microsecond text, so <= includes the high-water row exactly and the next run's > excludes only that exact row — no rounding, no double-fetch, no drop. AtlasContentQuery's changedAfter/ changedOnOrBefore become string (microsecond text); ceilStateTokenToMillis and its __testing export are removed. Tests prove the bind path carries the microsecond text as a ::timestamptz cast (not a truncating Date), that the token round-trips at microsecond precision, and that a boundary row is included in run 1 and neither dropped nor re-fetched in run 2.
…fail loud on garbage Carry the microsecond high-water text straight into the SQL bounds instead of wrapping it in new Date(), which would truncate the microseconds the db layer now preserves. The epoch fallback for an empty source is microsecond text too. Drop the dead "currentStateToken ? ... : undefined" ternary — currentStateToken is proven non-null by the null-token early return above. Guard a malformed lastStateToken before it can reach the $N::timestamptz bind: a non-empty token must match the EXACT microsecond shape getAtlasStateToken emits (YYYY-MM-DDTHH:MM:SS.ffffffZ), or it throws a context-bearing error (source name + offending token). A bare new Date() probe was too loose — "2026" or "Jan 5 2026" parse in JS but bind with different / locale-dependent semantics than Postgres ::timestamptz, defeating the fail-loud intent. An empty lastStateToken stays the legitimate first-run signal (no lower bound). Run that validation BEFORE the null-token early return so a corrupt checkpoint fails loud even when the source is empty, rather than silently passing through and re-persisting on every run. Tests cover the fail-loud guard (including JS-parseable-but-non-microsecond tokens and a corrupt token against an empty source) and the empty-first-run lower bound; the null-token loud-skip test is retained and gains a positive control asserting the state-token read actually fired so a broken ESM-binding interception can no longer pass the negative assertions vacuously.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #98 — fixes two correctness bugs in Atlas incremental indexing that a dry-run/review surfaced.
getAtlasStateTokenpreviously round-tripped the high-water mark through a millisecond JSDate, so theupdated_at <= tokenbound dropped the newest row whenever its timestamp had sub-millisecond digits — permanently, until something else bumped it. Now the token is the raw microsecond text (to_char(... 'US')) and acquisition windows bind as$N::timestamptztext (never a JSDate), so Postgres compares at full microsecond precision: the high-water row is included exactly once and never double-fetched or dropped. (Removed the interimceilStateTokenToMillisrounding entirely.)incrementalAcquireno longer builds a guaranteed-empty> T AND <= Twindow when the high-water read returns null — it warns loudly and skips, carrying the (validated) checkpoint forward.parseLowerBoundvalidateslastStateTokenagainst the exact emitted µs format and throws a context-bearing error on garbage, before the null-token early return (so a corrupt checkpoint can't silently pass through on an empty source).Testing
::timestamptztext (noDatetruncation), plus a two-run integration test proving include-once / no-double-fetch / no-drop at adjacent microseconds — the race is reproduced and proven gone.Review
Deferred — recommended deeper follow-up (NOT introduced here; pre-existing)
Reviewers repeatedly (and correctly) flag that the high-water token read and the content/removed reads are not a single transactional snapshot, and the token is a combined seed+cache high-water mark. Under concurrent writes or an exact same-microsecond cross-partition tie, a row can be skipped for a cycle. This is self-healed by the nightly unconditional full reindex and is not a regression from this change (microsecond precision makes ties strictly rarer). Closing it properly wants a transactional/repeatable-read snapshot around token+content reads, or per-source tokens — a worthwhile dedicated follow-up. Also pre-existing and out of scope here: repo-filtered orphan-cache-page removal, and
clearAtlasCachePageStale({content:""})split-state.