Background

The AI Checks driver shipped in #122 has been running on real PRs for a few weeks. The 2026-05-04 PR batch (#129/#131/#133) surfaced four distinct false-positive patterns that all share the same meta-shape: the model produces a verdict that references content not present in the actual file tree at the PR head. #125 already tracks one driver-hardening bundle (pass PR metadata, surface model used, fix the Feature PR Docs bundle check), but its scope and proposed fixes ("feed the model the branch name") don't address the patterns below — the model is hallucinating diff/file content, not missing PR metadata.
This issue collects those four reproducers as a separate driver-hardening bundle.
Reproducers

1. Branching & PR Strategy reads the synthetic merge commit (#129)

Verdict (model):
The PR branch is not up-to-date with the current main branch; it does not appear to be rebased onto main, which is required for compliance.
Ground truth: The branch was rebased onto main immediately before push. The runner's actions/checkout@v4 defaults to refs/pull/N/merge, which GitHub materializes as a two-parent merge commit even when the PR is fully fast-forwardable. From the run log:
fetch ... +e8b57f8...:refs/remotes/pull/129/merge
checkout refs/remotes/pull/129/merge
HEAD is now at e8b57f8 Merge 9d1897f... into 69a008...
The model inspects git log on the runner, sees a Merge X into Y commit on top, and concludes "not rebased." This fires on every rebased PR, regardless of whether main actually moved underneath the branch. (Was intermittent on the next run for #131 — model output is non-deterministic.)
This is a different mechanism from #125's premise ("the model can't see the branch name"). Even if PR metadata is passed in, the model is still reading the synthetic merge commit from git log on the checked-out tree.
Detail: PR #129 comment
2. Comment Discipline reads diff-removed lines as still present (#131)
Verdict (model):
There are several comments in the episodes/apps.py file that paraphrase what the code does […] For example, comments like # Only initialize DBOS for server/worker entrypoints, plus the management commands that need to talk to the queue. and # Decorators must register before DBOS.launch(); enqueues against late-registered workflows silently fail. can be removed because the functionality is clear from the code itself.
Ground truth: The first comment cited (# Only initialize DBOS for server/worker entrypoints…) was deleted by this PR — it appears in the unified diff as a - line. The model is treating - lines as if they're still present in the file. The second comment is a legitimate non-obvious WHY (silent-failure mode of late-registered DBOS workflows) that the agent kept deliberately per the rule's own "keep WHY" carve-out.
The driver currently passes git diff $BASE_REF...HEAD directly to the model. There's no signal that - means "removed" beyond the standard unified-diff convention, which gpt-4o-mini regularly mishandles on multi-hunk diffs.
Detail: verdict artifact Job 74235387227
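One driver-side mitigation is to split the unified diff before the model ever sees it, so the +/- interpretation is done deterministically. A minimal sketch (`split_diff` is an illustrative name, not an existing driver function):

```python
def split_diff(unified_diff: str) -> tuple[list[str], list[str]]:
    """Partition a unified diff into (added, removed) line lists so the
    model never has to interpret +/- markers itself."""
    added, removed = [], []
    for line in unified_diff.splitlines():
        # Skip file headers ('--- a/...', '+++ b/...') and hunk headers ('@@').
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return added, removed
```

Feeding Comment Discipline only the added list would have kept the deleted # Only initialize DBOS… comment out of consideration entirely.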
3. RAGTIME_* Env Var Sync invents env var names from PR title (#133)
Verdict artifact:
{
  "verdict": "fail",
  "summary": "New environment variables were added but .env.sample and configure.py are not updated accordingly.",
  "details": "The code introduces the variable RAGTIME_SHOW_NAME, which is not referenced in .env.sample or core/management/commands/configure.py."
}
RAGTIME_SHOW_NAME does not exist anywhere in the branch. The PR adds Episode.show_name (a Django model field). The model invented an env var name by combining the rule's RAGTIME_ prefix with the PR's branch/title token show_name. Title-priming hallucination — the same mechanism that produced the #122-era Feature PR Docs confabulation, but on a different rule.
Detail: PR #133 comment and verdict artifact
4. Pipeline Step Documentation Sync misreads rule scope and claims doc isn't updated when it is (#133, post-rebase)
Verdict (model):
The diff adds a new field show_name to the Episode model in episodes/models.py and modifies the download process, but it does not update the documentation in README.md or doc/README.md to reflect these changes.
Two problems, layered:
Diff-reading miss. doc/README.md is updated in the PR — show_name is added to the details field list and the overwrite-semantics paragraph gains an additive-only exception. The model claims the file isn't touched. This is the inverse of pattern #2: instead of reading - lines as still-present, it misses + lines that are present.
Rule-scope misread. Even if README.md lacked any mention of show_name, the rule wouldn't require it. AGENTS.md scopes the rule to "When adding, removing, or changing a pipeline step, update the summary table in README.md and the detailed step descriptions in doc/README.md." This PR doesn't add/remove/change a pipeline step — the steps list is unchanged; only the behavior inside the existing fetch_details step changed. The README summary table at the step-level granularity ("🕷️ Fetch Details ... extracts metadata") is correct and shouldn't enumerate every field. The model expanded "changing a step" to mean "any change touching a step's source file."
This is a scope-creep + diff-miss double hallucination. The driver-side fix that addresses pattern #2 (extracting added vs. removed hunks before handing to the model) helps here too, but the rule-scope-misread component is independent — better-prompted rule guidance or a stronger model is needed for that half.
Detail: PR #133 — Pipeline Step Documentation Sync verdict
Why this is separate from #125

#125's proposal is to pass more context into the prompt (PR metadata, model name in summary, telemetry). That's strictly additive — none of it tells the model "do not invent file content that isn't in the diff." The patterns above are about the model fabricating or misreading the input it already has:
1. Treats the synthetic GHA merge commit as proof the branch wasn't rebased (environmental fix: pin the actions/checkout ref).
2. Reads - lines as still present (driver-side fix: pre-process the diff into added/removed sections, or use a stronger model).
3. Invents RAGTIME_* names from the PR title (driver-side fix: deterministic regex precheck on added lines).
4. Misses + lines that are in the diff (driver-side fix: same diff pre-processing as pattern #2 + a per-rule "applicability check" precheck before scoring; stronger model also helps).
All four share "the verdict references content not present in the PR" or "the verdict treats content as absent that is present." Worth a coordinated bundle.
Why we're not just upgrading the default model
Reasonable counter-question: if all four failures look like small-model hallucinations, why not just bump the default from openai/gpt-4o-mini to a Sonnet-class model and call it a day? Per-pattern susceptibility:
1. Read git log and didn't realize HEAD was a synthetic GHA merge ref, not the PR head. Medium — GPT-4o / Sonnet 4.6 often recognize refs/pull/N/merge semantics, but it's an environmental ambiguity even strong models can trip on. Driver fix (proposal A) eliminates it regardless of model.
2. Read - lines as still present. High — reading +/- correctly is basic comprehension; gpt-4o-mini specifically gets sloppy on multi-hunk diffs.
3. Combined the RAGTIME_ prefix with the PR title token show_name and produced a "fact" that exists nowhere in the branch. High — title-priming is characteristic of small models.
4. Claimed doc/README.md wasn't updated when the diff clearly adds lines to it; also stretched "changing a pipeline step" to mean "any change touching a step's source file". High for the diff-miss half; Medium for the scope-creep half (rule prompt could be tightened independently of model strength).
So 3 of 4 are model-capacity issues a Sonnet-class model would almost certainly avoid; #1 is a mixed environmental + model issue that proposal A fixes regardless of model.
Why not wholesale-upgrade anyway? Cost. Sonnet 4.6 is ~10–20× the per-PR cost of gpt-4o-mini, multiplied across 9 rules × every PR. The framing of the proposals below preserves the cheap default for the routine pass case:
Proposal B (deterministic prechecks) makes two of the four patterns disappear without spending any model tokens — pure regex on the diff.
Proposal C (auto-retry on stronger model when first verdict is fail) bounds the cost: pay Sonnet rates only when there's a verdict worth double-checking. False-positive rate drops, true-negative rate stays cheap.
Proposal D (telemetry on overturned verdicts) gives us the data to revisit the default-model choice in 3–6 months with evidence rather than vibes.
If telemetry shows the cheap default has an unacceptable false-positive rate even after B and C land, that's the moment to bump the default — not now.
Proposal
A. Pin actions/checkout to the PR head SHA (fixes #1)
Set ref: ${{ github.event.pull_request.head.sha }} on the check job's actions/checkout step. The runner will then check out 9d1897f directly (the rebased head) instead of the synthetic refs/pull/N/merge commit. git log on the runner reflects the PR's actual linear history.
Caveat: the diff is still computed via git diff $BASE_REF...HEAD, where $BASE_REF is fetched separately — this part is unaffected.
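A sketch of the workflow change (step layout and fetch-depth are illustrative; the driver's actual workflow file may differ):

```yaml
- uses: actions/checkout@v4
  with:
    # Check out the PR's actual head commit instead of the synthetic
    # refs/pull/N/merge commit GitHub builds for the run.
    ref: ${{ github.event.pull_request.head.sha }}
    # Assumed here so `git log` can show the rebased chain, not just HEAD.
    fetch-depth: 0
```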
B. Deterministic precheck for path-/symbol-based rules (fixes #2 and #3, plus extends #125's Option B)
For rules whose verdict rests on "does symbol/path X appear in the diff?", do the check in Python before the model call:
- ragtime-env-var-sync: Regex \b(RAGTIME_[A-Z_]+)\b against added lines in the diff; pass the resulting set to the model as Detected new RAGTIME_* references: {…} (empty set ⇒ verdict pass deterministically, no model call needed).
- feature-pr-docs: the #125 Option B precheck, folded into this bundle.
- comment-discipline: Extract added comment lines (^\+\s*#) from the diff, hand the model only those lines as a list. Strips diff-format ambiguity entirely.
- pipeline-step-doc-sync: Two-part precheck: (a) detect whether the PR actually adds/removes/changes a pipeline step (regex on _PIPELINE_DISPATCH, Episode.Status choices, or new @DBOS.step decorators) — if not, deterministic pass with summary "PR does not modify the pipeline-step list." (b) If a step is modified, pre-extract whether README.md and doc/README.md appear in the diff's changed-files list and pass that fact to the model so it can't claim a touched file is untouched.
This is more invasive than #125 Option B because it touches four rules instead of one, but each is mechanical (one regex per rule). Worth doing as one bundle.
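The ragtime-env-var-sync precheck is mechanical enough to sketch directly; `precheck_ragtime_env_vars` is an illustrative name and the verdict-dict shape is assumed, not the driver's actual schema:

```python
import re

RAGTIME_VAR = re.compile(r"\b(RAGTIME_[A-Z_]+)\b")

def precheck_ragtime_env_vars(added_lines: list[str]) -> dict:
    """Deterministic precheck: only RAGTIME_* names that literally appear
    in the diff's added lines can ever reach the model."""
    found = sorted({m.group(1)
                    for line in added_lines
                    for m in RAGTIME_VAR.finditer(line)})
    if not found:
        # Empty set => deterministic pass, no model call, no hallucination.
        return {"verdict": "pass",
                "summary": "No new RAGTIME_* references in the diff."}
    # Otherwise pass the exact set along so the model cannot invent names.
    return {"verdict": None,
            "context": f"Detected new RAGTIME_* references: {found}"}
```

On the #133 diff the added lines contain show_name but no RAGTIME_ token, so this returns a deterministic pass and RAGTIME_SHOW_NAME can never be fabricated.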
C. Fall back to a stronger model for any rule that fails (optional)
If a rule's first-pass verdict is fail with the cheap default (openai/gpt-4o-mini), automatically retry with openai/gpt-4o or anthropic/claude-sonnet-4-6. If the second model also says fail, post that verdict; if it says pass, the original was a hallucination — log it and pass.
Cost is bounded: only failures retry, so true negatives stay cheap. Helps catch any hallucination pattern not specifically deterministic-precheckable.
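The retry flow can be sketched as follows; `run_check` is a hypothetical hook for the driver's model call, and the verdict-dict shape is assumed:

```python
CHEAP_MODEL = "openai/gpt-4o-mini"
STRONG_MODEL = "anthropic/claude-sonnet-4-6"

def checked_verdict(rule: str, diff: str, run_check) -> dict:
    """Retry a first-pass fail on a stronger model; passes stay cheap."""
    first = run_check(rule, diff, model=CHEAP_MODEL)
    if first["verdict"] != "fail":
        return first  # true negatives never pay Sonnet rates
    second = run_check(rule, diff, model=STRONG_MODEL)
    if second["verdict"] == "fail":
        return second  # both models agree: the fail stands
    # Cheap model failed, strong model passed: treat as a hallucination
    # and tag the record for the proposal-D telemetry log.
    second["overturned"] = {"rule": rule, "model": CHEAP_MODEL}
    return second
```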
D. Telemetry to detect ongoing false-positive rate
Already partially covered by #125 ("surface which model was used"). Add: when a verdict is overturned either by proposal C or by a maintainer comment, log the rule + model + verdict for offline review. Six months from now, this is the data we'd need to decide whether to keep the cheap-model default.
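A minimal shape for the overturned-verdict log (file name and record fields are placeholders, not an agreed schema):

```python
import json
import time

def log_overturned(rule: str, model: str, verdict: str, overturned_by: str,
                   path: str = "overturned-verdicts.jsonl") -> None:
    """Append one JSONL record per overturned verdict for offline review."""
    record = {
        "ts": time.time(),
        "rule": rule,
        "model": model,
        "verdict": verdict,
        # "retry" (proposal C) or "maintainer" (manual override)
        "overturned_by": overturned_by,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

JSONL keeps the log append-only and trivially greppable per rule or model when the 3–6 month review comes around.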
Acceptance criteria
Open a fresh rebased PR with no real rule violations. Branching & PR Strategy passes deterministically.
PR that deletes a comment (no comment additions) passes Comment Discipline deterministically.
PR that touches a model field whose name happens to match a RAGTIME_* pattern (e.g. a hypothetical Episode.api_key field) passes RAGTIME_* Env Var Sync deterministically.
PR that touches a pipeline-step file but doesn't add/remove/change a step in _PIPELINE_DISPATCH passes Pipeline Step Documentation Sync deterministically.
PR that changes a pipeline step and updates only doc/README.md (not README.md) is correctly judged based on whether the README summary table needs updating — i.e., the model sees the actual diff, not a hallucinated absence.
Out of scope

Depends on / relates to