
AI checks driver: hallucination patterns not covered by #125 (4 reproducers) #137

@rafacm

Description


Background

The AI Checks driver shipped in #122 has been running on real PRs for a few weeks. The 2026-05-04 PR batch (#129/#131/#133) surfaced four distinct false-positive patterns that all share the same meta-shape: the model produces a verdict that references content not present in the actual file tree at the PR head. #125 already tracks one driver-hardening bundle (pass PR metadata, surface the model used, fix the Feature PR Docs bundle check), but its scope and proposed fixes ("feed the model the branch name") don't address the patterns below — the model is hallucinating diff/file content, not missing PR metadata.

This issue collects those four reproducers as a separate driver-hardening bundle.

Reproducers

1. Branching & PR Strategy reads the synthetic merge commit (#129)

Verdict (model):

The PR branch is not up-to-date with the current main branch; it does not appear to be rebased onto main, which is required for compliance.

Ground truth: The branch was rebased onto main immediately before push. The runner's actions/checkout@v4 defaults to refs/pull/N/merge, which GitHub materializes as a two-parent merge commit even when the PR is fully fast-forwardable. From the run log:

fetch ... +e8b57f8...:refs/remotes/pull/129/merge
checkout refs/remotes/pull/129/merge
HEAD is now at e8b57f8 Merge 9d1897f... into 69a008...

The model inspects git log on the runner, sees a Merge X into Y commit on top, and concludes "not rebased." This fires on every rebased PR, regardless of whether main actually moved underneath the branch. (It was intermittent on the next run for #131 — model output is non-deterministic.)

This is a different mechanism from #125's premise ("the model can't see the branch name"). Even if PR metadata is passed in, the model is still reading the synthetic merge commit from git log on the checked-out tree.

Detail: PR #129 comment

2. Comment Discipline reads diff-removed lines as still present (#131)

Verdict (model):

There are several comments in the episodes/apps.py file that paraphrase what the code does […] For example, comments like # Only initialize DBOS for server/worker entrypoints, plus the management commands that need to talk to the queue. and # Decorators must register before DBOS.launch(); enqueues against late-registered workflows silently fail. can be removed because the functionality is clear from the code itself.

Ground truth: The first comment cited (# Only initialize DBOS for server/worker entrypoints…) was deleted by this PR — it appears in the unified diff as a - line. The model is treating - lines as if they're still present in the file. The second comment is a legitimate non-obvious WHY (silent-failure mode of late-registered DBOS workflows) that the agent kept deliberately per the rule's own "keep WHY" carve-out.

The driver currently passes git diff $BASE_REF...HEAD directly to the model. There's no signal that - means "removed" beyond the standard unified-diff convention, which gpt-4o-mini regularly mishandles on multi-hunk diffs.
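A driver-side mitigation is to pre-split the unified diff so the model only ever sees added lines. A minimal sketch (the function name and the example diff are illustrative, not the shipped driver code):

```python
import re

def added_comment_lines(unified_diff: str) -> list[str]:
    """Return only the comment lines *added* by a unified diff.

    Lines starting with '+' (excluding the '+++' file header) are additions;
    '-' lines are removals and must never reach the model as "present" code.
    """
    added = []
    for line in unified_diff.splitlines():
        if line.startswith("+++"):            # file header, not content
            continue
        if line.startswith("+"):
            content = line[1:].strip()
            if re.match(r"#", content):       # Python comment line
                added.append(content)
    return added

# Hypothetical diff shaped like the #131 reproducer:
diff = """\
--- a/episodes/apps.py
+++ b/episodes/apps.py
@@ -10,3 +10,3 @@
-# Only initialize DBOS for server/worker entrypoints
+# Decorators must register before DBOS.launch()
 DBOS.launch()
"""
# The deleted comment never appears in the output, so the model
# cannot cite it as "still present".
print(added_comment_lines(diff))
```

The removed line is filtered out before any prompt is built, which eliminates the "- line cited as present" failure class by construction.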

Detail: verdict artifact Job 74235387227

3. RAGTIME_* Env Var Sync invents env var names from PR title (#133)

Verdict artifact:

{
  "verdict": "fail",
  "summary": "New environment variables were added but .env.sample and configure.py are not updated accordingly.",
  "details": "The code introduces the variable RAGTIME_SHOW_NAME, which is not referenced in .env.sample or core/management/commands/configure.py."
}

Ground truth:

$ git grep RAGTIME_SHOW_NAME origin/rafacm/download-show-name-fix
(no output)

$ git diff origin/main...origin/rafacm/download-show-name-fix | grep RAGTIME_
     @override_settings(RAGTIME_PODCAST_AGGREGATORS="")

RAGTIME_SHOW_NAME does not exist anywhere in the branch. The PR adds Episode.show_name (a Django model field). The model invented an env var name by combining the rule's RAGTIME_ prefix with the PR's branch/title token show_name. Title-priming hallucination — the same mechanism that produced the #122-era Feature PR Docs confabulation, but on a different rule.

Detail: PR #133 comment and verdict artifact

4. Pipeline Step Documentation Sync misreads rule scope and claims doc isn't updated when it is (#133, post-rebase)

Verdict (model):

The diff adds a new field show_name to the Episode model in episodes/models.py and modifies the download process, but it does not update the documentation in README.md or doc/README.md to reflect these changes.

Two problems, layered:

  1. Diff-reading miss. doc/README.md is updated in the PR — show_name is added to the details field list and the overwrite-semantics paragraph gains an additive-only exception. The model claims the file isn't touched. This is the inverse of pattern #2: instead of reading - lines as still-present, it misses + lines that are present.
  2. Rule-scope misread. Even if README.md lacked any mention of show_name, the rule wouldn't require it. AGENTS.md scopes the rule to "When adding, removing, or changing a pipeline step, update the summary table in README.md and the detailed step descriptions in doc/README.md." This PR doesn't add/remove/change a pipeline step — the steps list is unchanged; only the behavior inside the existing fetch_details step changed. The README summary table at the step-level granularity ("🕷️ Fetch Details ... extracts metadata") is correct and shouldn't enumerate every field. The model expanded "changing a step" to mean "any change touching a step's source file."

This is a scope-creep + diff-miss double hallucination. The driver-side fix that addresses pattern #2 (extracting added vs. removed hunks before handing to the model) helps here too, but the rule-scope-misread component is independent — better-prompted rule guidance or a stronger model is needed for that half.

Detail: PR #133 — Pipeline Step Documentation Sync verdict

Why this is separate from #125

#125's proposal is to pass more context into the prompt (PR metadata, model name in summary, telemetry). That's strictly additive — none of it tells the model "do not invent file content that isn't in the diff." The patterns above are about the model fabricating or misreading the input it already has.

All four share the same failure shape: the verdict either references content not present in the PR, or treats content that is present as absent. Worth fixing as a coordinated bundle.

Why we're not just upgrading the default model

Reasonable counter-question: if all four failures look like small-model hallucinations, why not just bump the default from openai/gpt-4o-mini to a Sonnet-class model and call it a day? Per-pattern susceptibility:

| Pattern | What the model did wrong | Stronger-model fix likelihood |
| --- | --- | --- |
| #1 synthetic merge commit | Read git log and didn't realize HEAD was a synthetic GHA merge ref, not the PR head | Medium — GPT-4o / Sonnet 4.6 often recognize refs/pull/N/merge semantics, but it's an environmental ambiguity even strong models can trip on. Driver fix (proposal A) eliminates it regardless of model. |
| #2 diff - lines | Treated removed lines as if still present | High — reading unified-diff +/- correctly is basic comprehension; gpt-4o-mini specifically gets sloppy on multi-hunk diffs. |
| #3 invented env var | Combined the rule's RAGTIME_ prefix with the PR title token show_name and presented the result as a "fact" | High — classic small-model under-grounding in the input plus over-grounding in surface tokens. |
| #4 missed + lines + scope-creep | Claimed doc/README.md wasn't updated when the diff clearly adds lines to it; also stretched "changing a pipeline step" to mean "any change touching a step's source file" | High for the diff-miss half; Medium for the scope-creep half (rule prompt could be tightened independently of model strength). |

So 3 of 4 are model-capacity issues a Sonnet-class model would almost certainly avoid; #1 is a mixed environmental + model issue that proposal A fixes regardless of model.

Why not wholesale-upgrade anyway? Cost. Sonnet 4.6 is ~10–20× the per-PR cost of gpt-4o-mini, multiplied across 9 rules × every PR. The framing of the proposals below preserves the cheap default for the routine pass case:

  • Proposal B (deterministic prechecks) makes patterns #2 and #3 disappear without spending any model tokens — pure regex on the diff — and deterministically passes the common no-step-change case of #4.
  • Proposal C (auto-retry on stronger model when first verdict is fail) bounds the cost: pay Sonnet rates only when there's a verdict worth double-checking. False-positive rate drops, true-negative rate stays cheap.
  • Proposal D (telemetry on overturned verdicts) gives us the data to revisit the default-model choice in 3–6 months with evidence rather than vibes.

If telemetry shows the cheap default has an unacceptable false-positive rate even after B and C land, that's the moment to bump the default — not now.

Proposal

A. Pin actions/checkout to the PR head SHA (fixes #1)

Set ref: ${{ github.event.pull_request.head.sha }} on the check job's actions/checkout step. The runner will then check out 9d1897f directly (the rebased head) instead of the synthetic refs/pull/N/merge commit. git log on the runner reflects the PR's actual linear history.

Caveat: the diff is still computed via git diff $BASE_REF...HEAD, where $BASE_REF is fetched separately — this part is unaffected.
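A sketch of the workflow change (the job name and surrounding structure are placeholders; only the ref: line is the actual proposal):

```yaml
jobs:
  ai-checks:                 # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the actual PR head, not the synthetic refs/pull/N/merge
          # commit that the default checkout materializes for pull_request events.
          ref: ${{ github.event.pull_request.head.sha }}
          # Full history, so git log on the runner shows the branch's real
          # linear history rather than a single shallow commit.
          fetch-depth: 0
```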

B. Deterministic precheck for path-/symbol-based rules (fixes #2 and #3, mitigates #4, plus extends #125's Option B)

For rules whose verdict rests on "does symbol/path X appear in the diff?", do the check in Python before the model call:

| Rule | Precheck |
| --- | --- |
| ragtime-env-var-sync | Regex \b(RAGTIME_[A-Z_]+)\b against added lines in the diff; pass the resulting set to the model as "Detected new RAGTIME_* references: {…}" (empty set ⇒ verdict pass deterministically, no model call needed). |
| feature-pr-docs | Already covered by #125 Option B. |
| comment-discipline | Extract added comment lines (^\+\s*#) from the diff and hand the model only those lines as a list. Strips diff-format ambiguity entirely. |
| pipeline-step-doc-sync | Two-part precheck: (a) detect whether the PR actually adds/removes/changes a pipeline step (regex on _PIPELINE_DISPATCH, Episode.Status choices, or new @DBOS.step decorators) — if not, deterministic pass with summary "PR does not modify the pipeline-step list." (b) If a step is modified, pre-extract whether README.md and doc/README.md appear in the diff's changed-files list and pass that fact to the model so it can't claim a touched file is untouched. |

This is more invasive than #125 Option B because it touches four rules instead of one, but each is mechanical (one regex per rule). Worth doing as one bundle.
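The ragtime-env-var-sync row as a sketch. The helper name and the pass/defer return convention are assumptions, not existing driver API:

```python
import re

RAGTIME_VAR = re.compile(r"\b(RAGTIME_[A-Z_]+)\b")

def precheck_env_vars(unified_diff: str):
    """Scan only *added* diff lines for new RAGTIME_* references.

    Returns ("pass", set()) when nothing matched: a deterministic pass
    with no model call. Otherwise returns (None, names) so the driver
    can hand the model a grounded set of detected names instead of
    letting it invent them from the PR title.
    """
    names: set[str] = set()
    for line in unified_diff.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            names.update(RAGTIME_VAR.findall(line))
    if not names:
        return "pass", names      # short-circuit: rule cannot fail
    return None, names            # defer to model, with grounded input

# A diff that only adds a model field cannot trip the rule:
verdict, found = precheck_env_vars(
    "+    show_name = models.CharField(max_length=200)"
)
```

With this in front of the model, the #133 reproducer (RAGTIME_SHOW_NAME invented from the branch name) passes deterministically, since the added lines contain no RAGTIME_* token at all.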

C. Fall back to a stronger model for any rule that fails (optional)

If a rule's first-pass verdict is fail with the cheap default (openai/gpt-4o-mini), automatically retry with openai/gpt-4o or anthropic/claude-sonnet-4-6. If the second model also says fail, post that verdict; if it says pass, the original was a hallucination — log it and pass.

Cost is bounded: only failures retry, so true negatives stay cheap. Helps catch any hallucination pattern not specifically deterministic-precheckable.
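A sketch of the retry flow, assuming a hypothetical run_rule(rule, model) callable that returns "pass" or "fail" (both the helper and the model IDs as wired here are illustrative):

```python
CHEAP_MODEL = "openai/gpt-4o-mini"
STRONG_MODEL = "anthropic/claude-sonnet-4-6"

def verdict_with_escalation(rule, run_rule, log=print):
    """Retry a failing cheap-model verdict on a stronger model.

    True negatives (first-pass "pass") never pay the strong-model rate;
    only "fail" verdicts are double-checked.
    """
    first = run_rule(rule, CHEAP_MODEL)
    if first == "pass":
        return "pass"                 # cheap path: no retry, no extra cost
    second = run_rule(rule, STRONG_MODEL)
    if second == "pass":
        # Cheap model likely hallucinated: record it for Proposal D telemetry.
        log(f"overturned: rule={rule} cheap=fail strong=pass")
        return "pass"
    return "fail"                     # both models agree: post the failure
```

The cost bound falls out of the control flow: the strong model is invoked only inside the first-pass-fail branch.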

D. Telemetry to detect ongoing false-positive rate

Already partially covered by #125 ("surface which model was used"). Add: when a verdict is overturned by either #C or by a maintainer comment, log the rule + model + verdict for offline review. Six months from now, this is the data we'd need to decide whether to keep the cheap-model default.
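One possible shape for that log, as an append-only JSONL file (the field names and path are assumptions, not an existing schema):

```python
import json
import datetime

def log_overturned(path, rule, model, original_verdict, overturned_by):
    """Append one JSONL record per overturned verdict for offline review."""
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "rule": rule,
        "model": model,
        "original_verdict": original_verdict,  # what the cheap model said
        "overturned_by": overturned_by,        # "retry-model" or "maintainer"
    }
    with open(path, "a") as fh:
        fh.write(json.dumps(record) + "\n")
```

Grouping the file by rule and model later gives the per-rule false-positive rate needed for the keep-the-cheap-default decision.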

Acceptance criteria

  • Open a fresh rebased PR with no real rule violations. Branching & PR Strategy passes deterministically.
  • PR that deletes a comment (no comment additions) passes Comment Discipline deterministically.
  • PR that touches a model field whose name happens to match a RAGTIME_* pattern (e.g. a hypothetical Episode.api_key field) passes RAGTIME_* Env Var Sync deterministically.
  • PR that touches a pipeline-step file but doesn't add/remove/change a step in _PIPELINE_DISPATCH passes Pipeline Step Documentation Sync deterministically.
  • PR that changes a pipeline step and updates only doc/README.md (not README.md) is correctly judged based on whether the README summary table needs updating — i.e., the model sees the actual diff, not a hallucinated absence.
  • No regressions on the substantive rules that already pass correctly (validate against the planted-violations PR #123).

Out of scope

Depends on / relates to

Metadata

Labels: enhancement (New feature or request)