feat(format): gated LLM secondary formatter with forbidden-token enforcement by Mathews-Tom · Pull Request #6 · Mathews-Tom/augur

Mathews-Tom · 2026-04-17T08:59:30Z

Summary

Ships the gated, opt-in LLM formatter that consumes SignalContext and emits IntelligenceBrief through a five-stage defense pipeline: deterministic prompt builder, backend completion (Ollama or Anthropic), forbidden-token linter, schema validator, provenance stamp. The interpreter suspends during storm mode; consumers without accepts_llm_assisted = true never see LLM briefs. The deterministic pipeline runs regardless.

What Changed

Contract: expanded IntelligenceBrief with headline ≤ 90 chars, body ≤ 800 chars, formatter_version, generated_at, and two model validators locking interpretation_mode and forbidden_token_check to their Literal singletons.
Backends: AbstractLLMBackend protocol + OllamaBackend (plain httpx) + AnthropicBackend (lazy-imported SDK).
Prompts: deterministic PromptBuilder with _system.txt and five per-signal-type templates.
Linter: ForbiddenTokenLinter with load_forbidden_phrases that flattens every category block in config/forbidden_tokens.toml; case-insensitive matching.
Schema: SchemaValidator wrapping Pydantic with a stable ValidationResult shape.
Provenance: SHA-256 prompt hash + backend-qualified model + installed formatter_version.
Gate: ConsumerGate filters consumers by accepts_llm_assisted opt-in.
Orchestrator: LLMInterpreter composes all five stages; set_suspended wires into StormController.
Config: config/llm.toml (default off, Ollama + Anthropic blocks, template path).

How It Works

The interpreter builds a deterministic prompt via PromptBuilder, calls AbstractLLMBackend.complete, parses the JSON response, runs the parsed headline and body through ForbiddenTokenLinter (post-parse since review fix H2), stamps provenance via stamp(...), and validates against the IntelligenceBrief contract. Any failure returns None; the deterministic formatters emit unaffected. Storm mode short-circuits before the backend call so the 5-10-second per-brief latency does not starve the dedup pipeline under pressure.

Configuration Added

config/llm.toml — enabled=false default, backend configs, prompt template path. Opt-in required.

Schema Changes

schemas/IntelligenceBrief-1.0.0.json re-exported with the new fields (length bounds, formatter_version, generated_at). Same version, backward-compatible additions within the 1.0 series.

Quality Gates

Definition of Done

Operational Handoff

After merge an operator enables the formatter by setting config/llm.toml [interpreter] enabled = true, installing the chosen backend extras (augur-format[llm-local] for Ollama, augur-format[llm-cloud] for Anthropic), and provisioning credentials. The deterministic JSON and Markdown pathways reach every consumer regardless; the LLM-assisted brief reaches only consumers whose accepts_llm_assisted = true.
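The opt-in rule above can be illustrated with a minimal sketch. The Consumer class, its name field, and the eligible_for_llm_brief helper are hypothetical stand-ins and not the project's ConsumerGate API; only the accepts_llm_assisted flag comes from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    """Hypothetical registry entry; only the opt-in flag comes from the PR."""
    name: str
    accepts_llm_assisted: bool = False  # opt-in defaults to off


def eligible_for_llm_brief(consumers: list[Consumer]) -> list[Consumer]:
    """Keep only consumers that explicitly opted in to LLM-assisted briefs."""
    return [c for c in consumers if c.accepts_llm_assisted]


registry = [
    Consumer("dashboard", accepts_llm_assisted=True),
    Consumer("trading-agent"),
]
llm_recipients = [c.name for c in eligible_for_llm_brief(registry)]
deterministic_recipients = [c.name for c in registry]  # unaffected by the gate
```

Here only "dashboard" would receive the LLM-assisted brief, while both consumers still receive the deterministic outputs.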

Test Plan

uv run pytest passes locally (273 tests).
uv run python scripts/export_schemas.py --check passes locally.
uv run pre-commit run --all-files passes locally.
CI workflow runs green on this PR.
Operational: generate a real brief against a local Ollama daemon serving Gemma 2 and confirm the forbidden-token linter rejects a taint-test response.

Review Pass

pr-review findings addressed: 2 HIGH + 3 MEDIUM fixed in 4074957 fix(llm).
code-refiner simplifications applied: the schema-version constant (M3) and double-validation collapse (M4) landed in the same commit.

Deferred Findings

LOW L1 (ConsumerGate.is_eligible unused brief parameter): kept for future per-brief policy (severity-gated opt-in, etc.) so the signature does not break downstream when that lands.
LOW L2 (templates near-identical): phase-4 §12 calls out per-type divergence as future work; consolidation follows when detector-specific content diverges.
LOW L3 (builder re-reads per-signal templates every build): negligible latency vs the backend call; revisit if profiling shows it.
LOW L4 (Anthropic health_check): the SDK does not expose a cheap ping endpoint. The lazy construction validates credentials; first real completion surfaces any model-availability issues. Deferred until the SDK adds a cheaper probe.

The interpreter is enabled=false by default per phase-4 §17.1. An operator opts in by editing config/llm.toml after reviewing the reputation-risk example in docs/examples/negative-paths.md §Example 4. backends.ollama defaults to the local daemon at 11434 with the gemma2 27B model; backends.anthropic uses claude-haiku-4-5-20251001 per the project model conventions in ~/.claude/CLAUDE.md §Models. Model identifiers live only in this file; source code reads them at startup. suspend_during_storm wires into the dedup layer's StormController so the LLM formatter stops generating when the bus enters storm mode, preserving the deterministic pipeline's throughput under pressure.

…nd timestamps

IntelligenceBrief gains three load-bearing fields per phase-4 §3 and two constructor-time validators that lock the brief's interpretation mode and forbidden-token check to their Literal singletons. headline is capped at 90 characters so it fits a Slack header; body_markdown is capped at 800 characters so it renders cleanly on a dashboard card; actionable_for is typed list[ConsumerType] so unknown consumers fail at construction. formatter_version and generated_at let consumers verify which formatter produced the brief and when, closing the provenance surface that prompt_hash and model alone could not. Two model_validator decorators enforce the Literal singletons: interpretation_mode must equal "llm_assisted" and forbidden_token_check must equal "passed". The gated formatter path is the only code that can mint a conforming brief because any other construction path would have to forge those literals, which code review catches. schemas/IntelligenceBrief-1.0.0.json is regenerated via scripts/export_schemas.py so the wire contract matches the model.

AbstractLLMBackend is the Protocol the interpreter dispatches through. Two concrete adapters implement the same async ``complete`` surface: OllamaBackend routes through the local daemon via plain httpx (no hard dependency on the ollama SDK), AnthropicBackend uses the anthropic SDK lazily imported inside the constructor so the llm-isolation test in the default environment still passes. Both adapters retry on transient failures: Ollama retries twice with no backoff (local daemon outages should surface quickly, not loop for a minute), Anthropic retries up to the configured limit per phase-4 §4.4. A backend that exhausts retries raises BackendError; the interpreter treats the error as a dropped brief per phase-4 §10 rather than propagating. AnthropicBackend accepts an injected client for testing; production code constructs the client from the ANTHROPIC_API_KEY env var. Missing credentials plus no injected client fails loud at construction. CompletionResult captures text, token counts, and generation duration so the observability hooks in the interpreter can surface per-backend latency distributions. Six tests cover Ollama health-check success, Ollama completion parse path, Ollama retry-exhaustion, Anthropic credential enforcement, Anthropic injected-client happy path, and Anthropic retry exhaustion.

The prompt builder produces a deterministic (system, user) pair from any SignalContext. The system message embeds the sorted forbidden-phrase list, a summary of the IntelligenceBrief schema, and the ConsumerType enum, so the model sees the exact constraints it must satisfy. The user message renders the signal payload into the per-signal-type template; all five SignalType values have a dedicated template under augur_format/llm/prompts/templates/. Determinism is the load-bearing contract: identical input plus identical template files always produce identical prompt strings. The prompt hash attached to every brief is SHA-256 of the concatenated pair, so auditors can reproduce the prompt offline from the SignalContext and confirm the model saw exactly what the builder claims it saw. Missing templates raise PromptTemplateNotFoundError at render time rather than silently falling back, so contract drift between the SignalType enum and the template directory fails loud. The hatch build config now also includes *.txt so the templates ship with the wheel alongside the Markdown Jinja2 templates from the deterministic pathway. Nine tests cover determinism across calls, system-message phrase and consumer-enum injection, verbatim resolution-criteria pass-through, manipulation-flag rendering both populated and empty, every signal type finding its template, related-market bullet rendering, and the missing-template error path.
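A minimal sketch of the determinism contract, assuming in-memory phrases and templates instead of the files the real PromptBuilder reads. The phrase list, the "price_velocity" signal type, the template text, and the build_prompt and prompt_hash helpers are all hypothetical; the hash input follows the system + "\n\n" + user layout stated elsewhere in this PR.

```python
import hashlib

# Hypothetical inputs; the real builder reads phrases and templates from disk.
FORBIDDEN_PHRASES = ["will rise", "because of", "insiders"]
TEMPLATES = {"price_velocity": "Market: {market_id}\nMove: {delta}"}


class PromptTemplateNotFoundError(KeyError):
    """Raised when a signal type has no template."""


def build_prompt(signal_type: str, payload: dict[str, str]) -> tuple[str, str]:
    """Return a deterministic (system, user) pair for one signal."""
    if signal_type not in TEMPLATES:
        raise PromptTemplateNotFoundError(signal_type)
    # Sorting removes any dependence on config-file ordering.
    phrases = "\n".join(f"- {p}" for p in sorted(FORBIDDEN_PHRASES))
    system = f"Never use these phrases:\n{phrases}"
    user = TEMPLATES[signal_type].format(**payload)
    return system, user


def prompt_hash(system: str, user: str) -> str:
    """SHA-256 over system, a blank line, then user."""
    return hashlib.sha256(f"{system}\n\n{user}".encode("utf-8")).hexdigest()


payload = {"market_id": "m-1", "delta": "+4%"}
first = prompt_hash(*build_prompt("price_velocity", payload))
second = prompt_hash(*build_prompt("price_velocity", payload))
changed = prompt_hash(*build_prompt("price_velocity", {**payload, "delta": "+5%"}))
```

Identical inputs give identical hashes, and any change to the payload changes the hash, which is what lets an auditor recompute it offline.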

…, consumer gate

Four defense layers sit between the backend's raw text and a persisted IntelligenceBrief:

ForbiddenTokenLinter matches, case-insensitively, every phrase loaded from config/forbidden_tokens.toml (causal_narrative, price_projection, manipulation_speculation). A match drops the brief before IntelligenceBrief construction. load_forbidden_phrases flattens every [category].phrases block into a single list so the linter does not need to know category semantics.
SchemaValidator wraps Pydantic's IntelligenceBrief.model_validate and translates ValidationError into a stable ValidationResult. The interpreter checks result.ok before minting a brief; any schema violation drops the brief and logs the offending field path.
ProvenanceStamp holds model (backend-qualified), prompt_hash (SHA-256 of system + "\n\n" + user), and formatter_version (from installed package metadata). Auditors reproduce prompt_hash from the deterministic prompt builder to confirm the model saw exactly what the record claims.
ConsumerGate enforces the docs/contracts/consumer-registry.md opt-in rule: only consumers with accepts_llm_assisted=true receive the LLM brief. The deterministic JSON and Markdown paths still reach every consumer; the gate only filters the secondary formatter's output.

Eleven tests cover: every configured phrase rejected, case insensitivity, clean text accepted, brief-shape lint, schema validator accept + two rejection modes, stamp reproducibility, stamp hash varies on prompt change, gate eligibility both directions, and list filtering.

LLMInterpreter is the single entrypoint the engine calls to render a SignalContext into an IntelligenceBrief through the gated path. The orchestrator sequences five stages: build deterministic prompt, call backend, lint output for forbidden tokens, validate against the IntelligenceBrief schema, stamp provenance. Any failure at any stage drops the brief by returning None — the deterministic pipeline proceeds unaffected, so consumers always receive the canonical JSON and Markdown outputs regardless of LLM outcome. set_suspended wires into the Phase-1 StormController's state stream per phase-4 §11: when in_storm=True the interpreter returns None immediately without calling the backend, avoiding the 5-10-second per-brief latency under storm-mode pressure. Briefs that would have been generated during suspension are not retroactively rendered. Provenance stamping attaches model identifier (backend-qualified), SHA-256 prompt hash, and formatter version to every brief. Auditors reproduce the hash from the prompt builder's deterministic output and confirm the model saw exactly what the record claims. now is a parameter so backtest harnesses can drive generated_at deterministically. Production code passes None which falls through to datetime.now(UTC); tests always pass an explicit timestamp. Eight tests cover the full pipeline: happy path, forbidden token drop, invalid JSON drop, unknown consumer drop, backend error drop, storm-mode suspension short-circuit, resume after suspension, and over-length-headline schema drop.

…onstant

Addresses the pr-review findings in order:

HIGH (H1): LLMInterpreter now accepts an optional ConsumerGate and filters each brief's actionable_for to the opted-in subset before returning. Briefs whose actionable_for empties after filtering drop entirely. The previous wiring generated a brief whose consumer list was never validated against the accepts_llm_assisted registry, letting LLM output leak to agent consumers that had not opted in. When the filter trims the list, the brief is rebuilt via model_copy so downstream code sees only the allowed set.
HIGH (H2): the forbidden-token linter now runs against the post-parse headline+body instead of the raw JSON response. A model that escapes a forbidden phrase as \\u006d\\u0061\\u0079 would slip past the substring check on raw JSON but fails the lint after json.loads normalizes the escape. A regression test covers the unicode-escape bypass path.
MEDIUM (M3): models.py exports SCHEMA_VERSION as a module-level constant and the interpreter plus prompt builder read from it. A schema version bump now requires one edit instead of three.
MEDIUM (M4): interpreter drops the SchemaValidator wrapper's double validation; IntelligenceBrief.model_validate is the single source of schema truth. ValidationError drops the brief without a second full-validate pass.
MEDIUM (M1): OllamaBackend raises immediately on 4xx responses (malformed adapter payload) instead of retrying; only 5xx responses and connection failures are retried.
MEDIUM (M2): AnthropicBackend narrows retry to transient failures. AuthenticationError, PermissionDeniedError, and BadRequestError class paths raise through a wrapped BackendError immediately so credential misconfigurations surface without burning the retry budget. Class lookup is string-based so the module loads without the anthropic SDK installed.

Three new tests cover the consumer-gate filter path, the no-consumer-opted-in drop path, and the unicode-escape lint bypass.
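The unicode-escape bypass from H2 is easy to reproduce with the standard json module. In this sketch the forbidden phrase "may" is hypothetical (it is simply what the escape sequence in the commit message decodes to), and contains_forbidden is an illustrative helper, not the project's linter.

```python
import json

FORBIDDEN = ["may"]  # hypothetical phrase; the escape above decodes to it

# The model's raw response, with the phrase written as JSON unicode escapes.
raw_response = (
    '{"headline": "Price \\u006d\\u0061\\u0079 rise", '
    '"body_markdown": "Volume doubled."}'
)


def contains_forbidden(text: str) -> bool:
    """Case-insensitive substring check against the phrase list."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in FORBIDDEN)


# Pre-fix behaviour: a substring check on the raw JSON text misses the phrase.
caught_on_raw = contains_forbidden(raw_response)

# Post-fix behaviour: json.loads decodes the escapes before the lint runs.
parsed = json.loads(raw_response)
caught_after_parse = contains_forbidden(
    parsed["headline"] + "\n" + parsed["body_markdown"]
)
```

The raw text never contains the literal phrase, so only the post-parse check catches it, which is why the lint moved after json.loads.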

Mathews-Tom added 8 commits April 17, 2026 14:16

docs: record gated llm secondary formatter in the changelog

c2e7a9b

Mathews-Tom merged commit 6e27d0e into main Apr 17, 2026
2 checks passed

Mathews-Tom deleted the feat/llm-secondary-formatter branch April 17, 2026 09:08
