
feat(cost-telemetry): wire news event extraction + lib v0.32→v0.33 (Phase 0.2)#308

Merged
cipher813 merged 2 commits into main from feat/cost-telemetry-event-extraction-v0.33 on May 25, 2026

@cipher813
Owner

Summary

Closes the largest previously-untracked LLM cost slice in the system (~$20–60/mo, the dominant ~20–60% of monthly Anthropic spend per the Phase 0 audit §1.1). News-article event extraction in `collectors/nlp/event_extraction.py:167` fires 100–300 Haiku calls per RAGIngestion run; this PR records every response into a per-run JSONL flushed once to S3.

Zero-coupling pattern

New `rag/pipelines/_cost_telemetry.py` provides `wrap_client_for_cost_telemetry(client, buffer)` — proxy that records every `messages.create()` response into a buffer without changing the response shape. `AnthropicEventExtractor` is UNCHANGED — telemetry composes at the client-construction layer in `_run_nlp` rather than polluting the extractor.
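
A minimal sketch of how such a proxy could compose at the client-construction layer. Only the name `wrap_client_for_cost_telemetry` comes from this PR; the class names, the list-backed buffer, and the fake client are assumptions for illustration, not the merged implementation.

```python
from types import SimpleNamespace


class _RecordingMessages:
    """Wraps a client's `messages` namespace and records each response."""

    def __init__(self, inner, buffer):
        self._inner = inner
        self._buffer = buffer

    def create(self, **kwargs):
        response = self._inner.create(**kwargs)
        # Assumption: a plain list stands in for the real per-run buffer.
        self._buffer.append(response)
        return response  # handed back untouched, so the caller sees the same shape


class _ClientProxy:
    def __init__(self, client, buffer):
        self._client = client
        self.messages = _RecordingMessages(client.messages, buffer)

    def __getattr__(self, name):
        # Only reached for attributes the proxy lacks: delegate to the real client.
        return getattr(self._client, name)


def wrap_client_for_cost_telemetry(client, buffer):
    return _ClientProxy(client, buffer)


# Hypothetical stand-in for an Anthropic client, used only for this demo.
class _FakeMessages:
    def create(self, **kwargs):
        usage = SimpleNamespace(input_tokens=12, output_tokens=3)
        return SimpleNamespace(model=kwargs["model"], usage=usage)


class _FakeClient:
    api_key = "test"

    def __init__(self):
        self.messages = _FakeMessages()


buffer = []
client = wrap_client_for_cost_telemetry(_FakeClient(), buffer)
response = client.messages.create(model="haiku-stand-in", max_tokens=16, messages=[])
```

Because the extractor only ever calls `client.messages.create(...)`, it can receive the wrapped client without any change to its own code.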

Single S3 chokepoint

Rows land at `s3://alpha-engine-research/decision_artifacts/_cost_raw/{date}/{date}/data:news_event_extraction.jsonl` — same partition the research-side `aggregate_costs.py` already scans. Data's rows show up in the daily parquet's `by_agent_id` breakdown alongside research's; the Phase 0.3 dashboard cost panel surfaces every site in one view.

Buffered + flushed once

100–300 calls per run → 1 PutObject at end-of-pipeline. Per-call PutObjects would be wasteful.

Failure semantics

  • Flush failure raises `CostBufferFlushError` and fails the pipeline (per `[[feedback_no_silent_fails]]` — silent miss on the dominant cost slice would defeat Phase 0 visibility)
  • Per-call recording failure (malformed response) is logged but does NOT propagate (event extractor's primary deliverable must survive a telemetry hiccup)
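
The two failure paths above could look roughly like this. `CostBufferFlushError` is named in the PR; the helper functions, the dict-shaped response, and the logger name are assumptions made for the sketch.

```python
import logging

logger = logging.getLogger("cost_telemetry_sketch")


class CostBufferFlushError(RuntimeError):
    """Raised when the end-of-run upload fails; meant to fail the pipeline."""


def record_safely(rows, response):
    """Per-call path: a malformed response is logged and skipped, never raised."""
    try:
        usage = response["usage"]
        rows.append(
            {
                "input_tokens": usage["input_tokens"],
                "output_tokens": usage["output_tokens"],
            }
        )
        return True
    except (KeyError, TypeError) as exc:
        logger.warning("cost row skipped: %r", exc)
        return False


def flush_or_fail(rows, put_rows):
    """Flush path: any upload error is wrapped and propagated."""
    try:
        put_rows(rows)
    except Exception as exc:
        raise CostBufferFlushError(f"failed to flush {len(rows)} cost rows") from exc


rows = []
ok = record_safely(rows, {"usage": {"input_tokens": 1, "output_tokens": 2}})
skipped = record_safely(rows, {"unexpected": None})
```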

Lib pin

`v0.32.0 → v0.33.0` in requirements.txt + Dockerfile to consume the lifted `alpha_engine_lib.cost.record_anthropic_call` (alpha-engine-lib #69 — data is consumer #2 of the SOTA chokepoint after morning-signal originated the pattern).

Test plan

  • `pytest tests/test_news_cost_telemetry.py` — 9 new tests (buffer record/flush/empty/error paths, proxy passthrough, per-call failure isolation, factory naming)
  • Full suite 1480 → 1489 passing, zero regressions
  • Lib v0.33.0 install resolved cleanly in local venv
  • Sat 5/30 RAGIngestion fires + JSONL lands at the canonical key + `aggregate_costs.py` picks it up

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 25, 2026 10:24
…hase 0.2)

Closes the largest previously-untracked LLM cost slice in the system
(~$20–60/mo, the dominant ~20–60% of monthly Anthropic spend per the
Phase 0 audit at ``alpha-engine-docs/private/prompt-caching-investigation-260525.md``
§1.1). News-article event extraction in
``collectors/nlp/event_extraction.py:167`` fires 100–300 Haiku calls
per RAGIngestion run; this PR records every response's tokens +
server-tool fees into a per-run JSONL flushed once to S3 at the
canonical research-side cost-raw partition.

**Zero-coupling pattern:** new ``rag/pipelines/_cost_telemetry.py``
provides ``wrap_client_for_cost_telemetry(client, buffer)`` — proxy
that records every ``messages.create()`` response into a buffer
without changing the response shape returned. ``AnthropicEventExtractor``
is UNCHANGED — telemetry composes at the client-construction layer in
``_run_nlp`` rather than polluting the extractor with cost-tracking
concerns.

**Single S3 chokepoint:** rows land at
``s3://alpha-engine-research/decision_artifacts/_cost_raw/{date}/{date}/data:news_event_extraction.jsonl``
— same partition the research-side ``aggregate_costs.py`` already
scans. The daily parquet now sums data's rows alongside research's
under one ``by_agent_id`` breakdown; the dashboard cost panel
(Phase 0.3) will show every site in one view.

**Buffered + flushed once** rather than per-call to keep S3 PutObject
volume sane: 100–300 calls per run → 1 PutObject at end-of-pipeline.

**Fail-loud at flush** per ``[[feedback_no_silent_fails]]``: S3
PutObject failure raises ``CostBufferFlushError`` and fails the
pipeline — silent miss on the dominant cost slice would defeat the
Phase 0 visibility goal. Per-call recording failures (malformed
response shape) are logged but do NOT propagate — the event extractor's
primary deliverable must survive a cost-telemetry hiccup.

**Lib pin v0.32.0 → v0.33.0** in requirements.txt + Dockerfile to
consume the lifted ``alpha_engine_lib.cost.record_anthropic_call``
(alpha-engine-lib #69, the SOTA chokepoint that data + executor are
consumers #2 + #3 of after morning-signal originated the pattern).

**Tests:** new ``tests/test_news_cost_telemetry.py`` covers buffer
record/flush/empty/error paths + proxy passthrough + per-call
recording failure isolation + factory naming convention. 9 new tests,
suite 1480 → 1489 passing, zero regressions.

Closes ROADMAP Phase 0.2 row "News event extraction (data, $20–60/mo,
dominant slice)" — first production exercise will land on the next
Saturday SF (2026-05-30) when RAGIngestion fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI failure on PR #308: ``ModuleNotFoundError: No module named 'moto'``
— this repo's CI installs only ``requirements.txt + pytest`` (no
dev-extras file), and the rest of the suite avoids moto by using a
minimal in-memory S3 mock (``_InMemoryS3`` in
``tests/test_news_aggregates.py``). Mirror that pattern here so the
cost-telemetry tests run cleanly in the unmodified CI environment.
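
For illustration, an in-memory S3 stand-in of this kind can be very small. This is not the repo's ``_InMemoryS3``; the class below is a hypothetical sketch that mimics only the boto3-style ``put_object`` / ``get_object`` keyword arguments and the ``Body.read()`` access pattern.

```python
import io


class InMemoryS3:
    """Hypothetical test double: stores objects in a dict keyed by (bucket, key)."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body, **kwargs):
        data = Body.encode("utf-8") if isinstance(Body, str) else bytes(Body)
        self._objects[(Bucket, Key)] = data
        return {}

    def get_object(self, Bucket, Key):
        try:
            data = self._objects[(Bucket, Key)]
        except KeyError:
            raise KeyError(f"NoSuchKey: s3://{Bucket}/{Key}") from None
        # boto3 returns a streaming body; BytesIO offers the same .read() call.
        return {"Body": io.BytesIO(data)}


s3 = InMemoryS3()
s3.put_object(Bucket="example-bucket", Key="a/b.jsonl", Body=b'{"x": 1}\n')
round_tripped = s3.get_object(Bucket="example-bucket", Key="a/b.jsonl")["Body"].read()
```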

Functional coverage unchanged — all 9 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 2f9d50a into main May 25, 2026
1 check passed
@cipher813 cipher813 deleted the feat/cost-telemetry-event-extraction-v0.33 branch May 25, 2026 17:50
cipher813 added a commit that referenced this pull request May 25, 2026
…Phase 4 #1) (#309)

Mirrors alpha-engine-research's ``llm_cost_tracker.RunBudgetExceededError``
pattern at the news-pipeline cost site (Phase 0.2 wiring, PR #308):
``S3CostBuffer.record()`` now tracks cumulative cost across the run and
raises ``CostBudgetExceededError`` after recording the offending row if
the per-run total exceeds ``ALPHA_ENGINE_RUN_BUDGET_USD`` (default $100,
shared env var with research + executor, so one operator knob caps
cost across all SF entry points).

**Failure shape:** row is recorded BEFORE the raise so per-call detail
is preserved on S3 when the breaker fires. The pipeline-side
try/finally in ``run_news_pipeline.py`` ensures the buffer flushes
even when the breaker raises mid-loop — rows up to and including the
breach call land on S3 so operators can diagnose what broke the budget
without re-running.
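
A sketch of the record-then-raise ordering combined with a try/finally flush. ``CostBudgetExceededError`` and its four fields are named in this commit; the buffer class, the cost-per-call input, and the in-memory "flush" are assumptions for illustration.

```python
class CostBudgetExceededError(RuntimeError):
    def __init__(self, run_id, agent_id, cumulative_cost_usd, ceiling_usd):
        super().__init__(
            f"run {run_id} ({agent_id}) spent ${cumulative_cost_usd:.2f}, "
            f"over the ${ceiling_usd:.2f} ceiling"
        )
        self.run_id = run_id
        self.agent_id = agent_id
        self.cumulative_cost_usd = cumulative_cost_usd
        self.ceiling_usd = ceiling_usd


class BudgetedBuffer:
    """Hypothetical buffer; flush() copies rows to a list instead of S3."""

    def __init__(self, run_id, agent_id, ceiling_usd):
        self.run_id = run_id
        self.agent_id = agent_id
        self.ceiling_usd = ceiling_usd
        self.cumulative_cost_usd = 0.0
        self.rows = []
        self.flushed = None

    def record(self, cost_usd):
        self.rows.append({"cost_usd": cost_usd})  # the row is kept first...
        self.cumulative_cost_usd += cost_usd
        # ...then the breaker is checked; a ceiling of 0 disables it.
        if self.ceiling_usd > 0 and self.cumulative_cost_usd > self.ceiling_usd:
            raise CostBudgetExceededError(
                self.run_id, self.agent_id, self.cumulative_cost_usd, self.ceiling_usd
            )

    def flush(self):
        self.flushed = list(self.rows)


def run_pipeline(buffer, call_costs):
    try:
        for cost in call_costs:
            buffer.record(cost)
    finally:
        buffer.flush()  # runs even when the breaker raises mid-loop


demo = BudgetedBuffer(run_id="demo-run", agent_id="demo-agent", ceiling_usd=1.0)
try:
    run_pipeline(demo, [0.4, 0.4, 0.4, 0.4, 0.4])
    breach = None
except CostBudgetExceededError as exc:
    breach = exc
```

In this demo the third call pushes the total past the ceiling, so three rows are flushed, including the one that breached.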

**Posture:** breaker propagates through the client proxy
(``_CostTrackingMessages.create``) — generic record errors still get
swallowed (event extraction's primary deliverable must survive a
malformed-response hiccup), but ``CostBudgetExceededError`` is
explicitly re-raised since swallowing it would defeat the safety net.

**Operator-facing fields on the error:** ``run_id``, ``agent_id``,
``cumulative_cost_usd``, ``ceiling_usd``. Message tells operator how
to adjust the env var if the ceiling needs raising.

**Tests:** 4 ``TestRunBudgetCeilingResolution`` (default / env / zero
disables / malformed-returns-zero) + 5 ``TestCostBudgetBreaker``
(under-ceiling no raise, breach raises after recording row, zero
disables, proxy propagates, ceiling defaults from env). Suite 1484 →
1493 passing, zero regressions.
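
One way the ceiling resolution exercised by ``TestRunBudgetCeilingResolution`` might behave is sketched below. The env var name and the $100 default come from this commit; the function name, the injectable ``environ`` argument, and the handling of negative or non-finite values are assumptions.

```python
import math
import os

ENV_VAR = "ALPHA_ENGINE_RUN_BUDGET_USD"
DEFAULT_RUN_BUDGET_USD = 100.0


def resolve_run_budget_ceiling(environ=None):
    """Return the per-run ceiling in USD; 0.0 means the breaker is disabled."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR)
    if raw is None:
        return DEFAULT_RUN_BUDGET_USD  # unset: fall back to the default
    try:
        value = float(raw)
    except ValueError:
        return 0.0  # malformed: resolves to zero
    # Assumption: negative or non-finite values are treated like zero.
    if not math.isfinite(value) or value <= 0:
        return 0.0
    return value
```

Passing a plain dict as ``environ`` keeps the function easy to test without mutating the process environment.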

**Composes with** PR #308 (the Phase 0.2 wiring): the breaker is a
small additive surface on the buffer, and the production wiring path
is unchanged.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 25, 2026
…or (#310)

Applies the standing rule per ``[[preference_llm_calls_confined_to_research_module]]``
— LLM calls live in alpha-engine-research. The news pipeline's
Haiku-backed event extractor is removed and replaced with a deterministic
classifier that uses two zero-cost signals already on the wire:

1. **Vendor tags** (``NewsArticle.tags``). Polygon emits keywords,
   GDELT emits structured event codes, Benzinga emits Channels. The
   ``alpha_engine_lib.sources.protocols.NewsArticle.tags`` docstring
   explicitly names this as "a soft signal for downstream event-flag
   extraction" — we were paying Haiku to re-derive what Polygon /
   GDELT already tagged.
2. **Title-keyword regex**. Backstop for sources that don't populate
   tags (Yahoo RSS). 17 pattern → category mappings against the
   ``DEFAULT_EVENT_CATEGORIES`` closed taxonomy.
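
The two signals above can be combined along these lines. Everything in this sketch is illustrative: the three category slugs, the tag map, the regexes, and the ``EventFlag`` fields are assumptions, and the real module has 17 pattern mappings and its own taxonomy.

```python
import re
from dataclasses import dataclass

# Hypothetical subset of a closed taxonomy; order defines output order.
CATEGORIES = ("earnings", "m_and_a", "fda")

# Hypothetical vendor-tag lookups (lowercased tag -> category).
TAG_TO_CATEGORY = {
    "earnings": "earnings",
    "m&a": "m_and_a",
    "mergers and acquisitions": "m_and_a",
    "fda": "fda",
}

# Hypothetical title-keyword backstop for sources without tags.
TITLE_PATTERNS = (
    (re.compile(r"\b(earnings|quarterly results|eps)\b", re.I), "earnings"),
    (re.compile(r"\b(acquires?|acquisition|merger|takeover)\b", re.I), "m_and_a"),
    (re.compile(r"\bfda\b", re.I), "fda"),
)


@dataclass(frozen=True)
class EventFlag:
    category: str
    description: str
    severity: float = 0.5
    extractor: str = "rule_based"


def extract(title, article_tags=()):
    if not title.strip():
        return []  # empty text short-circuits
    matched = set()
    for tag in article_tags:
        category = TAG_TO_CATEGORY.get(tag.strip().lower())
        if category is not None:
            matched.add(category)
    for pattern, category in TITLE_PATTERNS:
        if pattern.search(title):
            matched.add(category)
    # Emit in taxonomy order so the output is deterministic.
    return [EventFlag(category, title) for category in CATEGORIES if category in matched]


flags = extract("Acme acquires Widget Co after strong earnings", article_tags=("FDA",))
```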

**Why this is the right answer, not a kill-switch:** code audit found
the Haiku per-article structured output was aggregated to 5 scalar /
list columns (``event_count``, ``event_severity_max/mean``,
``event_categories``, ``top_event_descriptions``) before any research
consumer touched it. The "zero-shot novel-event detection" capability
was mostly wasted — research only sees per-ticker rollups. Tag-based +
keyword-based classification produces equivalent rollups
deterministically.
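
To make the rollup concrete, here is a rough sketch of collapsing per-article flags into the five columns named above. The column names come from this commit; the dict-shaped flags, the ``top_n`` cutoff, and ranking descriptions by severity are assumptions.

```python
from statistics import mean


def rollup(flags, top_n=3):
    """Collapse one ticker's flags into the five aggregate columns."""
    if not flags:
        return {
            "event_count": 0,
            "event_severity_max": None,
            "event_severity_mean": None,
            "event_categories": [],
            "top_event_descriptions": [],
        }
    severities = [flag["severity"] for flag in flags]
    # sorted() is stable, so equal severities keep their input order.
    ranked = sorted(flags, key=lambda flag: flag["severity"], reverse=True)
    return {
        "event_count": len(flags),
        "event_severity_max": max(severities),
        "event_severity_mean": mean(severities),
        "event_categories": sorted({flag["category"] for flag in flags}),
        "top_event_descriptions": [flag["description"] for flag in ranked[:top_n]],
    }


row = rollup(
    [
        {"category": "earnings", "severity": 0.5, "description": "Q2 beat"},
        {"category": "fda", "severity": 0.5, "description": "Approval granted"},
    ]
)
```

Once every per-article detail is reduced to these fields, any extractor that yields the same categories and severities produces the same row.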

**Cost impact:** retires the largest previously-untracked LLM cost
slice in the system per the original Phase 0 audit estimate
($20–60/mo). Actual spend on the deleted call site goes to $0; the
research consumer sees identical EventFlag shape (extractor slug
changes from ``"anthropic_haiku"`` to ``"rule_based"``) and identical
aggregate columns in ``news_aggregates/{date}.parquet``.

**Substrate cleanup:** retires three files added earlier this session:
- ``collectors/nlp/event_extraction.py`` (the Anthropic extractor itself)
- ``rag/pipelines/_cost_telemetry.py`` (Phase 0.2 cost-telemetry buffer,
  PR #308 + Phase 4 #1 runaway-cost breaker, PR #309 — both retired
  with the LLM call site they instrumented)
- ``tests/test_news_cost_telemetry.py`` (mirrored tests)

``DEFAULT_EVENT_CATEGORIES`` moves into the new
``collectors/nlp/rule_based_event_extraction.py`` so the closed
taxonomy stays accessible to downstream consumers.

**Protocol contract:** ``EventExtractor.extract`` gains an optional
``article_tags: tuple[str, ...] = ()`` kwarg (back-compat default).
The pipeline plumbs the tag union across article variants. Any future
EventExtractor implementation (FinBERT, spaCy, reactivated LLM via
research module) consumes the same shape.
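
A sketch of the contract and the tag plumbing, under stated assumptions: only ``EventExtractor.extract`` and the ``article_tags: tuple[str, ...] = ()`` kwarg come from this commit. The ``text`` parameter, the ``list[str]`` return type, ``union_tags``, and the demo extractor are hypothetical.

```python
from typing import Protocol, Sequence


class EventExtractor(Protocol):
    def extract(self, text: str, article_tags: tuple[str, ...] = ()) -> list[str]:
        ...


def union_tags(variants: Sequence[Sequence[str]]) -> tuple[str, ...]:
    """Union tags across article variants, keeping first-seen order."""
    seen: dict[str, None] = {}
    for tags in variants:
        for tag in tags:
            seen.setdefault(tag, None)
    return tuple(seen)


class TagEchoExtractor:
    """Demo implementation that simply reports the tags it was given."""

    def extract(self, text: str, article_tags: tuple[str, ...] = ()) -> list[str]:
        return list(article_tags)


def run(extractor: EventExtractor, text: str, variants: Sequence[Sequence[str]]) -> list[str]:
    return extractor.extract(text, article_tags=union_tags(variants))


merged = run(TagEchoExtractor(), "headline", [("earnings", "fda"), ("fda", "m&a")])
legacy = TagEchoExtractor().extract("headline")  # old call shape still works
```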

**Severity convention:** rule-based flags emit ``severity=0.5``
uniformly (the EventFlag protocol's documented default). The Haiku
severity was a free-floating judgment never tuned by any operator
alert. Per-category severity tuning can be added via YAML if a
downstream surface needs it.

**Tests:** ``TestRuleBasedEventExtractor`` (10 tests) covers
empty-text short-circuit, no-match returns empty, title-keyword
classification per category (earnings / M&A / FDA), tag-based
classification (Polygon/GDELT shape), tag+title union, multi-category
emission, deterministic ordering per ``DEFAULT_EVENT_CATEGORIES``,
zero-LLM-dependency contract, title-as-description shape. Suite
1493 → 1479 net (retired the 9 cost-telemetry tests + 7 Anthropic
extractor tests; added 10 rule-based tests).

**Composes with:**
- ``[[preference_llm_calls_confined_to_research_module]]`` — the
  rule this PR enforces
- alpha-engine #212 (executor EOD narrative kill switch): sibling
  application of the same rule. There are two non-research LLM call
  sites; this PR retires data's entirely, while executor's keeps the
  kill switch substrate (default off) since the LLM path may be
  operator-reactivated.
- Retires the substrate from data #308 + data #309 (Phase 0.2 + Phase
  4 #1 cost-telemetry buffer + breaker) — both became dead code with
  the LLM call site they instrumented.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>