feat(cost-telemetry): wire news event extraction + lib v0.32→v0.33 (Phase 0.2) by cipher813 · Pull Request #308 · cipher813/alpha-engine-data

cipher813 · 2026-05-25T17:24:48Z

Summary

Closes the visibility gap on the largest previously untracked LLM cost slice in the system (~$20–60/mo, the dominant ~20–60% of monthly Anthropic spend per the Phase 0 audit §1.1). News-article event extraction in `collectors/nlp/event_extraction.py:167` fires 100–300 Haiku calls per RAGIngestion run; this PR records every response's token usage and server-tool fees into a per-run JSONL that is flushed once to S3.

Zero-coupling pattern

New `rag/pipelines/_cost_telemetry.py` provides `wrap_client_for_cost_telemetry(client, buffer)`, a proxy that records every `messages.create()` response into a buffer without changing the response shape. `AnthropicEventExtractor` is UNCHANGED: telemetry composes at the client-construction layer in `_run_nlp` rather than polluting the extractor.

Single S3 chokepoint

Rows land at `s3://alpha-engine-research/decision_artifacts/_cost_raw/{date}/{date}/data:news_event_extraction.jsonl`, the same partition the research-side `aggregate_costs.py` already scans. The data repo's rows therefore appear in the daily parquet's `by_agent_id` breakdown alongside research's, and the Phase 0.3 dashboard cost panel will surface every site in one view.

Buffered + flushed once

100–300 calls per run → 1 PutObject at the end of the pipeline; a PutObject per call would be wasteful.
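The buffer-then-flush shape can be sketched as follows. `JsonlCostBuffer` and the injected `put` callable are hypothetical; in production `put` would plausibly be a boto3 S3 client's `put_object`, which takes `Bucket`, `Key`, and `Body` keyword arguments. Writing nothing for an empty run is an assumption about the "empty" test path.

```python
import json
from typing import Any, Callable, Dict, List


class JsonlCostBuffer:
    """Hypothetical sketch: rows accumulate in memory, one PutObject at the end."""

    def __init__(self, bucket: str, key: str, put: Callable[..., Any]) -> None:
        self.bucket = bucket
        self.key = key
        self._put = put  # e.g. boto3.client("s3").put_object
        self.rows: List[Dict[str, Any]] = []

    def record(self, row: Dict[str, Any]) -> None:
        self.rows.append(row)  # no I/O per call

    def flush(self) -> int:
        """Write all rows as one JSONL object; returns the number of rows written."""
        if not self.rows:
            return 0  # assumption: an empty run writes nothing
        body = "".join(json.dumps(row, sort_keys=True) + "\n" for row in self.rows)
        self._put(Bucket=self.bucket, Key=self.key, Body=body.encode("utf-8"))
        return len(self.rows)


# Usage: 250 recorded calls still produce exactly one put.
calls: list = []
buf = JsonlCostBuffer("my-bucket", "cost.jsonl", put=lambda **kw: calls.append(kw))
for _ in range(250):
    buf.record({"input_tokens": 900, "output_tokens": 120})
buf.flush()
```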

Failure semantics

- Flush failure raises `CostBufferFlushError` and fails the pipeline (per `[[feedback_no_silent_fails]]`: a silent miss on the dominant cost slice would defeat Phase 0 visibility).
- A per-call recording failure (malformed response) is logged but does NOT propagate (the event extractor's primary deliverable must survive a telemetry hiccup).
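A sketch of the two failure paths. The `CostBufferFlushError` name comes from this PR; its `RuntimeError` base class and the helpers `flush_or_raise` and `record_safely` are hypothetical and only illustrate fail-loud-at-flush versus swallow-and-log-per-call.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("cost_telemetry_sketch")


class CostBufferFlushError(RuntimeError):
    """End-of-run write failed; allowed to propagate so the pipeline fails."""


def flush_or_raise(put: Callable[..., Any], bucket: str, key: str, body: bytes) -> None:
    try:
        put(Bucket=bucket, Key=key, Body=body)
    except Exception as exc:
        # Any S3/network error becomes one typed, loud failure (cause preserved).
        raise CostBufferFlushError(f"cost rows not written to s3://{bucket}/{key}") from exc


def record_safely(record: Callable[[Any], None], response: Any) -> None:
    try:
        record(response)
    except Exception:
        # Malformed response shape: log with traceback, keep the extraction result.
        log.exception("cost telemetry: could not record response; continuing")
```

The asymmetry is deliberate: one lost flush loses the whole run's visibility, while one malformed response loses only a single row.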

Lib pin

Bumps `v0.32.0 → v0.33.0` in requirements.txt + Dockerfile to consume the newly lifted `alpha_engine_lib.cost.record_anthropic_call` (alpha-engine-lib #69). Data is consumer #2 of this SOTA chokepoint, after morning-signal, which originated the pattern.

Test plan

- `pytest tests/test_news_cost_telemetry.py`: 9 new tests (buffer record/flush/empty/error paths, proxy passthrough, per-call failure isolation, factory naming)
- Full suite 1480 → 1489 passing, zero regressions
- Lib v0.33.0 install resolved cleanly in local venv
- Pending: Sat 5/30 RAGIngestion fires, the JSONL lands at the canonical key, and `aggregate_costs.py` picks it up

🤖 Generated with Claude Code

…hase 0.2) Closes the largest previously untracked LLM cost slice in the system (~$20–60/mo, the dominant ~20–60% of monthly Anthropic spend per the Phase 0 audit at ``alpha-engine-docs/private/prompt-caching-investigation-260525.md`` §1.1). News-article event extraction in ``collectors/nlp/event_extraction.py:167`` fires 100–300 Haiku calls per RAGIngestion run; this PR records every response's tokens + server-tool fees into a per-run JSONL flushed once to S3 at the canonical research-side cost-raw partition.

**Zero-coupling pattern:** new ``rag/pipelines/_cost_telemetry.py`` provides ``wrap_client_for_cost_telemetry(client, buffer)``, a proxy that records every ``messages.create()`` response into a buffer without changing the shape of the response returned. ``AnthropicEventExtractor`` is UNCHANGED: telemetry composes at the client-construction layer in ``_run_nlp`` rather than polluting the extractor with cost-tracking concerns.

**Single S3 chokepoint:** rows land at ``s3://alpha-engine-research/decision_artifacts/_cost_raw/{date}/{date}/data:news_event_extraction.jsonl``, the same partition the research-side ``aggregate_costs.py`` already scans. The daily parquet now sums data's rows alongside research's under one ``by_agent_id`` breakdown; the dashboard cost panel (Phase 0.3) will show every site in one view.

**Buffered + flushed once** rather than per call, to keep S3 PutObject volume sane: 100–300 calls per run → 1 PutObject at the end of the pipeline.

**Fail-loud at flush** per ``[[feedback_no_silent_fails]]``: an S3 PutObject failure raises ``CostBufferFlushError`` and fails the pipeline, since a silent miss on the dominant cost slice would defeat the Phase 0 visibility goal. Per-call recording failures (malformed response shape) are logged but do NOT propagate; the event extractor's primary deliverable must survive a cost-telemetry hiccup.

**Lib pin v0.32.0 → v0.33.0** in requirements.txt + Dockerfile to consume the lifted ``alpha_engine_lib.cost.record_anthropic_call`` (alpha-engine-lib #69: the SOTA chokepoint that morning-signal originated, with data and executor as consumers #2 and #3).

**Tests:** new ``tests/test_news_cost_telemetry.py`` covers buffer record/flush/empty/error paths, proxy passthrough, per-call recording failure isolation, and the factory naming convention. 9 new tests; suite 1480 → 1489 passing, zero regressions.

Closes ROADMAP Phase 0.2 row "News event extraction (data, $20–60/mo, dominant slice)". The first production exercise will land on the next Saturday SF (2026-05-30) when RAGIngestion fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI failure on PR #308: ``ModuleNotFoundError: No module named 'moto'``. This repo's CI installs only ``requirements.txt + pytest`` (no dev-extras file), and the rest of the suite avoids moto by using a minimal in-memory S3 mock (``_InMemoryS3`` in ``tests/test_news_aggregates.py``). Mirror that pattern here so the cost-telemetry tests run cleanly in the unmodified CI environment. Functional coverage is unchanged: all 9 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
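For reference, a hypothetical fake in the same spirit; the repo's actual `_InMemoryS3` is not shown in this PR. It implements only the boto3-style `put_object` / `get_object` keyword calls a flush test touches, so something like `put=s3.put_object` can be handed to a buffer without moto or network access.

```python
import io
from typing import Any, Dict, Tuple


class InMemoryS3:
    """Hypothetical minimal fake of the boto3 S3 client calls a flush test touches."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, *, Bucket: str, Key: str, Body: Any, **_: Any) -> Dict[str, Any]:
        # Accepts str or bytes bodies (boto3 also takes file objects; not needed here).
        data = Body.encode("utf-8") if isinstance(Body, str) else bytes(Body)
        self.objects[(Bucket, Key)] = data
        return {}

    def get_object(self, *, Bucket: str, Key: str, **_: Any) -> Dict[str, Any]:
        # boto3 returns a StreamingBody; BytesIO offers the same .read().
        # A missing key raises KeyError here rather than boto3's NoSuchKey error.
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}
```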

…Phase 4 #1) (#309) Mirrors alpha-engine-research's ``llm_cost_tracker.RunBudgetExceededError`` pattern at the news-pipeline cost site (Phase 0.2 wiring, PR #308): ``S3CostBuffer.record()`` now tracks cumulative cost across the run and, if the per-run total exceeds ``ALPHA_ENGINE_RUN_BUDGET_USD``, raises ``CostBudgetExceededError`` after recording the offending row. The ceiling defaults to $100, and the env var is shared with research + executor, so one operator knob caps cost across all SF entry points.

**Failure shape:** the row is recorded BEFORE the raise, so per-call detail is preserved on S3 when the breaker fires. The pipeline-side try/finally in ``run_news_pipeline.py`` ensures the buffer flushes even when the breaker raises mid-loop: rows up to and including the breach call land on S3, so operators can diagnose what broke the budget without re-running.

**Posture:** the breaker propagates through the client proxy (``_CostTrackingMessages.create``). Generic record errors still get swallowed (event extraction's primary deliverable must survive a malformed-response hiccup), but ``CostBudgetExceededError`` is explicitly re-raised, since swallowing it would defeat the safety net.

**Operator-facing fields on the error:** ``run_id``, ``agent_id``, ``cumulative_cost_usd``, ``ceiling_usd``. The message tells the operator how to adjust the env var if the ceiling needs raising.

**Tests:** 4 ``TestRunBudgetCeilingResolution`` (default / env / zero disables / malformed returns zero) + 5 ``TestCostBudgetBreaker`` (under ceiling no raise, breach raises after recording row, zero disables, proxy propagates, ceiling defaults from env). Suite 1484 → 1493 passing, zero regressions.

**Composes with** PR #308 (the Phase 0.2 wiring): the breaker is a small additive surface on the buffer; the production wiring path is unchanged.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
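A sketch of the breaker semantics described in this commit, under stated assumptions: each row carries a `cost_usd` field, and the helper and class names (`resolve_run_budget_usd`, `BudgetedCostBuffer`, `record_from_proxy`, `run_pipeline`) are hypothetical. Only the env var name, the error name, its four fields, the $100 default, and the zero/malformed behaviors come from the text.

```python
import logging
import os
from typing import Any, Callable, Dict, Iterable, List, Optional

log = logging.getLogger("cost_breaker_sketch")
ENV_VAR = "ALPHA_ENGINE_RUN_BUDGET_USD"


def resolve_run_budget_usd(default: float = 100.0) -> float:
    """Per-run ceiling in USD; 0 (or a malformed value) disables the breaker."""
    raw = os.environ.get(ENV_VAR)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return 0.0


class CostBudgetExceededError(RuntimeError):
    def __init__(self, run_id: str, agent_id: str, cumulative_cost_usd: float, ceiling_usd: float) -> None:
        self.run_id = run_id
        self.agent_id = agent_id
        self.cumulative_cost_usd = cumulative_cost_usd
        self.ceiling_usd = ceiling_usd
        super().__init__(
            f"run {run_id} / {agent_id}: cumulative ${cumulative_cost_usd:.2f} exceeds "
            f"ceiling ${ceiling_usd:.2f}; raise {ENV_VAR} (or set it to 0 to disable) if expected"
        )


class BudgetedCostBuffer:
    def __init__(self, run_id: str, agent_id: str, ceiling_usd: Optional[float] = None) -> None:
        self.run_id = run_id
        self.agent_id = agent_id
        self.ceiling_usd = resolve_run_budget_usd() if ceiling_usd is None else ceiling_usd
        self.rows: List[Dict[str, Any]] = []
        self.cumulative_cost_usd = 0.0

    def record(self, row: Dict[str, Any]) -> None:
        self.rows.append(row)  # keep the offending row BEFORE any raise
        self.cumulative_cost_usd += float(row["cost_usd"])
        if self.ceiling_usd > 0 and self.cumulative_cost_usd > self.ceiling_usd:
            raise CostBudgetExceededError(
                self.run_id, self.agent_id, self.cumulative_cost_usd, self.ceiling_usd
            )


def record_from_proxy(buffer: BudgetedCostBuffer, row: Dict[str, Any]) -> None:
    """Proxy-side posture: swallow generic record errors, never the breaker."""
    try:
        buffer.record(row)
    except CostBudgetExceededError:
        raise
    except Exception:
        log.warning("cost telemetry: malformed row; continuing", exc_info=True)


def run_pipeline(buffer: BudgetedCostBuffer, rows: Iterable[Dict[str, Any]],
                 flush: Callable[[List[Dict[str, Any]]], None]) -> None:
    try:
        for row in rows:
            record_from_proxy(buffer, row)
    finally:
        flush(buffer.rows)  # rows up to and including the breach still land
```

With a $1.00 ceiling and three $0.60 calls, the second call trips the breaker, the third never runs, and the flush still receives both recorded rows.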

…or (#310) Applies the standing rule per ``[[preference_llm_calls_confined_to_research_module]]``: LLM calls live in alpha-engine-research. The news pipeline's Haiku-backed event extractor is removed and replaced with a deterministic classifier that uses two zero-cost signals already on the wire:

1. **Vendor tags** (``NewsArticle.tags``). Polygon emits keywords, GDELT emits structured event codes, Benzinga emits Channels. The ``alpha_engine_lib.sources.protocols.NewsArticle.tags`` docstring explicitly names this as "a soft signal for downstream event-flag extraction"; we were paying Haiku to re-derive what Polygon / GDELT had already tagged.
2. **Title-keyword regex**. Backstop for sources that don't populate tags (Yahoo RSS). 17 pattern → category mappings against the closed ``DEFAULT_EVENT_CATEGORIES`` taxonomy.

**Why this is the right answer, not a kill switch:** a code audit found that the Haiku per-article structured output was aggregated to 5 scalar / list columns (``event_count``, ``event_severity_max/mean``, ``event_categories``, ``top_event_descriptions``) before any research consumer touched it. The "zero-shot novel-event detection" capability was mostly wasted, because research only sees per-ticker rollups. Tag-based + keyword-based classification produces equivalent rollups deterministically.

**Cost impact:** retires the largest previously untracked LLM cost slice in the system per the original Phase 0 audit estimate ($20–60/mo). Actual spend on the deleted call site goes to $0; the research consumer sees an identical EventFlag shape (the extractor slug changes from ``"anthropic_haiku"`` to ``"rule_based"``) and identical aggregate columns in ``news_aggregates/{date}.parquet``.
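A minimal sketch of tag + title-keyword classification against a closed taxonomy. Everything concrete here is hypothetical: the three-category slice, the regexes, the tag lookup, and the `EventFlagSketch` field names. Only the overall approach (tag union title matches, taxonomy-ordered output, uniform `severity=0.5`, title as description, `"rule_based"` slug) follows the commit text.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical three-category slice; the real DEFAULT_EVENT_CATEGORIES is not shown.
CATEGORIES: Tuple[str, ...] = ("earnings", "m_and_a", "fda")

# Hypothetical title regexes (the commit describes 17 such pattern -> category mappings).
TITLE_PATTERNS = (
    (re.compile(r"\b(earnings|eps|quarterly results|guidance)\b", re.IGNORECASE), "earnings"),
    (re.compile(r"\b(acquires?|acquisition|merger|takeover)\b", re.IGNORECASE), "m_and_a"),
    (re.compile(r"\b(fda|pdufa|clinical hold)\b", re.IGNORECASE), "fda"),
)

# Hypothetical vendor-tag lookup, matched case-insensitively.
TAG_TO_CATEGORY: Dict[str, str] = {
    "earnings": "earnings",
    "mergers and acquisitions": "m_and_a",
    "fda": "fda",
}


@dataclass(frozen=True)
class EventFlagSketch:
    category: str
    severity: float
    description: str
    extractor: str


def classify(title: str, text: str, article_tags: Tuple[str, ...] = ()) -> List[EventFlagSketch]:
    if not title.strip() and not text.strip():
        return []  # nothing to classify
    hits = {TAG_TO_CATEGORY[t.lower()] for t in article_tags if t.lower() in TAG_TO_CATEGORY}
    hits |= {cat for pattern, cat in TITLE_PATTERNS if pattern.search(title)}
    # Emit in taxonomy order so output is deterministic regardless of set ordering.
    return [EventFlagSketch(cat, 0.5, title, "rule_based") for cat in CATEGORIES if cat in hits]
```

No model call appears anywhere in the path, which is what makes the output reproducible run to run.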
**Substrate cleanup:** retires three files added earlier this session:
- ``collectors/nlp/event_extraction.py`` (the Anthropic extractor itself)
- ``rag/pipelines/_cost_telemetry.py`` (the Phase 0.2 cost-telemetry buffer from PR #308 and the Phase 4 #1 runaway-cost breaker from PR #309, both retired with the LLM call site they instrumented)
- ``tests/test_news_cost_telemetry.py`` (mirrored tests)

``DEFAULT_EVENT_CATEGORIES`` moves into the new ``collectors/nlp/rule_based_event_extraction.py`` so the closed taxonomy stays accessible to downstream consumers.

**Protocol contract:** ``EventExtractor.extract`` gains an optional ``article_tags: tuple[str, ...] = ()`` kwarg (the default keeps it back-compatible). The pipeline plumbs the union of tags across article variants. Any future EventExtractor implementation (FinBERT, spaCy, an LLM reactivated via the research module) consumes the same shape.

**Severity convention:** rule-based flags emit ``severity=0.5`` uniformly (the EventFlag protocol's documented default). The Haiku severity was a free-floating judgment never tuned by any operator alert. Per-category severity tuning can be added via YAML if a downstream surface needs it.

**Tests:** ``TestRuleBasedEventExtractor`` (10 tests) covers the empty-text short-circuit, no match returning empty, title-keyword classification per category (earnings / M&A / FDA), tag-based classification (Polygon/GDELT shape), tag+title union, multi-category emission, deterministic ordering per ``DEFAULT_EVENT_CATEGORIES``, the zero-LLM-dependency contract, and the title-as-description shape. Suite 1493 → 1479 net (retired the 9 cost-telemetry tests + 7 Anthropic extractor tests; added 10 rule-based tests).

**Composes with:**
- ``[[preference_llm_calls_confined_to_research_module]]``: the rule this PR enforces.
- alpha-engine #212 (executor EOD narrative kill switch): a sibling application of the same rule. Of the two non-research LLM call sites, this PR retires data's entirely, while executor's keeps the kill-switch substrate (default off) since that LLM path may be reactivated by an operator.
- Retires the substrate from data #308 + data #309 (Phase 0.2 + Phase 4 #1 cost-telemetry buffer + breaker); both became dead code along with the LLM call site they instrumented.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
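A sketch of why the optional keyword keeps the protocol back-compatible, plus one way to form the tag union across article variants. Only the `article_tags: tuple[str, ...] = ()` parameter comes from the commit; the other `extract` parameters, the simplified return type, first-seen ordering, and case-insensitive dedupe in `union_tags` are assumptions, and `TagEchoExtractor` is a toy.

```python
from typing import Iterable, List, Protocol, Sequence, Tuple


class EventExtractorSketch(Protocol):
    # Only article_tags comes from the PR; the other parameters are assumptions.
    def extract(self, text: str, title: str = "", *, article_tags: Tuple[str, ...] = ()) -> List[str]: ...


def union_tags(variant_tags: Iterable[Sequence[str]]) -> Tuple[str, ...]:
    """Union of tags across variants of one story: first-seen order, case-insensitive dedupe."""
    seen = {}  # lowercased tag -> first spelling seen (dicts keep insertion order)
    for tags in variant_tags:
        for tag in tags:
            seen.setdefault(tag.lower(), tag)
    return tuple(seen.values())


class TagEchoExtractor:
    """Toy implementation that just reports the tags it received."""

    def extract(self, text: str, title: str = "", *, article_tags: Tuple[str, ...] = ()) -> List[str]:
        return sorted(t.lower() for t in article_tags)


extractor: EventExtractorSketch = TagEchoExtractor()
legacy_result = extractor.extract("body text")  # pre-#310 call shape still valid
tagged_result = extractor.extract(
    "body text", article_tags=union_tags([("FDA", "Earnings"), ("fda",), ()])
)
```

Because the new parameter is keyword-only with an empty default, existing call sites compile and behave as before, and only the pipeline needs to start passing tags.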

cipher813 and others added 2 commits May 25, 2026 10:24

cipher813 merged commit 2f9d50a into main May 25, 2026
1 check passed

cipher813 deleted the feat/cost-telemetry-event-extraction-v0.33 branch May 25, 2026 17:50

cipher813 mentioned this pull request May 25, 2026

feat(cost-telemetry): runaway-cost circuit breaker on news pipeline (Phase 4 #1) #309

Merged

2 tasks

cipher813 mentioned this pull request May 25, 2026

feat(nlp): replace AnthropicEventExtractor with RuleBasedEventExtractor #310

Merged

4 tasks
