Skip to content

feat(signals): signal extraction with calibrated confidence and manipulation flags#3

Merged
Mathews-Tom merged 16 commits into
mainfrom
feat/signal-extraction-core
Apr 17, 2026
Merged

feat(signals): signal extraction with calibrated confidence and manipulation flags#3
Mathews-Tom merged 16 commits into
mainfrom
feat/signal-extraction-core

Conversation

@Mathews-Tom

@Mathews-Tom Mathews-Tom commented Apr 17, 2026

Copy link
Copy Markdown
Owner

Summary

Delivers the working signal-extraction pipeline end to end: Pydantic data contracts, ingestion adapters, adaptive polling, feature computation, five detectors, manipulation flagging, calibration primitives (BH-FDR, reliability curves, drift monitor, liquidity banding), deduplication, storm handling, deterministic context assembly, DuckDB storage, and an orchestrator that composes the full cycle. Live signal extraction becomes operational against Polymarket and Kalshi once API credentials are provisioned.

What Changed

  • Models: MarketSnapshot, FeatureVector, MarketSignal, SignalContext, RelatedMarketState, and the four closed enums. MarketSignal enforces raw_features["calibration_provenance"] via a model validator; all models are frozen and reject unknown fields.
  • Ingestion: AbstractPoller protocol, PolymarketPoller and KalshiPoller concrete implementations, exponential-backoff retry, and the normalizer that maps raw payloads onto MarketSnapshot.
  • Adaptive polling: four-tier state machine (hot / warm / cool / cold) with hysteresis bands and rate-limit-pressure demotion.
  • Features: per-market SnapshotBuffer, halt-aware EWMA, and pure-function indicators computed over 5m / 15m / 1h / 4h observation-count windows.
  • Detectors: price velocity (Bernoulli-Beta BOCPD against running-mean projections), volume spike (EWMA z-score), book imbalance (depth-gated with persistence), regime shift (two-sided CUSUM with dormancy gate), cross-market divergence (Spearman + Fisher-z + BH-FDR batch).
  • Manipulation: five pure-function signatures (Herfindahl concentration, size-vs-depth outlier, cancel-replace burst, thin-book-during-move, pre-resolution window), the aggregator, and the curated CURATED_EPISODES metadata.
  • Calibration: Benjamini-Hochberg FDR controller, reliability analyzer with identity placeholder curve, empirical FPR computation, PSI+KS drift monitor, liquidity-tier banding.
  • Storage: DuckDB store with schema migrations and typed round-trip for snapshots, features, signals, and manipulation flags.
  • Bus and dedup: bounded async bus, fingerprint deduplication, taxonomy-cluster merge, and storm-mode controller with asymmetric trigger/recovery thresholds.
  • Context: bidirectional MarketTaxonomy, frozen InvestigationPromptLibrary, RelatedMarketResolver, and the deterministic ContextAssembler.
  • Engine and guards: Engine.run_cycle composes the full pipeline; scripts/lint_detector_now.py rejects datetime.now() calls inside detector modules and is wired into pre-commit and CI.

How It Works

Detector algorithms follow docs/methodology/calibration-methodology.md. Manipulation signatures match docs/methodology/manipulation-taxonomy.md. The polling state machine, deduplication, and storm handling mirror the specs in docs/architecture/adaptive-polling-spec.md and docs/architecture/deduplication-and-storms.md. Every layer is bound by the schemas in docs/contracts/schema-and-versioning.md; producers stamp schema_version on every emitted message and consumers validate against the exported JSON schemas.

Configuration Added

  • config/detectors.toml — parameters for the five detectors (hazard rate, alpha/beta priors, fire thresholds, FDR q, dormancy, CUSUM k/h multipliers, depth gates).
  • data/investigation_prompts.toml — curated (signal_type, category) -> prompt entries consumed by the frozen InvestigationPromptLibrary.

No new top-level TOML files were added beyond the scaffolding set; existing config/detectors.toml, config/polling.toml, config/dedup.toml, and data/investigation_prompts.toml drive the runtime via the Pydantic configuration models.

Schema Changes

Four new JSON schemas exported under schemas/:

  • MarketSnapshot-1.0.0.json
  • FeatureVector-1.0.0.json
  • MarketSignal-1.0.0.json
  • SignalContext-1.0.0.json

All at schema version 1.0.0 per docs/contracts/schema-and-versioning.md §Versioning Policy. No breaking changes; this is the initial publication of the contract.

Quality Gates

  • uv run ruff check . clean
  • uv run ruff format --check . clean
  • uv run mypy --strict src/ clean (66 source files)
  • uv run pytest --cov-fail-under=80 passes; 124 tests, 90.3 % total coverage
  • LLM-import guard passes
  • Schema export in sync (uv run python scripts/export_schemas.py --check)
  • Forbidden-token doc lint passes
  • datetime.now() AST guard passes (uv run python scripts/lint_detector_now.py)
  • uv run pre-commit run --all-files passes
  • CI workflow runs green on this PR (pending remote run)

Definition of Done

  • Every file under src/augur_signals/ and its tests pass lint, format, strict mypy.
  • Detector property invariants covered in tests: constant stream -> no signal, boundary prices do not crash, pre-resolution exclusion, state round-trip, reset.
  • Determinism: 100 invocations of ContextAssembler.assemble on the same input produce byte-identical JSON.
  • Manipulation episode fixture metadata spans the full ManipulationFlag enum.
  • BH-FDR correctness tests cover ordering invariants, empty input, and q validation.
  • Schema export: four schemas committed and --check succeeds.
  • Synthetic-data integration test: Engine.run_cycle produces at least one SignalContext when a step change follows a flat warmup window.
  • CI guard grep for LLM imports returns zero matches.
  • AST guard for datetime.now() in detector modules passes.
  • CHANGELOG.md updated with the full module-level additions.
  • Operational follow-up (post-merge): 24 h live run against Polymarket and Kalshi APIs without crashes, rate-limit violations, or signal-emission validation failures. Deferred until a deployment environment and API credentials are in place.
  • Operational follow-up (post-merge): backtest harness run against a sample snapshot range with per-detector precision / recall / lead-time report. Deferred until the labeling workstream produces a real event corpus.

Operational Handoff

After merge the engine can run continuously on a single machine, ingest from Polymarket and Kalshi, emit calibrated MarketSignal events persisted to DuckDB, and produce deterministic SignalContext envelopes ready for the formatter workstream. Real reliability curves require labeled events produced by the labeling workstream; during the warmup period every signal carries calibration_provenance = "<detector>@identity_v0".

Test Plan

  • uv run pytest passes locally (124 tests).
  • uv run python scripts/export_schemas.py --check passes locally.
  • uv run python scripts/lint_detector_now.py passes locally.
  • uv run pre-commit run --all-files passes locally.
  • CI workflow runs green on this PR.
  • Operational: run Engine.run_cycle against a recorded Polymarket and Kalshi fixture once recorded and confirm a non-empty context stream.

Review Pass

  • pr-review findings addressed: 4 HIGH + 4 MEDIUM fixed on the branch (3784eaa fix(signals): address pr-review findings in extraction core); 1 HIGH (doc-only) deferred; 5 LOW deferred.
  • code-refiner simplifications applied: none actionable. Review completed clean on the simplification axis.

Deferred Findings

  • HIGH (doc-fix): docs/contracts/schema-and-versioning.md:114 types MarketSignal.raw_features as dict[str, float], but calibration_provenance, merge_provenance, cluster_member_signal_ids, and related_market_id all carry string values in the shipped code. The code is correct; the published contract needs a dict[str, float | str] update. Will open a separate fix(docs) PR against the same branch before merge or as a follow-up.
  • MEDIUM: BOCPD Bernoulli-projection design decision is documented in the detector docstring but the methodology doc could expand on why the projection is preferred over raw prices for sharp posteriors. Follow-up doc enhancement.
  • LOW: Storm bus comment mislabels drop policy as "LIFO" when the implementation is FIFO-oldest-drop — the spec uses "LIFO" terminology for the conceptual end (newest wins); the comment stays aligned with the spec. No code change.
  • LOW: Scheduler has a redundant inner if utilization > 0.80 — cosmetic, deferred.
  • LOW: ReliabilityAnalyzer.calibrate is wired for future detector use; the detectors currently emit raw scores with the @identity_v0 provenance. This is the documented Phase-1 behavior.
  • LOW: Two different raw_feature keys for the source-id concept (merge_provenance vs cluster_member_signal_ids) retained because the dedup stages are distinguishable in the audit trail.
  • LOW: test_export_schemas.py does not separately test RelatedMarketState because it appears nested under SignalContext; already covered transitively.

…action

Land the binding data contracts between every layer of the extraction
pipeline — MarketSnapshot, FeatureVector, MarketSignal, SignalContext,
RelatedMarketState — plus the closed enums SignalType,
ManipulationFlag, ConsumerType, and InterpretationMode. Field sets
mirror docs/contracts/schema-and-versioning.md verbatim; all models
are frozen and reject unknown fields so a producer cannot silently
add a member that a consumer does not recognize.

MarketSignal carries a model_validator that rejects any instance whose
raw_features lacks a non-empty calibration_provenance string. This is
the project-wide invariant from the development plan (§7.2): no
uncalibrated signal escapes the producer. Tests cover both the
missing-key and empty-string cases.

uuid7-based signal identifiers preserve time ordering, which lets the
bus and storage layers sort by identifier and still recover temporal
order — a prerequisite for byte-identical backtest replay.

scripts/export_schemas.py registers the four shipped models and emits
their JSON schemas to schemas/*.json with deterministic key ordering.
The --check mode is now load-bearing: a modification to any model
shape fails CI until the export is regenerated.
Introduce the ingestion seam: AbstractPoller (two concrete
implementations — Polymarket and Kalshi), the RawMarketData /
RawOrderBook / RawTrade DTOs that pollers emit, and the normalizer
that turns those DTOs into the canonical MarketSnapshot.

All platform-specific field mapping lives in the pollers and the
normalizer. PolymarketPoller and KalshiPoller each wrap an
aiohttp.ClientSession and route every request through the shared
exponential-backoff helper (initial 1 s, cap 60 s, max 5 retries per
docs/architecture/adaptive-polling-spec.md §Backoff Policy). The
KalshiPoller fails loud at construction when KALSHI_API_KEY is absent
rather than surfacing a credential error on first call.

The normalizer (1) verbatim-preserves question, resolution_source,
resolution_criteria, (2) computes spread from bid/ask when both
present, (3) derives liquidity from top-5 levels of the order book,
(4) stamps schema_version. MalformedPayloadError is raised on missing
required keys; there is no silent default.

Tests cover the retry success / retry / exhaust paths (with injectable
sleep so the suite runs under 100 ms), normalization of both
Polymarket and Kalshi shaped payloads, and the missing-price failure
mode.
The scheduler is the state machine in
docs/architecture/adaptive-polling-spec.md: each market sits in one of
four tiers (hot, warm, cool, cold) with asymmetric promotion and
demotion thresholds against volume_ratio_1h. The hysteresis bands
(±10 % around each switch point) prevent a market sitting near a
threshold from flapping between tiers on consecutive ticks, which
would corrupt rolling-window features whose semantics depend on
consistent temporal sampling.

The tier enum drives the polling interval via interval_seconds; a
downstream poller loop reads the current tier and schedules the next
tick accordingly. Markets closing within 24 h promote from cool to
warm regardless of volume; an active recent signal promotes from warm
to hot.

Rate-limit pressure is fed back via observe_platform_pressure; above
80 % utilization the scheduler emits a RateLimitPressureEvent and
demotes the hot market with the lowest volume_ratio_1h so the
platform regains headroom without starving the most-active markets.

PollingConfig, PollingBody, HysteresisBands, PlatformCaps, and
BackoffSettings are frozen Pydantic models mirroring the TOML schema
in config/polling.toml. Tests cover the full promotion / demotion
chain, the hysteresis band, closes-within-24h promotion, and the
rate-limit demotion path.
… ewma

The pipeline sits between normalized snapshots and the detectors,
producing a FeatureVector per market per tick. Per-market state
includes a bounded SnapshotBuffer (default 500 snapshots, ~4 hours at
30 s polling), an EWMA baseline over volume_24h with alpha 0.05, and a
rolling estimate of the polling interval.

Window semantics follow
docs/architecture/adaptive-polling-spec.md §Wall-Clock vs
Observation-Count Window Reconciliation. Windows are stored as
observation counts internally and the 5m / 15m / 1h / 4h labels are
derived at compute time from the current polling-interval estimate.
When a market changes polling tier, the window size recomputes next
tick — the feature is "volatility of the samples we have", not of an
unobserved continuous process.

EWMA updates are halt-aware: when the gap since the last observation
exceeds 2x the expected interval, the decay multiplier applies
(1 - alpha)^gap_factor so the baseline does not freeze through the
gap. This prevents a mid-window halt from masking the post-halt
volume surge.

Indicators are pure functions taking a snapshot sequence and returning
float | None for underdetermined cases; no hidden state. Tests cover
idempotency (same buffer, same vector), warmup behavior, the EWMA
halt-decay path, and the boundary behavior of bid/ask and spread
when one side is missing.
…face

SignalDetector is the protocol every detector implements: warmup,
ingest(market, feature, snapshot, now), state_dict / load_state for
checkpointing, and reset. The ingest signature makes now a parameter
rather than reading datetime.now(), which keeps backtest replay
bit-for-bit identical to live runs — a prerequisite for calibration
fidelity per docs/methodology/calibration-methodology.md.

DetectorRegistry dispatches per-market detectors observation-at-a-time
and batch detectors (cross-market divergence) across the whole polling
cycle. Registration is explicit so the engine composes exactly the
set of detectors the configuration enables.

DetectorsConfig composes five per-detector sub-models — PriceVelocity,
VolumeSpike, BookImbalance, CrossMarket, RegimeShift — mirroring the
block shape in config/detectors.toml. Every sub-model carries its
resolution_exclusion_seconds default (21600 = 6 h) so the pre-
resolution-window invariant is enforced uniformly.
…a bocpd

Land the price-velocity detector per the method description in
docs/methodology/calibration-methodology.md. The change-point model is
Bernoulli-Beta BOCPD (Adams & MacKay 2007) on a binary projection of
each price observation against a per-market EWMA of recent prices —
sustained deviations drive the posterior run-length distribution's
mass onto the short-run bucket, producing a sharp signal precisely
when the rate of up-ticks vs down-ticks changes.

The run-length distribution is capped at run_length_cap and mass that
would otherwise fall off the cap is absorbed back into the cap bucket
with a weighted average of the sufficient statistics. Without the
absorption, long steady-state streams leak probability mass and
produce a slow drift in P(r_t < 5).

PriceVelocityDetector enforces three operational invariants inside
ingest: (1) the 6 h pre-resolution exclusion — signals inside the
window are never returned regardless of posterior magnitude,
(2) a per-market cooldown so the same underlying change does not fire
repeatedly, (3) a 50-observation warmup so the early run-length
distribution (trivially concentrated at r = 0) does not produce
spurious signals on a fresh market.

liquidity_tier.banding provides the per-snapshot tier estimator
against the tier thresholds in docs/foundations/glossary.md.

Tests cover the BOCPD math invariants (constant stream drives
P(r_t < 5) below the 0.3 noise floor, step change fires within 50
observations, out-of-range observation raises), the detector-level
behaviors (pre-resolution exclusion, boundary prices, state round
trip, reset), and the flat-stream no-signal property.
…hift detectors

Three per-market detectors ship together because they share the
stateful-per-market / pre-resolution-excluded / warmup-gated shape and
exercise the same dispatch path through the registry.

Volume spike maintains per-market EWMA mean and variance of
volume_ratio_1h (alpha 0.05 by default). Signals fire when the raw
z-score exceeds the configured minimum_z and the absolute 24 h volume
is above the per-market floor. The FDR controller layer composes over
these raw z-scores once the calibration module lands; the detector
itself does not gate on FDR because the controller needs a batch of
candidates across markets, not a per-market decision.

Book imbalance applies a depth gate before the ratio check so signals
do not fire on thin books where an imbalance is more likely a
manipulation artifact than a directional view
(docs/methodology/manipulation-taxonomy.md §thin_book_during_move).
The persistence requirement (default 3 consecutive snapshots) filters
transient imbalances.

Regime shift uses a two-sided CUSUM on volatility_1h with a per-market
dormancy gate. The detector only fires after the dormancy window has
elapsed since the last signal (or since initialization), so a
sustained increase in volatility following a quiet window is what
trips the detector. An adaptive cooldown multiplies the dormancy
window by the configured factor after each firing, preventing the
same regime from emitting signals repeatedly as volatility continues
to rise.

All three enforce the 6 h pre-resolution exclusion inside ingest,
thread ``now`` as a parameter (no ``datetime.now()`` calls), and
populate the calibration_provenance stamp so emitted signals satisfy
the MarketSignal model validator.
…onitor, and cross-market detector

The calibration layer ships four cooperating modules and a dependent
cross-market detector that exercises them end to end.

benjamini_hochberg implements the step-up procedure: sort ascending,
find the largest rank k with p_(k) ≤ (k/m) q, accept all hypotheses
whose p-value is at most p_(k). FDRController wraps the procedure and
returns the set of signal_ids that pass when a detector submits a
batch of (signal_id, p_value) pairs per polling cycle. This is the
shared primitive the cross-market divergence detector and
(post-Phase 1) the volume-spike detector gate on.

ReliabilityAnalyzer looks up curves by (detector_id, liquidity_tier)
and linearly interpolates raw scores onto the cached decile grid. The
identity placeholder curve (raw == calibrated, version identity_v0)
is returned whenever no empirical curve has been registered yet so
signals produced during the warmup period still satisfy the
calibration_provenance invariant on MarketSignal.

EmpiricalFPR computes FP / (FP + TN) against a labeled event stream
per docs/methodology/labeling-protocol.md §True Positive with a 24 h
default lead window. The contract is functional today against
synthetic labels; real populations wait on the labeling workstream.

DriftMonitor computes Population Stability Index and a two-sample
Kolmogorov-Smirnov statistic over baseline vs current score
populations. The nightly calibrate run invokes the monitor and emits
a CalibrationStaleEvent when either metric crosses its threshold.

CrossMarketDivergenceDetector operates on batches across the full
polling cycle because the FDR controller needs to see every
candidate market pair's p-value simultaneously. For each curated
related-market pair, the detector computes the current Spearman rho,
applies the Fisher-z transform, compares the delta-z to the pair's
historical z, and emits a signal when BH-FDR accepts the p-value at
the target q.

Twelve tests cover BH correctness (ordering invariants, empty input,
q validation), FDR controller behavior, reliability identity and
registered-curve paths, liquidity tier banding, empirical FPR for
true-positive and unlabeled cases, drift monitor triggering on a
clear distribution shift vs silence on stable scores, and the
cross-market detector firing on a decorrelation event.
…ode metadata

The manipulation detector attaches flags to every candidate signal
before it reaches the bus. Five pure-function signatures, each
authoritative in docs/methodology/manipulation-taxonomy.md, drive the
decision: single_counterparty_concentration (Herfindahl on trade
volume), size_vs_depth_outlier (one trade consumed > threshold of
prior depth), cancel_replace_burst (cancel / replace events exceed
threshold in a rolling window), thin_book_during_move (median depth
below floor), and pre_resolution_window (signal fired inside the
6 h pre-close exclusion).

ManipulationDetector runs every signature against the trades, book
events, and snapshots surrounding a candidate signal and returns the
matched flags. The list is always present and always a list — never
None. The detector does not suppress; consumer policy applies
suppression per the taxonomy doc. attach_flags produces a new
MarketSignal with the flags set, routed through model_copy so
Pydantic re-validates the calibration_provenance invariant.

CURATED_EPISODES enumerates five canonical historical cases used as
positive-case test fixtures with their expected flag sets; the test
suite verifies that the flag coverage across episodes spans the full
ManipulationFlag enum, preventing the episode catalogue from drifting
out of sync with the taxonomy.
… features, signals

The storage layer persists the full pipeline output — snapshots,
features, signals, manipulation flags, empirical FPR records, and
reliability curves — to a single DuckDB database. The schema mirrors
docs/architecture/system-design.md §Storage Schema exactly so queries
against the backtest harness produce the same shape the live engine
writes.

initialize is idempotent: it applies the migration statements in order
and stamps the schema_version table. The CREATE TABLE IF NOT EXISTS
form lets repeat initializations run without error; the schema_version
row uses INSERT OR IGNORE so the applied_at timestamp is written only
once. Future migrations append to a numbered list and rerun on
startup; no destructive migrations are expected inside a major
version.

Insert paths accept the frozen Pydantic models directly and serialize
JSON payloads (raw_json on snapshots, raw_features on signals,
manipulation_flags when non-empty). Read paths return typed model
instances by routing rows through Pydantic's model_validate so the
calibration_provenance invariant still holds on recovery.

Storage is deliberately kept synchronous in this implementation; the
engine serializes writes through one connection. The async facade in
the scaling workstream drops in with the same method surface.
…storm controller

Four cooperating modules between the detector layer and the context
assembler, all bounded by the semantics in
docs/architecture/deduplication-and-storms.md.

InProcessAsyncBus fans published signals out to every current
subscriber with a per-subscriber bounded queue and a LIFO drop under
pressure. This is the Phase-1 implementation; later phases swap in a
NATS / Redis Streams adapter behind the same method surface.

FingerprintDedup collapses signals that share (market_id, signal_type,
time_bucket) with max magnitude, max confidence, union of
manipulation_flags and related_market_ids, earliest detected_at, and
the lexicographically-smallest signal_id. The merge_provenance entry
in raw_features lists every source signal_id so the backtest can
reconstruct the pre-dedup stream.

ClusterMerge layers on top of FingerprintDedup and merges signals of
the same type firing on taxonomy-related markets inside the cluster
window. Only strong edges (positive, inverse, causal) contribute to
merging — complex and unknown edges are intentionally excluded
because cluster merge asserts a shared cause.

StormController tracks raw arrival rate and queue depth against
asymmetric trigger / recovery thresholds. Entry on either trigger,
exit only when both recovery conditions hold for the recovery window.
Storm mode is advisory at this layer; the engine consumes StormState
to switch the dedup layer to cluster-only output and suspend the LLM
formatter.
…lated-market resolver

The context assembler wraps every MarketSignal with verbatim platform
metadata, the latest snapshots of related markets, and investigation
prompts drawn from a frozen curated library. The output
(SignalContext) is the binding contract between extraction and
downstream formatters: consumers never see the raw MarketSignal.

MarketTaxonomy loads curated edges from TOML, stores them
bidirectionally (an edge a <-> b is reachable from both markets), and
exposes ``cluster_for`` with filtering by relationship type so the
dedup layer's cluster merge only considers strong edges (positive,
inverse, causal). complex edges are intentionally excluded from
clustering because the causal equivalence they assert is too weak.

InvestigationPromptLibrary is frozen-at-construction: duplicate entries
raise immediately, the public interface is lookup-only, and a coverage
report enumerates the (signal_type, category) tuples with no prompts
so startup can log the gaps. The library reads from
data/investigation_prompts.toml via a classmethod.

RelatedMarketResolver fetches the latest snapshot per related market
from DuckDB and computes the 24 h price delta. Markets without recent
snapshots are omitted; the 1 h freshness window is configurable.

ContextAssembler is a pure function of (signal, store, taxonomy,
resolver, prompt library, category map). Two invocations with
identical inputs produce byte-identical JSON — the determinism test
(assemble twice, compare model_dump_json) exercises this invariant
and is one of the gates for the extraction workstream's Definition of
Done.
…or determinism

The Engine composes the full extraction pipeline: snapshot -> feature
pipeline -> detector dispatch -> manipulation evaluation -> fingerprint
and cluster dedup -> bus publish -> context assembly. run_cycle takes
snapshots, the features for each market, the recent trades and book
events used by the manipulation detector, and now — threaded through
every downstream call so the backtest harness and live engine traverse
the same code with bit-for-bit identical timing.

The scripts/lint_detector_now.py AST guard parses every file under
src/augur_signals/augur_signals/detectors/ and fails non-zero on any
direct datetime.now() call — whether via ``datetime.now()`` or
``datetime.datetime.now()``. The guard is wired into both
pre-commit (id: datetime-now-in-detectors) and the CI workflow so the
invariant holds through every merged commit.

tests/signals/test_engine_integration.py replays a synthetic 160-tick
snapshot stream (flat phase followed by a sustained level shift)
through the engine and asserts at least one SignalContext is emitted.
The test exercises the full composition — detectors, manipulation,
fingerprint, cluster, bus, assembler — and is the stand-in for the
recorded-API-fixture integration test that lands with the labeling
workstream.
Cross-market divergence now keys FDR submissions on the pair
``(market_a, market_b)`` rather than just ``market_a``. Before, a
market that participated in multiple related-market pairs collapsed
to a single per-market pass/fail on the FDR set return — all pair
signals for that market survived even when only one pair's p-value
crossed. The pair-level key ensures each pair's decision is
independent.

Cluster-merge representative selection follows the spec: the highest
liquidity tier in the cluster wins, ties break alphabetically by
market_id. The prior max-magnitude heuristic contradicted
docs/architecture/deduplication-and-storms.md §Cluster-Level Merge.

DuckDBStore.signals_in_window now rehydrates manipulation flags from
the side table before returning. Signals went to storage with their
flags persisted but came back with empty flag lists — a silent
correctness hazard for backtests that audit manipulation coverage.

RelatedMarketResolver's delta window now defaults to 86_400 seconds
(24 hours) so ``RelatedMarketState.delta_24h`` matches its name and
its contract in docs/contracts/schema-and-versioning.md. The
constructor argument is renamed ``delta_window_seconds`` to make the
semantic explicit.

StormController now requires the queue-depth trigger to sustain
across ``trigger_queue_depth_window_sec`` before entering storm mode,
matching docs/architecture/deduplication-and-storms.md §Storm
Detection. A single-tick depth spike no longer flips the controller.

RegimeShiftDetector's direction now compares magnitudes rather than
preferring the positive arm; when both arms cross the threshold in
the same tick, the dominant excursion's sign is emitted.

compute_empirical_fpr now requires ``now`` as a parameter instead of
falling back to ``datetime.now()``. This extends the
"now as parameter" invariant beyond detectors to every time-sensitive
entry point so backtest runs are deterministic across wall clocks.

Two new tests round out coverage: a 100-invocation determinism test
for ContextAssembler per the contract in §SignalContext, and a
round-trip test verifying manipulation flags survive DuckDB write/read.
@Mathews-Tom Mathews-Tom merged commit 2e5b6bf into main Apr 17, 2026
2 checks passed
@Mathews-Tom Mathews-Tom deleted the feat/signal-extraction-core branch April 17, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant