16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,22 @@ All notable changes to Augur are recorded in this file. Format follows [Keep a C

## [Unreleased]

### Added — Deterministic Formatters

- `src/augur_format/deterministic/json_feed.py` — `to_canonical_json` emits UTF-8 JSON bytes with stable key ordering (top-level, signal block, related-market block), six-decimal float rounding (configurable), and Z-suffix UTC timestamps. Byte-identical across invocations.
- `src/augur_format/deterministic/severity.py` — pure `derive_severity` mapping magnitude × confidence against per-liquidity-tier thresholds to `{high, medium, low}`. Formula lives in code so consumers can reproduce locally.
- `src/augur_format/deterministic/markdown.py` — Jinja2 `MarkdownFormatter` rendering five per-signal-type templates that extend `_base.md.j2`. Templates ship inside the wheel via the hatch `include = ["augur_format/**/*.j2"]` rule.
- `src/augur_format/validate/` — `ConsumerEnumValidator` rejects briefs whose `actionable_for` contains values outside `ConsumerType`; `load_schema` reads exported JSON schemas from `schemas/` for debug-build validation.
- `src/augur_format/transport/webhook.py` — `WebhookFormatter` POSTs canonical JSON, wrapped Markdown, or Slack Block Kit payloads to configured destinations, retrying with exponential backoff on 5xx and 429 responses and dropping the delivery on any other 4xx. Auth headers are read from environment variables at delivery time.
- `src/augur_format/transport/websocket.py` — `WebSocketBroadcaster` with `SIGNAL`, `HEARTBEAT`, `STORM_START`, `STORM_END` frame types; when a per-connection queue is full, the oldest frame is dropped so that slow clients receive recent signals rather than stale ones.
- `src/augur_format/routing/` — `ConsumerRegistry.from_toml` loads `config/consumers.toml` and exposes per-category routing; `SignalRouter` maps `SignalContext` to the consumer set, surfacing suppressed consumers for `llm_assisted` interpretation mode.
- `src/augur_format/llm/models.py` — `IntelligenceBrief` contract declared in this phase for completeness. The gated LLM formatter in the next phase instantiates the model; the JSON schema ships at `schemas/IntelligenceBrief-1.0.0.json`.
- `config/formatters.toml` mirrors `phase-3 §12.2` with JSON, Markdown, Webhook, and WebSocket blocks validated against `FormatterConfig`.
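
The oldest-drop policy named in the `WebSocketBroadcaster` entry can be illustrated with a bounded deque. This is a minimal sketch, not the code in `websocket.py` (which this diff does not show); `OldestDropQueue` and its `dropped` counter are hypothetical names introduced only for illustration.

```python
from collections import deque


class OldestDropQueue:
    """Bounded FIFO that discards the oldest frame when full (hypothetical helper)."""

    def __init__(self, maxlen: int = 64) -> None:
        # deque(maxlen=...) evicts from the left when a full deque is appended to.
        self._frames: deque[str] = deque(maxlen=maxlen)
        self.dropped = 0

    def put(self, frame: str) -> None:
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1  # the append below evicts the oldest frame
        self._frames.append(frame)

    def drain(self) -> list[str]:
        """Return the buffered frames in arrival order and empty the queue."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


queue = OldestDropQueue(maxlen=3)
for n in range(5):
    queue.put(f"SIGNAL-{n}")

remaining = queue.drain()  # the two oldest frames were evicted
```

With a buffer of three and five frames enqueued, only the three most recent frames survive. A real broadcaster would likely also need to tell the client that frames were skipped, which this sketch leaves out.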

### Operational Handoff — Deterministic Formatters

After merge, operators can subscribe clients to the WebSocket broadcaster for live signal frames, wire webhook targets (Slack or generic JSON/Markdown) to push brief deliveries, and route signals to consumers via the `ConsumerRegistry` loaded from `config/consumers.toml`. The canonical JSON feed is ready for any consumer that validates against `schemas/SignalContext-1.0.0.json`.

### Added — Labeling Pipeline

- `src/augur_labels/` package with Pydantic data contracts for `NewsworthyEvent`, `EventCandidate`, `SourcePublication`, `QualifyingSource`, `LabelDecision`, `AnnotatorIdentity`, and `AgreementReport`. The closed `source_id` literal set (reuters, bloomberg, ap, ft) is load-bearing across adapters, storage, and workflow.
26 changes: 26 additions & 0 deletions config/formatters.toml
@@ -0,0 +1,26 @@
# Deterministic formatter configuration. Schema mirrors phase-3 §12.2
# verbatim; each block maps onto a Pydantic config sub-model in
# augur_format._config. A malformed file fails at startup via
# augur_signals._config.load_config.

[json]
float_decimals = 6
timestamp_format = "iso_z"

[markdown]
template_dir = "src/augur_format/augur_format/deterministic/templates"
trim_blocks = true
lstrip_blocks = true

[webhook]
initial_retry_delay_seconds = 1
max_retry_delay_seconds = 60
max_retries = 5
delivery_timeout_seconds = 10

[websocket]
bind = "0.0.0.0"
port = 8765
heartbeat_interval_seconds = 30
heartbeat_timeout_seconds = 90
per_connection_buffer = 64
90 changes: 90 additions & 0 deletions schemas/IntelligenceBrief-1.0.0.json
@@ -0,0 +1,90 @@
{
"$defs": {
"ConsumerType": {
"description": "Registered consumers of the brief feed per docs/contracts/consumer-registry.md.",
"enum": [
"macro_research_agent",
"geopolitical_research_agent",
"crypto_research_agent",
"financial_news_desk",
"regulatory_news_desk",
"dashboard"
],
"title": "ConsumerType",
"type": "string"
}
},
"additionalProperties": false,
"description": "Gated LLM formatter output contract.\n\n``actionable_for`` is constrained to the ConsumerType registry in\ndocs/contracts/consumer-registry.md via the Pydantic field type;\nthe closed-enum validator rechecks this at the formatter boundary\nso even dynamically-constructed instances fail loud on unknown\nvalues.",
"properties": {
"actionable_for": {
"items": {
"$ref": "#/$defs/ConsumerType"
},
"title": "Actionable For",
"type": "array"
},
"body_markdown": {
"title": "Body Markdown",
"type": "string"
},
"brief_id": {
"title": "Brief Id",
"type": "string"
},
"forbidden_token_check": {
"const": "passed",
"default": "passed",
"title": "Forbidden Token Check",
"type": "string"
},
"headline": {
"title": "Headline",
"type": "string"
},
"interpretation_mode": {
"const": "llm_assisted",
"default": "llm_assisted",
"title": "Interpretation Mode",
"type": "string"
},
"model": {
"title": "Model",
"type": "string"
},
"prompt_hash": {
"title": "Prompt Hash",
"type": "string"
},
"schema_version": {
"const": "1.0.0",
"default": "1.0.0",
"title": "Schema Version",
"type": "string"
},
"severity": {
"enum": [
"high",
"medium",
"low"
],
"title": "Severity",
"type": "string"
},
"signal_id": {
"title": "Signal Id",
"type": "string"
}
},
"required": [
"brief_id",
"signal_id",
"headline",
"body_markdown",
"severity",
"model",
"prompt_hash"
],
"title": "IntelligenceBrief",
"type": "object"
}
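
A consumer without a JSON Schema library can still reproduce the closed-enum and required-field parts of this contract by hand. The sketch below is a standard-library approximation, not the project's `ConsumerEnumValidator`; `check_brief` is a hypothetical helper, the sample brief is made-up placeholder data, and the `const` fields and `additionalProperties` rule are deliberately not checked.

```python
CONSUMER_TYPES = frozenset(
    {
        "macro_research_agent",
        "geopolitical_research_agent",
        "crypto_research_agent",
        "financial_news_desk",
        "regulatory_news_desk",
        "dashboard",
    }
)
REQUIRED_FIELDS = (
    "brief_id",
    "signal_id",
    "headline",
    "body_markdown",
    "severity",
    "model",
    "prompt_hash",
)
SEVERITIES = ("high", "medium", "low")


def check_brief(brief: dict) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in brief]
    if "severity" in brief and brief["severity"] not in SEVERITIES:
        problems.append(f"unknown severity: {brief['severity']}")
    # actionable_for is optional in the schema, so a missing key is not an error.
    for consumer in brief.get("actionable_for", []):
        if consumer not in CONSUMER_TYPES:
            problems.append(f"unknown consumer: {consumer}")
    return problems


# Placeholder brief; "sports_desk" is intentionally outside the ConsumerType enum.
brief = {
    "brief_id": "b-1",
    "signal_id": "s-1",
    "headline": "Example headline",
    "body_markdown": "Example body.",
    "severity": "high",
    "model": "example-model",
    "prompt_hash": "0" * 64,
    "actionable_for": ["dashboard", "sports_desk"],
}
problems = check_brief(brief)
```

Here the only reported problem is the unregistered consumer. For full conformance, validating against the exported schema file itself is the safer route.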
2 changes: 2 additions & 0 deletions scripts/export_schemas.py
@@ -25,6 +25,7 @@

from pydantic import BaseModel

from augur_format.llm.models import IntelligenceBrief
from augur_signals.models import (
    FeatureVector,
    MarketSignal,
@@ -41,6 +42,7 @@
    (FeatureVector, "1.0.0"),
    (MarketSignal, "1.0.0"),
    (SignalContext, "1.0.0"),
    (IntelligenceBrief, "1.0.0"),
]


68 changes: 68 additions & 0 deletions src/augur_format/augur_format/_config.py
@@ -0,0 +1,68 @@
"""Configuration models for deterministic formatters.

Mirrors config/formatters.toml block-for-block. Loaded at engine
startup via augur_signals._config.load_config; a missing required
value or malformed block fails loudly rather than coercing.
"""

from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class JsonConfig(BaseModel):
    """Canonical JSON formatter parameters."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    float_decimals: int = Field(default=6, ge=0, le=18)
    timestamp_format: Literal["iso_z"] = "iso_z"


class MarkdownConfig(BaseModel):
    """Jinja2 rendering parameters."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    template_dir: str = "src/augur_format/augur_format/deterministic/templates"
    trim_blocks: bool = True
    lstrip_blocks: bool = True


class WebhookConfig(BaseModel):
    """Webhook delivery retry and timeout settings."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    initial_retry_delay_seconds: float = Field(default=1.0, gt=0.0)
    max_retry_delay_seconds: float = Field(default=60.0, gt=0.0)
    max_retries: int = Field(default=5, gt=0)
    delivery_timeout_seconds: float = Field(default=10.0, gt=0.0)


class WebSocketConfig(BaseModel):
    """WebSocket transport bind, heartbeat, and per-connection buffer."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    bind: str = "0.0.0.0"  # noqa: S104 — documented default bind for the WS server
    port: int = Field(default=8765, gt=0, le=65_535)
    heartbeat_interval_seconds: int = Field(default=30, gt=0)
    heartbeat_timeout_seconds: int = Field(default=90, gt=0)
    per_connection_buffer: int = Field(default=64, gt=0)


class FormatterConfig(BaseModel):
    """Top-level formatter configuration loaded from config/formatters.toml."""

    model_config = ConfigDict(frozen=True, extra="forbid", populate_by_name=True)

    # Field aliased so the TOML block is [json] per the documented
    # schema, while the Python attribute is ``canonical_json`` to avoid
    # shadowing BaseModel.json.
    canonical_json: JsonConfig = Field(default_factory=JsonConfig, alias="json")
    markdown: MarkdownConfig = Field(default_factory=MarkdownConfig)
    webhook: WebhookConfig = Field(default_factory=WebhookConfig)
    websocket: WebSocketConfig = Field(default_factory=WebSocketConfig)
117 changes: 117 additions & 0 deletions src/augur_format/augur_format/deterministic/json_feed.py
@@ -0,0 +1,117 @@
"""Canonical JSON formatter for SignalContext.

Serializes a SignalContext with stable key ordering, float rounding,
and ISO-8601 UTC timestamps with a ``Z`` suffix. The determinism
contract: same SignalContext in, byte-identical JSON out across any
number of invocations. Consumers can hash the bytes and rely on
stable equality.
"""

from __future__ import annotations

import json
from collections.abc import Mapping
from datetime import datetime
from typing import Any

from augur_signals.models import SignalContext

CANONICAL_KEY_ORDER: tuple[str, ...] = (
    "signal",
    "market_question",
    "resolution_criteria",
    "resolution_source",
    "closes_at",
    "related_markets",
    "investigation_prompts",
    "interpretation_mode",
    "schema_version",
)

SIGNAL_KEY_ORDER: tuple[str, ...] = (
    "signal_id",
    "market_id",
    "platform",
    "signal_type",
    "magnitude",
    "direction",
    "confidence",
    "fdr_adjusted",
    "detected_at",
    "window_seconds",
    "liquidity_tier",
    "manipulation_flags",
    "related_market_ids",
    "raw_features",
    "schema_version",
)

RELATED_KEY_ORDER: tuple[str, ...] = (
    "market_id",
    "question",
    "current_price",
    "delta_24h",
    "volume_24h",
    "relationship_type",
    "relationship_strength",
)


def to_canonical_json(context: SignalContext, *, float_decimals: int = 6) -> bytes:
    """Return the canonical JSON bytes for *context*.

    Args:
        context: The SignalContext to serialize.
        float_decimals: Decimal places each float field is rounded to
            before serialization. Must be applied consistently across
            producers and consumers so equality comparison survives
            the round-trip.

    Returns:
        UTF-8 encoded JSON bytes with no whitespace between separators
        and stable key ordering.
    """
    dumped = context.model_dump(mode="json")
    payload: dict[str, Any] = _ordered_dict(dumped, CANONICAL_KEY_ORDER, float_decimals)
    payload["signal"] = _ordered_dict(dumped["signal"], SIGNAL_KEY_ORDER, float_decimals)
    payload["related_markets"] = [
        _ordered_dict(rm, RELATED_KEY_ORDER, float_decimals)
        for rm in dumped.get("related_markets", [])
    ]
    return json.dumps(
        payload,
        default=_json_default,
        ensure_ascii=False,
        separators=(",", ":"),
        sort_keys=False,
    ).encode("utf-8")


def _ordered_dict(
    source: Mapping[str, Any],
    key_order: tuple[str, ...],
    float_decimals: int,
) -> dict[str, Any]:
    return {key: _round_floats(source[key], float_decimals) for key in key_order if key in source}


def _round_floats(value: Any, float_decimals: int) -> Any:
    if isinstance(value, float):
        return round(value, float_decimals)
    if isinstance(value, list):
        return [_round_floats(v, float_decimals) for v in value]
    if isinstance(value, dict):
        # Sort nested dict keys so producers with variable insertion
        # order (e.g. raw_features populated conditionally by dedup
        # and cluster-merge paths) still emit byte-identical JSON for
        # the same logical payload.
        return {k: _round_floats(value[k], float_decimals) for k in sorted(value)}
    return value


def _json_default(obj: Any) -> Any:
    if isinstance(obj, datetime):
        iso = obj.isoformat()
        # datetime.isoformat() renders a UTC offset as "+00:00"; canonicalize to "Z".
        return iso.replace("+00:00", "Z")
    raise TypeError(f"cannot serialize {type(obj).__name__}")
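
The determinism contract can be checked without the Augur models by applying the same three steps (fixed top-level key order, sorted nested keys, float rounding) to plain dicts. This is a reduced sketch: `canonical_bytes` and the three-key `KEY_ORDER` are stand-ins invented for illustration, and the sample records are made-up data, not real signals.

```python
import hashlib
import json

# Trimmed stand-in for SIGNAL_KEY_ORDER; the real tuple has more entries.
KEY_ORDER = ("signal_id", "magnitude", "raw_features")


def canonical_bytes(record: dict, decimals: int = 6) -> bytes:
    """Serialize record with fixed key order, sorted nested keys, rounded floats."""

    def normalize(value):
        if isinstance(value, float):
            return round(value, decimals)
        if isinstance(value, list):
            return [normalize(v) for v in value]
        if isinstance(value, dict):
            return {k: normalize(value[k]) for k in sorted(value)}
        return value

    ordered = {k: normalize(record[k]) for k in KEY_ORDER if k in record}
    return json.dumps(ordered, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


# Same logical payload, different insertion order at both nesting levels.
a = {"signal_id": "s-1", "magnitude": 0.12345678, "raw_features": {"z": 1.0, "a": 2.5}}
b = {"raw_features": {"a": 2.5, "z": 1.0}, "magnitude": 0.12345678, "signal_id": "s-1"}

digest_a = hashlib.sha256(canonical_bytes(a)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(b)).hexdigest()
```

Both records serialize to the same bytes, so their SHA-256 digests match. That is the property a consumer relies on when it hashes the feed for deduplication or change detection.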
55 changes: 55 additions & 0 deletions src/augur_format/augur_format/deterministic/markdown.py
@@ -0,0 +1,55 @@
"""Jinja2 Markdown renderer.

Templates live alongside this module at ``templates/``; one per
signal type plus a shared ``_base.md.j2``. The renderer is
deterministic given identical inputs and template files. The
templates are committed, so any rendering drift surfaces as a test
failure rather than silent variation.
"""

from __future__ import annotations

from pathlib import Path

from jinja2 import Environment, FileSystemLoader, select_autoescape

from augur_signals.models import SignalContext

_DEFAULT_TEMPLATE_DIR = Path(__file__).resolve().parent / "templates"


class MarkdownFormatter:
    """Render a SignalContext as Markdown via Jinja2."""

    def __init__(self, template_dir: Path | None = None) -> None:
        directory = template_dir or _DEFAULT_TEMPLATE_DIR
        self._env = Environment(
            loader=FileSystemLoader(str(directory)),
            autoescape=select_autoescape(["html"]),
            trim_blocks=True,
            lstrip_blocks=True,
            keep_trailing_newline=True,
        )

    def format(self, context: SignalContext, severity: str) -> str:
        """Render the per-signal-type template for *context*.

        Raises jinja2.TemplateNotFound if the signal_type does not
        have a dedicated template; a dedicated template exists for
        every value in SignalType by construction, so missing
        templates indicate a contract drift between enum and templates.
        """
        template_name = f"{context.signal.signal_type.value}.md.j2"
        template = self._env.get_template(template_name)
        return template.render(
            signal=context.signal,
            market_question=context.market_question,
            resolution_criteria=context.resolution_criteria,
            resolution_source=context.resolution_source,
            closes_at=context.closes_at,
            related_markets=context.related_markets,
            investigation_prompts=context.investigation_prompts,
            interpretation_mode=context.interpretation_mode.value,
            schema_version=context.schema_version,
            severity=severity,
        )