Merged
115 changes: 109 additions & 6 deletions README.md
@@ -4,6 +4,8 @@ Structured market anomaly detection for prediction markets. Augur observes Polym

Augur is not a forecaster, an arbitrage engine, or a news writer. It is a deterministic structured-signal pipeline. See `docs/foundations/overview.md` for the full product framing and `docs/foundations/non-goals.md` for what Augur explicitly does not do.

Current version: **0.1.0**. Scaffolding for phases 1-5 has landed; the runnable surfaces are the test suite, the labeling CLI, and the distributed-runtime smoke stack. See `docs/operations/manual-testing.md` for the end-to-end guide.

## Documentation

Authoritative documentation lives in `docs/`:
@@ -12,6 +14,7 @@ Authoritative documentation lives in `docs/`:
- `docs/contracts/` — schemas, versioning policy, consumer registry
- `docs/methodology/` — calibration, manipulation taxonomy, labeling protocol
- `docs/architecture/` — system design, polling spec, deduplication and storms, storage and scaling
- `docs/operations/` — distributed runbook, manual testing guide
- `docs/examples/` — worked positive and negative signal paths
- `docs/strategy/` — risk register, defensibility thesis

@@ -21,6 +24,7 @@ Start with `docs/README.md` for the documentation index.

- Python 3.12 or newer
- [uv](https://docs.astral.sh/uv/) 0.6 or newer for dependency management
- Optional: Docker + Docker Compose for the phase-5 smoke stack

## Local Development

@@ -34,6 +38,67 @@ uv run pytest # run the test suite with coverage

All three workspace packages — `augur-signals`, `augur-labels`, `augur-format` — are installed in editable mode by `uv sync`. Configuration lives under `config/`; data and label artifacts live under `data/` and `labels/`. Exported JSON schemas are committed to `schemas/` and kept in sync by `scripts/export_schemas.py`.

## Optional Dependency Groups

Each workspace package exposes extras for opt-in integrations. Install only what a deployment needs:

```bash
# LLM secondary formatter (phase 4)
uv sync --extra llm-local # augur-format[llm-local] — Ollama client
uv sync --extra llm-cloud # augur-format[llm-cloud] — Anthropic SDK

# Distributed runtime (phase 5)
uv sync --extra bus-nats # NATS JetStream adapter
uv sync --extra bus-redis # Redis Streams adapter
uv sync --extra storage-timescale # TimescaleDB via psycopg
uv sync --extra observability # Prometheus + OpenTelemetry
uv sync --extra distributed # all of the above
```

The dev dependency group in the repo root already pulls in every extra, so CI exercises each adapter against injected fakes.

## Runnable Surfaces

### Labeling CLI (phase 2)

```bash
uv run python scripts/label.py --help
uv run python scripts/label.py candidates
uv run python scripts/label.py decide <candidate-id>
```

### Worker entrypoints (phase 5)

```bash
uv run python -m augur_signals.workers # catalog
uv run python -m augur_signals.workers.poller --help # per-kind entrypoints
```

The `workers` package exposes bootstrap helpers (`augur_signals.workers.bootstrap`) that every `__main__` module uses for config loading, observability activation, and bus connection. Per-kind transform wiring for the feature, detector, manipulation, calibration, dedup, context_format, and llm workers is not in place yet and requires a follow-up commit; see §3 of `docs/operations/manual-testing.md`.

### Migration scripts (phase 5)

```bash
uv run python scripts/migrate_to_timescale.py backfill --from labels/snapshots_archive
uv run python scripts/migrate_to_timescale.py verify --start 2026-01-01 --end 2026-04-01 --duckdb data/augur.duckdb
uv run python scripts/dual_write_sidecar.py --lag-alert-seconds 10
```

### Smoke stack (phase 5)

```bash
docker compose -f ops/docker/compose.yaml up -d # NATS + Redis + TimescaleDB + Prometheus
export AUGUR_CONFIG_DIR="$(pwd)/ops/docker/config"   # quoted so paths containing spaces survive
export AUGUR_TIMESCALE_URL=postgresql://augur:augur@localhost:5432/augur
```

### Container build

```bash
docker build -f ops/docker/Dockerfile -t augur:dev .
kubectl apply -k ops/deploy/ --dry-run=client -o yaml
```

## Quality Gates

The following commands must pass before any commit reaches `main`:
@@ -50,23 +115,61 @@ Coverage thresholds (80 % overall, 90 % new code, 95 % critical paths) follow `~

## Repository Layout

```text
augur/
├── pyproject.toml                    # uv workspace root (v0.1.0)
├── uv.lock
├── config/                           # TOML configuration
│   ├── bus.toml                      # phase 5 — message bus backend
│   ├── storage.toml                  # phase 5 — DuckDB / TimescaleDB selector
│   ├── observability.toml            # phase 5 — Prometheus + OTel exporters
│   ├── llm.toml                      # phase 4 — gated LLM formatter
│   └── ...                           # polling, detectors, dedup, formatters, consumers, labeling, markets, forbidden_tokens
├── data/                             # market taxonomy, investigation prompts, calibration state
├── labels/                           # newsworthy-event labels (Parquet)
├── schemas/                          # exported JSON schemas per Pydantic model
├── scripts/
│   ├── backtest.py                   # stub
│   ├── calibrate.py                  # stub
│   ├── export_schemas.py
│   ├── label.py                      # labeling CLI wrapper
│   ├── lint_detector_now.py
│   ├── migrate_to_timescale.py       # phase 5 backfill + verify
│   └── dual_write_sidecar.py         # phase 5 tee replay
├── src/
│   ├── augur_signals/                # signal extraction core (no LLM imports — CI enforced)
│   │   └── augur_signals/
│   │       ├── bus/                  # EventBus protocol + NATS + Redis + distributed lock
│   │       ├── workers/              # harness, singleton runner, bootstrap, subject helpers
│   │       ├── storage/              # DuckDB + TimescaleDB adapters
│   │       └── ...                   # ingestion, features, detectors, manipulation, calibration, dedup, context
│   ├── augur_labels/                 # labeling pipeline (phase 2)
│   └── augur_format/                 # deterministic and gated-LLM formatters (phases 3 + 4)
├── tests/
├── ops/
│   ├── docker/                       # multi-stage Dockerfile + local compose smoke stack
│   │   ├── Dockerfile
│   │   ├── compose.yaml
│   │   ├── prometheus.yml
│   │   ├── otel-collector.yaml
│   │   └── config/                   # smoke-specific bus/storage/observability TOMLs
│   └── deploy/                       # Kubernetes manifests (Deployments, StatefulSets, HPA, Services)
└── .docs/                            # phase specs and development plan
```

## Phase Status

| Phase | Scope | State |
| --- | --- | --- |
| 0 | Project workspace, CI scaffolding | Merged |
| 1 | Signal extraction core, detectors, calibration, dedup, context | Merged |
| 2 | Labeling pipeline + annotator CLI | Merged |
| 3 | Deterministic formatters (JSON, Markdown, WebSocket, Webhook) | Merged |
| 4 | Gated LLM secondary formatter | Merged |
| 5 | Distributed runtime scaffolding (bus, TimescaleDB, workers, ops) | Merged |

`CHANGELOG.md` records per-phase operational handoff notes. The v0.1.0 release notes will aggregate these notes when the release is tagged.

## License

See `LICENSE`.
23 changes: 13 additions & 10 deletions docs/README.md
@@ -32,22 +32,25 @@ Read these before writing or modifying any code:
3. `methodology/calibration-methodology.md` — confidence pipeline
4. `methodology/labeling-protocol.md` — ground-truth definition
5. `methodology/manipulation-taxonomy.md` — manipulation signatures
6. `architecture/system-design.md` — layer-by-layer architecture (includes Deployment Modes)
7. `architecture/adaptive-polling-spec.md` — polling state machine
8. `architecture/deduplication-and-storms.md` — signal merge algorithm
9. `architecture/storage-and-scaling.md` — storage architecture and migration triggers
10. `operations/distributed-runbook.md` — cutover, rollback, failover procedures
11. `operations/manual-testing.md` — runnable surfaces and local smoke stack

## Group Index

| Group               | Purpose                                               |
| ------------------- | ----------------------------------------------------- |
| `foundations/`      | Project framing, scope, vocabulary, outward case      |
| `contracts/`        | Data schemas and registries that bind layers together |
| `methodology/`      | Statistical, algorithmic, and process methodology     |
| `architecture/`     | System architecture, storage, polling, signal merging |
| `operations/`       | Distributed-runtime runbook and manual-testing guide  |
| `strategy/`         | Risk register and defensibility analysis              |
| `examples/`         | Worked positive-path and negative-path examples       |
| `open-questions.md` | Unresolved decisions with current best answers        |

## Conventions

63 changes: 42 additions & 21 deletions docs/architecture/system-design.md
@@ -75,12 +75,20 @@ The diagram reflects the deterministic-context-primary architecture. The LLM for

```text
augur/
├── pyproject.toml                    # uv workspace root (v0.1.0)
├── README.md
├── config/
│   ├── bus.toml                      # phase 5 — message bus backend selector
│   ├── storage.toml                  # phase 5 — DuckDB / TimescaleDB selector
│   ├── observability.toml            # phase 5 — Prometheus + OTel exporters
│   ├── llm.toml                      # phase 4 — gated LLM formatter
│   ├── polling.toml
│   ├── detectors.toml
│   ├── dedup.toml
│   ├── formatters.toml
│   ├── consumers.toml
│   ├── labeling.toml
│   ├── markets.toml
│   └── forbidden_tokens.toml
├── data/
│   ├── markets/
@@ -90,26 +98,39 @@ augur/
│   └── newsworthy_events.parquet
├── src/
│   ├── augur_signals/
│   │   ├── models/                   # MarketSnapshot, FeatureVector, MarketSignal, enums
│   │   ├── ingestion/                # Pollers, normalizer
│   │   ├── features/                 # Rolling-window feature pipeline
│   │   ├── detectors/                # 5 detectors + base protocol
│   │   ├── manipulation/             # Signature catalog + evaluator
│   │   ├── calibration/              # FPR, BH-FDR, reliability curves, drift, FDR controller
│   │   ├── context/                  # Deterministic context assembler
│   │   ├── storage/                  # DuckDB + TimescaleDB adapters (phase 5)
│   │   ├── bus/                      # EventBus protocol + NATS + Redis + distributed lock (phase 5)
│   │   ├── workers/                  # Harness, singleton runner, bootstrap (phase 5)
│   │   ├── dedup/                    # Signal dedup + storm handling
│   │   └── engine.py                 # Monolith orchestrator
│   ├── augur_labels/                 # Labeling pipeline (phase 2)
│   └── augur_format/
│       ├── deterministic/            # JSON, Markdown, webhook, websocket (phase 3)
│       └── llm/                      # Gated LLM formatter (phase 4)
├── tests/
├── scripts/
│   ├── backtest.py                   # stub
│   ├── calibrate.py                  # stub
│   ├── export_schemas.py
│   ├── label.py
│   ├── lint_detector_now.py
│   ├── migrate_to_timescale.py       # phase 5 — backfill + verify
│   └── dual_write_sidecar.py         # phase 5 — tee replay
└── ops/
    ├── docker/                       # Dockerfile + local smoke compose stack
    │   ├── Dockerfile
    │   ├── compose.yaml
    │   ├── prometheus.yml
    │   ├── otel-collector.yaml
    │   └── config/                   # smoke-specific bus/storage/observability TOMLs
    └── deploy/                       # Kubernetes manifests (Deployments, StatefulSets, HPA, Services)
```

The `src/augur_signals/` package contains zero LLM imports. CI enforces this via grep. The `src/augur_format/llm/` package is the only location where LLM code lives; it is gated behind `interpretation_mode = LLM_ASSISTED` and is opt-in per consumer.
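The zero-LLM-import rule lends itself to an automated check. The document says CI uses grep; the sketch below swaps in a different technique, parsing each file with Python's `ast` module, which avoids false positives from comments and string literals. The forbidden module names (`anthropic`, `ollama`) are assumptions based on the SDKs named in the README extras, and the function names are hypothetical.

```python
import ast
from pathlib import Path

# Assumed list of LLM client packages; adjust to whatever the project forbids.
FORBIDDEN_ROOTS = frozenset({"anthropic", "ollama"})


def forbidden_imports(source: str, forbidden: frozenset[str] = FORBIDDEN_ROOTS) -> list[str]:
    """Return the absolute imports in source whose top-level package is forbidden."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports point at
            # local modules and are skipped.
            names = [node.module]
        else:
            continue
        found.extend(name for name in names if name.split(".")[0] in forbidden)
    return found


def scan_package(root: Path) -> dict[str, list[str]]:
    """Map each offending file under root to the forbidden imports it contains."""
    violations: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"))
        if hits:
            violations[str(path)] = hits
    return violations


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    problems = scan_package(Path("src/augur_signals"))
    for file_name, modules in problems.items():
        print(f"{file_name}: {', '.join(modules)}")
    raise SystemExit(1 if problems else 0)
```

A non-zero exit code on any violation lets the same script act as a CI gate, mirroring what the grep-based check is described as doing.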