Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -214,3 +214,5 @@ __marimo__/

# Internal working docs
.docs/
# CocoIndex Code (ccc)
/.cocoindex_code/
34 changes: 22 additions & 12 deletions README.md
@@ -1,10 +1,10 @@
# Augur

Structured market anomaly detection for prediction markets. Augur observes Polymarket and Kalshi with adaptive polling, extracts typed signals with calibrated confidence, and attaches investigation prompts drawn from a frozen library. The canonical consumer interface is a JSON schema; deterministic Markdown and a gated, opt-in LLM formatter are built on top of it.
Augur extracts structured intelligence signals from prediction markets. It observes Polymarket and Kalshi markets, measures consensus velocity, volume, liquidity, order-book pressure, and cross-market divergence, then emits typed events for downstream agents and analysts to investigate. The canonical consumer interface is a JSON schema; deterministic Markdown and a gated, opt-in LLM formatter are secondary renderings built on top of it.

Augur is not a forecaster, an arbitrage engine, or a news writer. It is a deterministic structured-signal pipeline. See `docs/foundations/overview.md` for the full product framing and `docs/foundations/non-goals.md` for what Augur explicitly does not do.

Current version: **0.1.0**. Phase 1-5 scaffolding landed; runnable surfaces are the test suite, the labeling CLI, and the distributed-runtime smoke stack. See `docs/operations/manual-testing.md` for the end-to-end guide.
Current version: **0.1.0**. The component implementation is substantial, but the live proof loop is not complete. Runnable surfaces are the test suite, the labeling CLI, the single-process engine runner, and the distributed-runtime smoke stack. An active watchlist, backtest runner, calibration runner, and real consumer feed remain follow-up work. See `docs/operations/manual-testing.md` for the current manual testing guide.

## Documentation

@@ -43,11 +43,11 @@ All three workspace packages — `augur-signals`, `augur-labels`, `augur-format`
Each workspace package exposes extras for opt-in integrations. Install only what a deployment needs:

```bash
# LLM secondary formatter (phase 4)
# LLM secondary formatter
uv sync --extra llm-local # augur-format[llm-local] — Ollama client
uv sync --extra llm-cloud # augur-format[llm-cloud] — Anthropic SDK

# Distributed runtime (phase 5)
# Distributed runtime
uv sync --extra bus-nats # NATS JetStream adapter
uv sync --extra bus-redis # Redis Streams adapter
uv sync --extra storage-timescale # TimescaleDB via psycopg
@@ -59,24 +59,33 @@ The dev dependency group in the repo root already pulls every extra so CI exerci

## Runnable Surfaces

### Labeling CLI (phase 2)
### Labeling CLI

```bash
uv run python scripts/label.py --help
uv run python scripts/label.py candidates
uv run python scripts/label.py decide <candidate-id>
```

### Worker entrypoints (phase 5)
### Single-process engine runner

```bash
uv run python scripts/run_engine.py --help
uv run python scripts/run_engine.py --once
```

`scripts/run_engine.py` loads `AUGUR_CONFIG_DIR` or `config/`, opens the DuckDB store, runs the existing in-process extraction engine, and writes canonical `SignalContext` JSON to stdout. It fails fast when `config/markets.toml` has no active markets. Active Kalshi markets require `KALSHI_API_KEY`; Polymarket-only watchlists do not.
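
A downstream consumer of this output can be sketched as below. This is a minimal illustration that assumes the runner emits one JSON object per line (the manual testing guide describes the output as JSON lines); `read_signal_contexts` and `placeholder_field` are hypothetical names, not part of Augur, and no `SignalContext` field names are assumed.

```python
import io
import json
from typing import Iterator, TextIO


def read_signal_contexts(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Stand-in for the runner's stdout; the field name is a placeholder only.
sample = io.StringIO('{"placeholder_field": 1}\n\n{"placeholder_field": 2}\n')
records = list(read_signal_contexts(sample))
print(len(records))
```

In a real pipeline the same function could be pointed at `sys.stdin`, with the runner's stdout piped into the consumer process.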

### Worker entrypoints

```bash
uv run python -m augur_signals.workers # catalog
uv run python -m augur_signals.workers.poller --help # per-kind entrypoints
```

The `workers` package exposes bootstrap helpers (`augur_signals.workers.bootstrap`) that every `__main__` module uses for config loading, observability activation, and bus connection. Per-kind transform wiring for feature / detector / manipulation / calibration / dedup / context_format / llm requires a follow-up commit — see `docs/operations/manual-testing.md §3`.
The `workers` package exposes bootstrap helpers (`augur_signals.workers.bootstrap`) that every `__main__` module uses for config loading, observability activation, and bus connection. Per-kind transform wiring for feature / detector / manipulation / calibration / dedup / context_format / llm requires a follow-up commit. See `docs/operations/manual-testing.md §4`.

### Migration scripts (phase 5)
### Migration scripts

```bash
uv run python scripts/migrate_to_timescale.py backfill --from labels/snapshots_archive
@@ -120,10 +129,10 @@ augur/
├── pyproject.toml # uv workspace root (v0.1.0)
├── uv.lock
├── config/ # TOML configuration
│ ├── bus.toml # phase 5 — message bus backend
│ ├── storage.toml # phase 5 — DuckDB / TimescaleDB selector
│ ├── observability.toml # phase 5 — Prometheus + OTel exporters
│ ├── llm.toml # phase 4 — gated LLM formatter
│ ├── bus.toml # message bus backend
│ ├── storage.toml # DuckDB / TimescaleDB selector
│ ├── observability.toml # Prometheus + OTel exporters
│ ├── llm.toml # gated LLM formatter
│ └── ... # polling, detectors, dedup, formatters, consumers, labeling, markets, forbidden_tokens
├── data/ # market taxonomy, investigation prompts, calibration state
├── labels/ # newsworthy-event labels (Parquet)
@@ -134,6 +143,7 @@ augur/
│ ├── export_schemas.py
│ ├── label.py # labeling CLI wrapper
│ ├── lint_detector_now.py
│ ├── run_engine.py # single-process live runner
│ ├── migrate_to_timescale.py # phase 5 backfill + verify
│ └── dual_write_sidecar.py # phase 5 tee replay
├── src/
24 changes: 12 additions & 12 deletions config/detectors.toml
@@ -8,28 +8,28 @@
# tune against the historical corpus before live deployment.

[price_velocity]
hazard = 0.004 # 1/250
fire_probability_threshold = 0.7
resolution_exclusion_hours = 6
hazard_rate = 0.004 # 1/250
fire_threshold = 0.7
resolution_exclusion_seconds = 21600

[volume_spike]
ewma_alpha = 0.05
target_fdr_q = 0.05

[book_imbalance]
top_levels = 5
depth_levels = 5
bullish_threshold = 0.72
bearish_threshold = 0.28
persist_snapshots = 3
min_total_depth_usd = 5000.0
persistence_snapshots = 3
minimum_total_depth = 5000.0

[cross_market_divergence]
window_hours = 4
[cross_market]
window_seconds = 14400
min_historical_correlation = 0.6
activity_floor_volume_ratio = 1.0
activity_floor = 1.0

[regime_shift]
target_alpha = 0.02
k_sigma = 0.5
h_sigma = 4.0
min_dormancy_hours = 6
k_multiplier = 0.5
h_multiplier = 4.0
dormancy_minimum_seconds = 21600
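
Besides renaming keys, this hunk switches the duration settings from hours to seconds (6 h becomes 21600 s, 4 h becomes 14400 s) and renames the `[cross_market_divergence]` section to `[cross_market]`. The sketch below shows one way an existing key set could be translated. It assumes only the old and new key names visible in this diff; the `RENAMES` table and `migrate_section` helper are hypothetical and not part of Augur.

```python
SECONDS_PER_HOUR = 3600

# old key -> (new key, factor applied to the value); names taken from the diff
RENAMES = {
    "hazard": ("hazard_rate", 1),
    "fire_probability_threshold": ("fire_threshold", 1),
    "resolution_exclusion_hours": ("resolution_exclusion_seconds", SECONDS_PER_HOUR),
    "top_levels": ("depth_levels", 1),
    "persist_snapshots": ("persistence_snapshots", 1),
    "min_total_depth_usd": ("minimum_total_depth", 1),
    "window_hours": ("window_seconds", SECONDS_PER_HOUR),
    "activity_floor_volume_ratio": ("activity_floor", 1),
    "k_sigma": ("k_multiplier", 1),
    "h_sigma": ("h_multiplier", 1),
    "min_dormancy_hours": ("dormancy_minimum_seconds", SECONDS_PER_HOUR),
}


def migrate_section(old: dict) -> dict:
    """Rename keys and convert hour-based durations to seconds."""
    new = {}
    for key, value in old.items():
        new_key, factor = RENAMES.get(key, (key, 1))  # unknown keys pass through
        new[new_key] = value * factor if factor != 1 else value
    return new


old_price_velocity = {
    "hazard": 0.004,
    "fire_probability_threshold": 0.7,
    "resolution_exclusion_hours": 6,
}
print(migrate_section(old_price_velocity))
# {'hazard_rate': 0.004, 'fire_threshold': 0.7, 'resolution_exclusion_seconds': 21600}
```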
44 changes: 35 additions & 9 deletions docs/operations/manual-testing.md
@@ -1,6 +1,6 @@
# Manual Testing Guide

Augur has three runnable surfaces today: the test suite, the labeling CLI, and the distributed-runtime smoke stack. This document enumerates what can be exercised locally and what remains operator-wiring work.
Augur has four runnable surfaces today: the test suite, the labeling CLI, the single-process engine runner, and the distributed-runtime smoke stack. This document enumerates what can be exercised locally and what remains operator-wiring work.

## 1. Quality gates and tests

@@ -40,7 +40,33 @@ uv run python scripts/label.py coverage # per-category coverage

State persists to `labels/queue.json` and promoted rows land as partitioned Parquet under `labels/newsworthy_events/date=YYYY-MM-DD/`.

## 3. Distributed-runtime smoke stack
## 3. Single-process engine runner

The monolith runner drives the existing in-process engine against configured active markets and writes canonical `SignalContext` JSON lines to stdout.

```bash
uv run python scripts/run_engine.py --help
uv run python scripts/run_engine.py --once
```

Runtime contract:

- `AUGUR_CONFIG_DIR` overrides the default `config/` directory.
- `config/markets.toml` must contain at least one active market.
- Polymarket-only watchlists run without platform credentials.
- Active Kalshi markets require `KALSHI_API_KEY`.
- DuckDB storage is opened from `config/storage.toml`.
- Output is deterministic canonical JSON from `augur_format.deterministic.json_feed`.

The repository currently ships only an inactive placeholder watchlist, so `uv run python scripts/run_engine.py --once` fails fast with:

```text
run_engine failed: config/markets.toml has no active markets
```

Populate `config/markets.toml` before using the runner for live capture.

## 4. Distributed-runtime smoke stack

The phase 5 compose stack brings up every external dependency the workers need: NATS JetStream, Redis, TimescaleDB, Prometheus, and (optionally) an OTel collector. Workers run as separate host processes so each one is inspectable.

@@ -128,7 +154,7 @@ bus = build_event_bus(cfg.bus) # nats or redis
await bus.connect()
```

## 4. Migration scripts
## 5. Migration scripts

Both scripts are fully runnable against the smoke stack once TimescaleDB is initialized.

@@ -162,7 +188,7 @@ uv run python scripts/dual_write_sidecar.py \

The sidecar requires the engine to publish to `augur.writes`. This path is not wired in the monolith yet, so for now the sidecar can only be smoke-tested against handcrafted fixtures.

## 5. Container build and Kubernetes
## 6. Container build and Kubernetes

### Build the image

@@ -191,24 +217,24 @@ kubectl -n augur create secret generic augur-secrets \
--dry-run=client -o yaml | kubectl apply -f -
```

## 6. Observability
## 7. Observability

- Prometheus: `http://localhost:9090` after compose is up. Scrapes `host.docker.internal:9091..9097`.
- NATS admin: `http://localhost:8222/varz`.
- Redis CLI: `redis-cli -h localhost ping`.
- TimescaleDB: `psql $AUGUR_TIMESCALE_URL -c 'select * from timescaledb_information.hypertables'`.
- OTel collector: spans print to the container stdout (`docker compose logs otel-collector`).

## 7. Tear down
## 8. Tear down

```bash
docker compose -f ops/docker/compose.yaml down -v
unset AUGUR_CONFIG_DIR AUGUR_TIMESCALE_URL AUGUR_REPLICA_ID
```

## 8. Known gaps
## 9. Known gaps

- No end-to-end monolith launcher (`python -m augur_signals.engine` has no `__main__`). The engine is driveable from Python and from `tests/signals/test_engine_integration.py`, but there is no production script.
- The monolith runner exists as `scripts/run_engine.py`, but the checked-in watchlist still has no active markets.
- `scripts/backtest.py` and `scripts/calibrate.py` are stubs that raise `NotImplementedError`.
- Worker entrypoints for feature / detector / manipulation / calibration / dedup / context_format / llm require the bus message-schema work described in §3 above.
- Worker entrypoints for feature / detector / manipulation / calibration / dedup / context_format / llm require the bus message-schema work described in §4 above.
- Live failover tests against a real NATS or Redis cluster are operator-owned; CI uses dependency-injected fakes.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[project]
name = "augur"
version = "0.1.0"
description = "Structured market anomaly detection for prediction markets"
description = "Structured intelligence signals from prediction markets"
readme = "README.md"
requires-python = ">=3.12"
license = { file = "LICENSE" }