ksk5429/quant

K-Fish

Swarm Intelligence Prediction Engine for Prediction Markets

Named after the schooling behavior of fish — individually simple, collectively intelligent.


K-Fish deploys 9 LLM agents ("Fish") that each use a structurally different reasoning framework to analyze prediction markets. Their independent probability estimates are fused through a multi-round Delphi protocol, calibrated with netcal (Beta, Histogram, or Isotonic, selected automatically), and converted into risk-controlled positions using the quarter-Kelly criterion. The entire system runs through the Claude Code CLI at no additional API cost.


How It Works

flowchart TD
    A["`**MARKET QUESTION**
    price withheld from all Fish agents`"]

    B["`**SWARM ROUTER**
    classifies category
    selects personas, rounds, extremization`"]

    NEWS["`**NEWS RETRIEVAL** ★ new
    trafilatura scrapes top articles
    sentence-transformers ranks by relevance
    top 3 injected into Fish prompts`"]

    C["`**RESEARCHER FISH**
    base rates · key facts
    timing · contrarian case · news context`"]

    subgraph DELPHI [" MULTI-ROUND DELPHI PROTOCOL "]
        direction TB
        R1["`**Round 1** — Independent`"]
        R2["`**Round 2** — Peer Context`"]
        RN["`**Round N** — Converge`"]
        R1 -- "anonymized estimates" --> R2 -- "update or hold" --> RN
    end

    D["`**AGGREGATION**
    trimmed mean · confidence-weighted
    asymmetric extremization`"]

    E1["`**CALIBRATE**
    netcal auto-select
    Beta · Histogram · Isotonic`"]

    BIAS["`**AI BIAS DETECTION** ★ new
    5-layer RLHF decompressor
    compression → decompress
    knowledge gap → follow crowd`"]

    E2["`**VOLATILITY**
    GARCH regime detection
    Kelly adjustment factor`"]

    F["`**EDGE DETECTION**
    empirically optimized threshold
    confidence > 40% · spread < 35%`"]

    G["`**KELLY SIZING**
    quarter-Kelly · 5% max/position
    30% max exposure · 15% drawdown stop`"]

    H["`**POSITION**
    side YES/NO · size $
    expected value · reasoning chain`"]

    A --> B --> NEWS --> C --> DELPHI --> D
    D --> E1 & BIAS & E2
    E1 & BIAS & E2 --> F
    F --> G --> H
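
The Kelly sizing step in the flowchart above can be illustrated with a minimal sketch. It assumes a binary market in which a share bought at price `c` pays $1 if it wins, so the full-Kelly fraction for the YES side is `(p - c) / (1 - c)`. The function name and parameters are hypothetical and not taken from the K-Fish source; only the quarter-Kelly multiplier and the 5% per-position cap come from the diagram.

```python
def quarter_kelly_fraction(prob_yes: float, price_yes: float,
                           kelly_multiplier: float = 0.25,
                           max_fraction: float = 0.05) -> tuple[str, float]:
    """Return (side, bankroll fraction) for a binary market.

    Hypothetical helper: assumes a share bought at price c pays 1 on a win
    and 0 on a loss, with no fees or slippage.
    """
    if not 0.0 < price_yes < 1.0:
        raise ValueError("price_yes must be strictly between 0 and 1")
    if prob_yes >= price_yes:
        # Buy YES at price c: full Kelly = (p - c) / (1 - c)
        side = "YES"
        full_kelly = (prob_yes - price_yes) / (1.0 - price_yes)
    else:
        # Buy NO at price (1 - c): full Kelly = (c - p) / c
        side = "NO"
        full_kelly = (price_yes - prob_yes) / price_yes
    # Scale down to quarter-Kelly, then apply the per-position cap.
    return side, min(kelly_multiplier * full_kelly, max_fraction)


side, fraction = quarter_kelly_fraction(prob_yes=0.55, price_yes=0.50)
print(side, f"{fraction:.3f}")
```

The cap matters more often than it might seem: with an estimate of 0.70 against a price of 0.50, quarter-Kelly alone would suggest 10% of bankroll, which the 5% limit halves.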

Example Output

Market: Will the Fed increase interest rates by 25+ bps after the March 2026 meeting?

flowchart TD
    Q["`**Market Question**
    Will the Fed increase interest rates
    by 25+ bps after the March 2026 meeting?`"]

    Q --> FISH

    subgraph FISH [" 9 Fish Predict Independently "]
        direction LR
        F1["🎯 Anchor — **0.20**"]
        F2["🔀 Decomp — **0.18**"]
        F3["🔍 Inside — **0.15**"]
        F4["⚡ Contra — **0.28**"]
        F5["⏱️ Tempo — **0.22**"]
        F6["🏛️ Instit — **0.17**"]
        F7["💀 Premrt — **0.25**"]
        F8["📐 Calibr — **0.19**"]
        F9["📊 Bayes — **0.16**"]
    end

    FISH --> AGG["`**Aggregation**
    trimmed mean = 0.200`"]
    AGG --> EXT["`**Extremization**
    0.200 → 0.178`"]
    EXT --> CAL["`**Calibration**
    0.178 → 0.165`"]
    CAL --> EDGE{"`**Edge Check**
    |0.165 − 0.22| = 5.5%
    threshold = 7%`"}
    EDGE -->|"edge too small"| SKIP["`**NO POSITION**
    edge below threshold`"]

    ACTUAL["`**Actual Outcome**
    Fed did not raise rates
    K-Fish correct ✓ Brier = 0.027`"]

    SKIP ~~~ ACTUAL
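
The numbers in the example above can be retraced with a short sketch. The extremization and calibration steps are not reproduced here because their parameters are not given; the calibrated value 0.165 and the market price 0.22 are copied from the diagram. The 10% trim fraction is an assumption: with nine estimates it removes zero values per tail, so the result equals the plain mean, which matches the 0.200 shown.

```python
def trimmed_mean(values: list[float], trim_fraction: float = 0.1) -> float:
    """Mean after dropping floor(trim_fraction * n) values from each tail.

    The 10% default is an assumption, not a documented K-Fish setting.
    """
    ordered = sorted(values)
    cut = int(trim_fraction * len(ordered))
    if 2 * cut >= len(ordered):
        raise ValueError("trim_fraction removes every value")
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Round 1 estimates from the nine Fish in the diagram.
estimates = [0.20, 0.18, 0.15, 0.28, 0.22, 0.17, 0.25, 0.19, 0.16]
pooled = trimmed_mean(estimates)

calibrated = 0.165      # after extremization and calibration (from the diagram)
market_price = 0.22     # implied by the edge check in the diagram
edge_threshold = 0.07   # threshold shown in the diagram

edge = abs(calibrated - market_price)
take_position = edge >= edge_threshold

outcome = 0             # the Fed did not raise rates
brier = (calibrated - outcome) ** 2

print(f"pooled={pooled:.3f} edge={edge:.3f} trade={take_position} brier={brier:.3f}")
```

The edge of 5.5 points falls below the 7-point threshold, so no position is taken even though the forecast turned out to be directionally correct.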

Fish Personas

9 Fish, each with orthogonal reasoning

Each persona encodes a structurally different decomposition strategy to maximize ensemble diversity (Schoenegger et al., Science Advances 2024).

| # | Fish | Reasoning Framework | Function |
|---|------|---------------------|----------|
| 1 | Base Rate Anchor | Reference class frequency | Anchors on historical base rates, adjusts minimally |
| 2 | Decomposer | Sub-probability multiplication | Breaks question into independent conditional sub-questions |
| 3 | Inside View | Domain-specific evidence | Finds the single most informative fact others miss |
| 4 | Contrarian | Consensus stress-testing | Constructs the strongest case for the less popular outcome |
| 5 | Temporal Analyst | Timing and momentum | Deadline analysis, hazard rates, trajectory |
| 6 | Institutional Analyst | Organizational incentives | Status quo bias, decision-maker constraints |
| 7 | Premortem | Failure scenario enumeration | Imagines why the expected outcome failed |
| 8 | Calibrator | Tetlock superforecaster protocol | Base rate → evidence → incremental update → bias check |
| 9 | Bayesian Updater | Explicit prior x likelihood | States prior, identifies evidence, applies Bayes' rule |

Retrodiction Baseline

200 resolved Polymarket markets · 9 Fish · Claude Haiku CLI · $0 cost · 7.7 hours runtime

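| Metric | K-Fish v5 (N=200) | K-Fish v4 (N=30) | Random |
|--------|-------------------|------------------|--------|
| Brier Score | 0.206 | 0.213 | 0.250 |
| Accuracy | 69.0% | 73.3% | 50% |
| ECE | 0.140 | 0.178 | 0.250 |
| BSS vs Random | +17.6% | +14.8% | 0% |
| Cost | $0.00 | $0.00 | |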

Note

BSS (Brier Skill Score) = +17.6% means the K-Fish Brier score (0.206) is 17.6% lower than the random-guessing baseline (0.250, the score of a constant 0.50 forecast). The system does not yet beat the Polymarket crowd aggregate (Brier ~0.084), which incorporates information from thousands of traders, including whales and insiders. The gap is driven primarily by surprise events that occurred after the LLM training data cutoff.
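
As a quick check of the skill scores in the table, the sketch below applies the standard definition BSS = 1 − BS / BS_ref to the reported Brier scores. The function name is illustrative and not taken from the K-Fish code.

```python
def brier_skill_score(brier: float, brier_reference: float) -> float:
    """Relative improvement over a reference forecast (1.0 is perfect, 0.0 is no skill)."""
    return 1.0 - brier / brier_reference


# A constant 0.50 forecast scores (0.5 - y)^2 = 0.25 on every binary outcome.
reference = sum((0.5 - y) ** 2 for y in (0, 1, 1, 0)) / 4

bss_v5 = brier_skill_score(0.206, reference)
bss_v4 = brier_skill_score(0.213, reference)
print(f"v5: {bss_v5:+.1%}  v4: {bss_v4:+.1%}")
```

Applying the same formula against the crowd's Brier of roughly 0.084 gives a negative skill score, which is another way of stating that the system does not yet beat the market.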

Per-Fish Performance Rankings (N=200)
| Rank | Persona | Brier | Assessment |
|------|---------|-------|------------|
| 1 | Contrarian | 0.199 | Best at N=200: consensus stress-testing adds real value |
| 2 | Inside View | 0.206 | Domain expertise remains strong |
| 3 | Premortem | 0.207 | Improved with more data (was worst at N=30) |
| 4 | Calibrator | 0.209 | Tetlock method is consistently reliable |
| 5 | Decomposer | 0.211 | Conditional decomposition adds moderate value |
| 6 | Bayesian | 0.213 | Explicit prior/likelihood reasoning |
| 7 | Temporal | 0.217 | Timing analysis improved with larger sample |
| 8 | Institutional | 0.224 | Status quo analysis less useful than expected |
| 9 | Base Rate | 0.226 | Anchoring too heavily on base rates hurts on novel events |

What Makes K-Fish Different

| Feature | ai-hedge-fund (43K stars) | PolySwarm | K-Fish |
|---------|---------------------------|-----------|--------|
| Target | Equities | Prediction markets | Prediction markets |
| Agents | 18 (investor personas) | 50 (diverse personas) | 9 (orthogonal reasoning) |
| Calibration | None | Confidence-weighted | netcal auto-select (Beta/Histogram/Isotonic) |
| Multi-round | No | No | Yes (Delphi with convergence) |
| Pre-screen | No | No | Yes (3-Fish filter for unknowable markets) |
| Cost | API calls ($) | API calls ($) | $0.00 (CLI mode) |
| Risk mgmt | Position limits | Quarter-Kelly | Quarter-Kelly + GARCH volatility + drawdown circuit breaker |
| Validated | Backtest only | Backtest only | 200-market retrodiction on resolved markets |
| Persistence | None | None | SQLite (survives restart) |
| Paper trading | No | No | Yes (full daemon loop) |
| AI bias exploit | No | No | RLHF hedging decompressor + cross-market arbitrage |

AI Bias Exploitation

Note

Novel contribution: instead of only predicting events, K-Fish detects where AI traders are systematically wrong and exploits the bias.

flowchart TD
    INPUT["`**Fish Predictions**
    9 probabilities + reasoning text`"]

    INPUT --> L1

    subgraph DETECT [" 5-Layer Bias Detection "]
        direction TB
        L1["`**Layer 1 — Reasoning Coherence**
        Does the text contradict the number?
        Directional keywords vs stated probability`"]

        L2["`**Layer 2 — Distribution Shape**
        Is the swarm split (bimodal) or clustered?
        Split = disagreement, not uncertainty`"]

        L3["`**Layer 3 — Knowledge Cutoff**
        Do Fish reference training data limits?
        30%+ cutoff mentions = knowledge gap`"]

        L4["`**Layer 4 — Confidence Paradox**
        High confidence + neutral probability?
        Confident about 0.50 = RLHF artifact`"]

        L5["`**Layer 5 — Self-Calibration**
        Track per-regime Brier scores
        Learn which actions actually work`"]

        L1 --> L2 --> L3 --> L4 --> L5
    end

    L5 --> R{"`**Regime
    Classification**`"}

    R -->|"R-P gap > 0.15
    reasoning contradicts number"| D["`**RLHF Compression**
    Decompress: 0.51 → 0.65
    Trade on decompressed probability`"]

    R -->|"30%+ Fish reference cutoff
    post-training event"| C["`**Knowledge Gap**
    Blend with crowd price
    They have info we lack`"]

    R -->|"Low R-P gap, low confidence
    both AI and crowd near 0.50"| S["`**Genuine Uncertainty**
    Skip — no edge exists
    Save compute for better markets`"]
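
The regime classification at the bottom of the diagram can be sketched as a small decision function. The 0.15 reasoning-probability gap and the 30% cutoff-mention share come from the diagram; everything else is an assumption made for illustration, including the class and function names, the order of the checks, the 0.40 low-confidence bound, and the `no_adjustment` fallback.

```python
from dataclasses import dataclass


@dataclass
class SwarmSignals:
    """Hypothetical summary of the detector layers for one market."""
    rp_gap: float                # |reasoning-implied probability - stated probability|
    cutoff_mention_share: float  # share of Fish that mention their training cutoff
    mean_confidence: float       # average self-reported confidence, 0..1


def classify_regime(signals: SwarmSignals,
                    rp_gap_threshold: float = 0.15,
                    cutoff_share_threshold: float = 0.30,
                    low_confidence: float = 0.40) -> str:
    """Map detector signals to one of the regimes named in the diagram."""
    # Assumed priority: a knowledge gap overrides everything else,
    # because decompressing a stale belief would amplify the error.
    if signals.cutoff_mention_share >= cutoff_share_threshold:
        return "knowledge_gap"        # blend with crowd price
    if signals.rp_gap > rp_gap_threshold:
        return "rlhf_compression"     # trade on decompressed probability
    if signals.mean_confidence < low_confidence:
        return "genuine_uncertainty"  # skip the market
    return "no_adjustment"            # assumed fallback, not in the diagram


print(classify_regime(SwarmSignals(rp_gap=0.33, cutoff_mention_share=0.11,
                                   mean_confidence=0.60)))
```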
How the decompressor works

When RLHF compression is detected (Fish reasoning says "strong evidence for YES" but probability is 0.52), the decompressor estimates the pre-hedging probability:

| Signal | Value |
|--------|-------|
| Fish stated probability | 0.52 |
| Reasoning direction score | +0.87 (strong YES) |
| Implied probability | 0.85 |
| Confidence weight | 0.60 |
| Decompressed | 0.655 |
| Market crowd price | 0.72 |

The decompressed probability (0.655) is much closer to the crowd price (0.72) than the raw output (0.52): the gap shrinks from 0.20 to 0.065. RLHF hedging was hiding roughly 13.5 percentage points of directional signal (0.655 − 0.52).

Cross-market arbitrage

Detects logically inconsistent prices across related markets:

| Type | Example | Detection |
|------|---------|-----------|
| Subset violation | P("GPT-6 released") > P("OpenAI releases model") | Buy NO on subset, YES on superset |
| Complement violation | P(A) + P(not A) != 1.0 | Arbitrage the gap |
| Spread mispricing | Correlated markets with excessive price spread | Hedged pair trade |

Constructs hedged pair positions with full 4-scenario P&L analysis.
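
A subset violation and its 4-scenario P&L can be sketched as follows. This is an illustration under simplifying assumptions, not the project's `arbitrage.py`: the function name is hypothetical, a NO share is assumed to cost one minus the YES price, and fees, spread, and slippage are ignored.

```python
def subset_arbitrage(price_subset_yes: float, price_superset_yes: float):
    """Per-scenario profit for 1 NO share on the subset plus 1 YES share on the superset.

    Returns None when prices are consistent (subset priced at or below superset).
    Keys are (subset_resolves_yes, superset_resolves_yes).
    """
    if price_subset_yes <= price_superset_yes:
        return None
    # Assumed cost model: NO price = 1 - YES price, no fees.
    cost = (1.0 - price_subset_yes) + price_superset_yes
    scenarios = {}
    for subset_yes in (True, False):
        for superset_yes in (True, False):
            payout = (0.0 if subset_yes else 1.0) + (1.0 if superset_yes else 0.0)
            scenarios[(subset_yes, superset_yes)] = payout - cost
    return scenarios


# Subset priced at 0.40, superset at 0.30: the pair costs 0.90.
for scenario, profit in subset_arbitrage(0.40, 0.30).items():
    print(scenario, f"{profit:+.2f}")
```

The only losing scenario is `(True, False)`, where the subset event happens without the superset event. If the subset relation truly holds, that outcome is logically impossible, so the remaining three scenarios all pay at least the price gap.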

v5 Production Features

Important

v5 transforms K-Fish from a research prototype into a production trading system with persistence, execution, monitoring, and safety controls.

graph TD
    subgraph PERSIST ["💾 Persistence (Phase 1)"]
        DB["SQLite database\npredictions · positions · calibration\nresolutions · system state"]
    end

    subgraph RETRO ["📊 Statistical Validity (Phase 2)"]
        R200["200-market retrodiction\nBrier 0.206 · BSS +17.6%\nbootstrap CIs · per-category breakdown"]
    end

    subgraph EXEC ["⚡ Live Execution (Phase 3)"]
        EX["Polymarket CLOB executor\n5 safety checks · paper default\nposition manager · reconciliation"]
    end

    subgraph TEST ["🧪 Test Coverage (Phase 4)"]
        T120["120 tests passing\nHypothesis property-based\nunit + integration"]
    end

    subgraph MON ["📈 Monitoring (Phase 5)"]
        DASH["Track record dashboard\nJSONL alerting\ngraceful degradation"]
    end

    subgraph PAPER ["📋 Paper Trading (Phase 6)"]
        PT["Daemon loop (6h cycles)\ndaily/weekly reports\ngo/no-go checklist"]
    end

    PERSIST --> RETRO --> EXEC --> TEST --> MON --> PAPER
6 Safety Rules (non-negotiable)
| Rule | Enforcement |
|------|-------------|
| Paper trading is default | paper_trading=True in all constructors. --live flag + typed confirmation required. |
| Private keys never in code | Environment variables only. .env is gitignored. |
| Position limits are hard caps | Enforced at executor level: max $50/position, max $300 exposure. |
| Drawdown halt is automatic | Trading stops at -15%. Persisted to DB. Manual --reset-drawdown to resume. |
| Reconciliation runs daily | DB vs on-chain check via --reconcile flag. |
| Gradual escalation | Week 1-2: $25/pos. Week 3-4: $50/pos. Month 2+: evaluate. |
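
The hard caps and the drawdown halt could be enforced with a pre-trade check along these lines. This is a minimal sketch, not the actual executor: `AccountState` and `check_order` are hypothetical names, and only the $50, $300, and -15% limits come from the table above.

```python
from dataclasses import dataclass

MAX_POSITION_USD = 50.0    # per-position hard cap
MAX_EXPOSURE_USD = 300.0   # total open exposure hard cap
DRAWDOWN_HALT = -0.15      # trading stops at -15%


@dataclass
class AccountState:
    """Hypothetical snapshot of the account before an order is sent."""
    open_exposure_usd: float
    drawdown: float        # e.g. -0.10 means 10% below the equity peak
    halted: bool = False   # persisted halt flag; cleared only manually


def check_order(size_usd: float, state: AccountState) -> tuple[bool, str]:
    """Return (allowed, reason). Checks run from most to least severe."""
    if state.halted or state.drawdown <= DRAWDOWN_HALT:
        return False, "drawdown halt active"
    if size_usd > MAX_POSITION_USD:
        return False, "exceeds per-position cap"
    if state.open_exposure_usd + size_usd > MAX_EXPOSURE_USD:
        return False, "exceeds total exposure cap"
    return True, "ok"


print(check_order(40.0, AccountState(open_exposure_usd=270.0, drawdown=-0.05)))
```

Checking the halt first means a tripped circuit breaker rejects every order regardless of size, which matches the rule that only a manual reset resumes trading.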

Zero-Cost Architecture

K-Fish runs entirely on the Claude Code CLI (claude -p), which uses the Max subscription at no additional API cost.

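| Backend | Cost | Speed | Automated | GPU |
|---------|------|-------|-----------|-----|
| CLI (claude -p) | $0.00 | ~15s/Fish | Yes | No |
| Ollama (local) | $0.00 | ~5s/Fish | Yes | Yes |
| Gemini (free tier) | $0.00 | ~3s/Fish | Yes | No |
| File (manual) | $0.00 | Manual | No | No |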
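
For the CLI backend, a single Fish call might look like the sketch below. It assumes the `claude` binary is on `PATH` and uses its non-interactive print mode (`claude -p`). The helper names and the `PROBABILITY: <number>` reply format are hypothetical conventions chosen for this example, not the project's actual prompt contract.

```python
import re
import subprocess
from typing import Optional


def ask_claude(prompt: str, timeout_s: int = 120) -> str:
    """Run `claude -p <prompt>` once and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, timeout=timeout_s, check=True,
    )
    return result.stdout


def parse_probability(reply: str) -> Optional[float]:
    """Extract a probability from a line such as 'PROBABILITY: 0.18'.

    The reply format is an assumption; returns None if nothing valid is found.
    """
    match = re.search(r"PROBABILITY:\s*([01](?:\.\d+)?)", reply, flags=re.IGNORECASE)
    if match is None:
        return None
    value = float(match.group(1))
    return value if 0.0 <= value <= 1.0 else None


# Parsing works offline; ask_claude() needs the CLI installed and logged in.
print(parse_probability("Base rate is low.\nPROBABILITY: 0.18"))
```

Keeping the parser separate from the subprocess call makes it possible to unit-test the output handling without invoking the CLI.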

Quick Start

# Clone and install
git clone https://github.com/ksk5429/quant.git && cd quant
pip install -e ".[dev]"
pip install netcal scoringrules quantstats trafilatura statsforecast mlflow sentence-transformers
# Scan live Polymarket markets
python -m src.markets.scanner --min-volume 100000

# Start paper trading daemon (6-hour cycles, $0 cost)
bash scripts/start_paper_trading.sh

# Daily performance check
bash scripts/daily_report.sh

# Weekly statistical review with go/no-go checklist
bash scripts/weekly_review.sh
Expected scanner output
K-FISH MARKET SCAN — 2026-04-12 22:39
Active markets scanned: 50

Rank Score        Cat  Price       Vol($) Question
   1  3.37    general   48%  10,976,062  Will Jesus Christ return before GTA VI?
   2  3.32   politics   27%  22,267,744  Will Gavin Newsom win the 2028 Democratic...
   3  3.32   politics   42%  11,871,967  Will J.D. Vance win the 2028 Republican...
   4  3.30 geopolitics   38%  13,192,103  Iran x Israel/US conflict ends by April 7?
   5  3.26 geopolitics   30%  14,068,338  Russia x Ukraine ceasefire by end of 2026?
# Run retrodiction (evaluate on 30 resolved markets)
python -m src.prediction.run_retrodiction --n 30 --model haiku --concurrent 3

# Run full live pipeline (scan → analyze → portfolio)
python -m src.mirofish.live_pipeline --top 10 --model haiku

Architecture

Project Structure
src/
├── mirofish/                   # Swarm Engine
│   ├── engine_v4.py           #   Canonical pipeline with DB integration
│   ├── llm_fish.py            #   9 personas, 4 backends, asymmetric extremization
│   ├── researcher.py          #   Context gathering Fish
│   ├── swarm_router.py        #   Category routing + model competition
│   ├── news_context.py        #   ★ Real-time news retrieval + semantic ranking
│   ├── live_pipeline.py       #   Scanner → Engine → Portfolio → Report
│   └── ipc.py                 #   File-based IPC for distributed Fish
├── prediction/                 # Scoring & Calibration
│   ├── calibration.py         #   netcal v2: Beta/Histogram/auto-select + CRPS
│   ├── ai_bias_detector.py    #   ★ 5-layer RLHF hedging detector + decompressor
│   ├── advanced_scoring.py    #   Brier decomposition, bootstrap CI, BSS
│   ├── retrodiction_pipeline.py # ★ Expansion pipeline with parquet output
│   ├── batch_retrodiction.py  #   Batch evaluation with DB persistence
│   ├── volatility.py          #   GARCH regime detection
│   └── run_retrodiction.py    #   CLI-based evaluation runner
├── execution/                  #  v5  Live Trading
│   ├── polymarket_executor.py #   py-clob-client wrapper, 5 safety checks
│   ├── position_manager.py    #   Execute, resolve, reconcile positions
│   ├── live_loop.py           #   Production daemon (6h cycles)
│   └── order_types.py         #   OrderResult, ClosedPosition
├── db/                         #  v5  Persistence
│   ├── schema.sql             #   5 tables: predictions, positions, calibration, etc.
│   └── manager.py             #   DatabaseManager with context manager
├── reporting/                  #  v5  Monitoring
│   ├── dashboard.py           #   Markdown track record generator
│   └── alerts.py              #   JSONL event alerting
├── risk/                       # Position Sizing
│   ├── portfolio.py           #   Edge detection, Kelly, drawdown monitor
│   ├── arbitrage.py           #   ★ Cross-market arbitrage + hedged pair trades
│   ├── threshold_optimizer.py #   ★ Data-driven edge threshold from retrodiction
│   └── analytics.py           #   Sharpe/Sortino, Monte Carlo simulation
├── markets/                    # Market Data
│   ├── polymarket.py          #   Gamma + CLOB API clients
│   ├── scanner.py             #   Live market discovery + ranking
│   ├── history.py             #   Resolved market scraper (2,500 markets)
│   └── dataset.py             #   408K market parquet loader (DuckDB)
├── semantic/                   # NLP
│   └── news_extractor.py      #   trafilatura + sentence-transformers
└── utils/                      # Infrastructure
    ├── cli.py                 #   Claude binary detection
    ├── experiment_tracker.py  #   MLflow tracking
    └── config.py              #   YAML config loader
Module Architecture
block-beta
    columns 1

    block:ENGINE["🔷 ENGINE LAYER"]
        columns 3
        LP["live_pipeline"] E4["engine_v4"] SR["swarm_router"]
    end

    space

    block:SWARM["🐟 SWARM LAYER"]
        columns 3
        LF["llm_fish\n9 personas\n4 backends"] RS["researcher\ncontext gathering"] IPC["ipc\ndistributed Fish"]
    end

    space

    block:PRED["📊 PREDICTION LAYER"]
        columns 4
        CAL["calibration\nnetcal v2"] ADV["advanced_scoring\nBrier · CRPS"] VOL["volatility\nGARCH"] RET["run_retrodiction\nCLI evaluation"]
    end

    space

    block:RISK["🛡️ RISK LAYER"]
        columns 2
        PF["portfolio\nKelly · edge · drawdown"] AN["analytics\nSharpe · Monte Carlo"]
    end

    space

    block:MKT["🌐 MARKET LAYER"]
        columns 4
        SC["scanner\nlive discovery"] PM["polymarket\nGamma + CLOB"] HI["history\n2500 resolved"] DS["dataset\n408K parquet"]
    end

    ENGINE --> SWARM --> PRED --> RISK --> MKT

Key Design Decisions

Important

Every decision is grounded in peer-reviewed evidence or empirical retrodiction results.

| Decision | Why | Evidence |
|----------|-----|----------|
| 9 orthogonal personas | Structural reasoning diversity drives accuracy | Schoenegger et al., Science Advances 2024 |
| Prices withheld from Fish | Prevents anchoring, preserves independence | PolySwarm, arXiv:2604.03888 |
| Asymmetric extremization | Suppress when Fish disagree (high spread) | Retrodiction: 5 worst markets had high spread |
| 3-Fish pre-screen | Skip unknowable markets | LLM cutoff caused Brier 0.95+ on surprises |
| Quarter-Kelly | Full Kelly has ~25% drawdowns | Kelly 1956 |
| Auto-seeded calibrator | No uncalibrated cold start | Code review: calibration was always a no-op |
| CLI over API | $0 vs $3-15/M tokens | Maximize predictions per dollar |
| Real-time news injection | Fish reason about post-cutoff events | Retrodiction: 5 worst misses were all post-cutoff surprises |
| Data-driven edge threshold | Empirically optimal from 230+ retrodictions | Threshold optimizer sweeps 0-30%, finds where Kelly returns turn positive |
| RLHF decompression | Extract true signal hidden by hedging bias | 5-layer detector: reasoning-probability gap identifies compression |

Libraries

| Library | Purpose | Why This One |
|---------|---------|--------------|
| netcal | Probability calibration | 10+ methods, auto-select by sample size |
| scoringrules | CRPS, Brier, log score | JAX/Numba backends |
| quantstats | Portfolio analytics | Sharpe, Sortino, Calmar, Monte Carlo |
| trafilatura | News extraction | 0.958 F1, used by HuggingFace/IBM |
| sentence-transformers | Semantic embeddings | Market similarity, news matching |
| statsforecast | GARCH volatility | 20x faster than pmdarima |
| MLflow | Experiment tracking | Model registry for calibrators |
| mapie | Conformal prediction | Coverage-guaranteed intervals |

Project Stats

| Metric | Value |
|--------|-------|
| Python source lines | ~18,000 |
| Source modules | 40 |
| Unit tests passing | 120 |
| Code reviews completed | 5 |
| Bugs found and fixed | 42 |
| Retrodiction markets | 230+ (expanding) |
| Resolved market corpus | 5,000 |
| External dataset | 408,863 markets |
| Libraries integrated | 11 |

Research Foundation

Tip

Full literature review with 45 references: Literature Review

| Claim | Evidence |
|-------|----------|
| LLM ensembles match human crowds | Schoenegger et al., Science Advances 2024 |
| Retrieval-augmented LLMs approach superforecasters | Halawi et al., NeurIPS 2024 |
| 50-persona swarm outperforms single-model on Polymarket | PolySwarm, arXiv:2604.03888 |
| RLHF models are overconfident, need calibration | Geng et al., NAACL 2024 |
| Semantic similarity outperforms price correlation | Baaijens et al., Applied Network Science 2025 |

Roadmap

graph TD
    P1["✅ <b>Phase 1</b> — Foundation<br/>Core engine · Literature review · Polymarket API"]
    P2["✅ <b>Phase 2</b> — Swarm Intelligence<br/>9 Fish personas · Multi-round Delphi · CLI execution"]
    P3["✅ <b>Phase 3</b> — Calibration<br/>netcal integration · Retrodiction baseline"]
    P4["✅ <b>Phase 4</b> — Risk Management<br/>Kelly sizing · Edge detection · Drawdown monitor"]
    P5["✅ <b>Phase 5</b> — v5 Persistence<br/>SQLite DB · 200-market retrodiction · Brier 0.206"]
    P6["✅ <b>Phase 6</b> — v5 Execution<br/>CLOB executor · Position manager · 120 tests · Dashboard"]
    P7["🔄 <b>Phase 7</b> — Paper Trading<br/>2-4 weeks validation · Go/no-go checklist"]
    P8["⬜ <b>Phase 8</b> — Live Trading<br/>Real capital · Gradual escalation · GRPO fine-tuning"]

    P1 --> P2 --> P3 --> P4 --> P5 --> P6 --> P7 --> P8

    style P1 fill:#0a2910,stroke:#3fb950,color:#3fb950
    style P2 fill:#0a2910,stroke:#3fb950,color:#3fb950
    style P3 fill:#0a2910,stroke:#3fb950,color:#3fb950
    style P4 fill:#0a2910,stroke:#3fb950,color:#3fb950
    style P5 fill:#0a2910,stroke:#3fb950,color:#3fb950
    style P6 fill:#0a2910,stroke:#3fb950,color:#3fb950
    style P7 fill:#1a1a2e,stroke:#58a6ff,color:#58a6ff
    style P8 fill:#1a1a2e,stroke:#8b949e,color:#8b949e

Built with structured human-AI collaboration · Paper trading is the default · Live trading requires explicit human approval
