Skip to content

rscolling/adv-lens

Repository files navigation

ADV-Lens

Form ADV Part 2A intelligence + peer benchmarking. A LangGraph agent that ingests any RIA's Form ADV Part 2A brochure and produces a compliance-and-competitive scorecard: fee-structure benchmarking vs peer advisers, disciplinary disclosure flags, conflict-of-interest enumeration, and a redline against SEC plain-English expectations.

Status: Pipeline feature-complete end-to-end on real SEC filings, plus a server-rendered review UI at /review covering both the benchmark-a-filed-firm and the score-your-own-draft use cases. First live IAPD run landed against Brown Advisory LLC (CRD 110181) on 2026-04-26, sample at docs/examples/sample-report.json (HTML / PDF). Eval harness at 17/19 pass, mean F1 0.921 on the 19-fixture golden set after the Day-14g scorer fixes. The full operator reference is in docs/user-manual.pdf (26 pages, includes the Brown Advisory Item 5 + Item 9 + IAPD-search screenshots).


The problem

RIA Chief Compliance Officers spend ~40 hours a year reading peer brochures to defend their own annual ADV review. M&A diligence teams at RIA aggregators do the same work on every target firm. Both are paralegal-grade reads that should be machine-assisted.

Neither audience wants a chatbot. They want a structured scorecard they can defend on exam or in a deal memo.

Who this is for

  1. A hiring manager at F2 Strategy or a peer consultancy evaluating whether Robert Colling can ship production AI for wealth management.
  2. An RIA Chief Compliance Officer who wants to trust the outputs, inspect the Langfuse trace, and re-run the eval harness quarterly.
  3. A senior engineer evaluating the code for hire. Architecture, eval discipline, structured-output contracts, HITL design.

Sample output

The first live IAPD run produced this CCO-readable redline for Brown Advisory LLC (CRD 110181):

Brown Advisory redline page 1

Full artifact: HTML · PDF · JSON.

Reviewer UI

A thin server-rendered review surface ships with the app at /review. It lists pipeline runs, opens each one as a side-by-side redline + decision form, and writes the same human_reviews row the JSON POST /report/decision would — so the audit semantics carry through unchanged.

ADV-Lens dashboard — list view with intake forms and runs table

docker compose up -d postgres qdrant
uv run python -m adv_lens.app.web.seed   # one-shot: load Brown Advisory samples (filed + draft)
uv run uvicorn adv_lens.app.main:app --reload
# → http://localhost:8000/review

Click any row to open the detail page — the redline iframe on the left, the reviewer decision form on the right, decision history below it. Submitting a decision posts via HTMX and swaps the decisions panel in place without a full-page reload:

ADV-Lens detail view — redline + decision form + audit history

The dashboard supports both audience-facing use cases:

  1. Filed brochure (benchmark / diligence). Enter a firm's CRD; the pipeline fetches the brochure from SEC IAPD, runs end-to-end, and the resulting redline lands in the run list ready for reviewer sign-off. For peer benchmarking, M&A diligence, or compliance review of a competitor.
  2. Draft brochure (pre-file self-review). Upload a PDF that hasn't been filed yet. Bytes stay on the local machine; same pipeline runs on a synthetic 99-prefixed CRD. For a CCO writing this year's amendment who wants to catch missing disclosures and unclear language before the SEC examiner does. See ADR 0016 § 5 for the cache-hijack trick that makes this zero-pipeline-modification.

The redline body is reused verbatim from render_redline_html (iframed) so the bytes a CCO sees in the browser are the same bytes the email/PDF path produces. Decision form posts via HTMX → the decisions panel updates in place. See ADR 0016 for the design choices (server- rendered, iframe, HTMX, no SPA).

The seed CLI inserts two demo rows: the live Brown Advisory filed run, plus a draft-shaped companion that reuses the same brochure bytes via the upload code path. Run it once after a fresh DB so the dashboard is non-empty on first visit; idempotent on re-run. Pass --no-draft to skip the draft companion.

Demo

End-to-end flow: IAPD firm-summary page → reviewer dashboard with the two seeded runs → click into the Brown Advisory row → side-by-side redline + decision form → submit a revise_requested decision → audit row appears in place via HTMX. Recorded against the live local app.

ADV-Lens demo

A still-frame 4-panel storyboard (docs/images/demo-storyboard.png) covers the same flow for skim-readers who don't want to wait for the GIF to load. Recording playbook is at docs/demo-playbook.md.

Architecture

See docs/architecture.md for the diagram and docs/adr/0001-stack-choices.md for the stack rationale. Operator-facing reference is the printable docs/user-manual.pdf (26 pages).

Stack: Python 3.12 · uv · FastAPI · LangGraph · Anthropic Claude (Haiku 4.5 / Sonnet 4.6 / Opus 4.7 per-node cost tier) · Pydantic + Instructor · Qdrant · hybrid dense (bge-small-en-v1.5) + BM25 + RRF + cross-encoder rerank · Langfuse · Postgres (via SQLModel) · pytest · ruff · Docker Compose.

How to run

Prerequisites

  • Python 3.12+
  • uv 0.10+
  • Docker Desktop (for Langfuse + Postgres + Qdrant)
  • An Anthropic API key

First boot

cp .env.example .env
# fill in ANTHROPIC_API_KEY. Leave LANGFUSE_* blank until first compose up.

uv sync                  # resolve + install deps
uv run pytest            # smoke + eval harness, all green

# bring up the stack
docker compose up -d postgres qdrant langfuse-web
# visit http://localhost:3000 to provision Langfuse and grab the
# public/secret keys, then paste them into .env

uv run python -m adv_lens.app.web.seed     # one-shot: seed the dashboard demo rows
uv run uvicorn adv_lens.app.main:app --reload
# → http://localhost:8000/review           # the reviewer UI (start here)
# → http://localhost:8000/healthz          # liveness probe
# → http://localhost:8000/docs             # FastAPI auto-docs

Note for Windows users (PowerShell): the bash-style VAR=value cmd prefix doesn't work. To override the default Postgres DSN with sqlite for a no-Docker quickstart:

$env:POSTGRES_DSN = "sqlite:///./data/adv_lens_dev.db"
uv run python -m adv_lens.app.web.seed
uv run uvicorn adv_lens.app.main:app --port 8000 --reload

Ingest a brochure

# Resolve CRD via IAPD search, then fetch every current brochure PDF
uv run python -m adv_lens.ingestion.cli fetch-brochure 108000

# Or skip the search hop and fetch a specific filing version directly
uv run python -m adv_lens.ingestion.cli fetch-brochure 108000 --vid 999123

# Dry-parse an IARD bulk Part 1 CSV (first 20 rows)
uv run python -m adv_lens.ingestion.cli load-iard data/iard/ADV_Base_A_202604.csv --limit 20

Brochures land at data/brochures/<CRD>/<BRCHR_VRSN_ID>.pdf. The cache is content-addressed and immutable — a new filing gets a new version ID. See docs/adr/0002-data-sources.md for the ingestion contract, rate-limit defaults, and SEC User-Agent policy.

Segment a brochure into Item 1–18 sections

uv run python -m adv_lens.segmenter.cli data/brochures/108000/999001.pdf
# Add --full to emit unabridged section bodies.

The primary backend is a regex on SEC-mandated Item headers — deterministic, offline, dependency-light. A LlamaParse fallback is wired for scanned PDFs that defeat the heuristic (placeholder; activates when a real scanned brochure shows up in the golden set). See docs/adr/0003-segmenter-strategy.md for why this diverges from the brief's alphanome-ai/sec-parser default.

Run the pipeline end-to-end

# CLI runs the pipeline synchronously and prints the final ADVState as JSON.
uv run python -m adv_lens.app.graph.cli 108000
uv run python -m adv_lens.app.graph.cli 108000 --vid 999123

# HTTP is async: POST returns 202 + a status URL; poll until complete.
curl -s -X POST http://localhost:8000/pipeline/run \
    -H 'content-type: application/json' \
    -d '{"crd": "108000", "brochure_version_id": "999123"}' | jq
# {"trace_id": "advlens-abc123", "status": "queued", "status_url": "/pipeline/run/advlens-abc123"}

curl -s http://localhost:8000/pipeline/run/advlens-abc123 | jq
# Returns the persisted PipelineRun row — status walks queued → running →
# (complete | failed). When complete, result.redline holds the typed
# RedlineReport and result.review_status is "pending_review".

Pipeline (when ANTHROPIC_API_KEY is set): START → fetch_brochure → segment_brochure → [extract_fee | extract_disciplinary | extract_conflicts] → retrieve_peers → write_redline → hitl_gate → END. Without an Anthropic key the pipeline collapses to fetch + segment only. Langfuse traces are emitted automatically when LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set, no-op otherwise. The async worker runs in-process today (asyncio.create_task + a persisted pipeline_runs table) — see docs/adr/0011-async-pipeline-worker.md for the path to a real queue.

Operators run the reaper on cron to clean up rows from worker restarts:

# Dry-run to see what would be reaped (no DB mutation):
uv run python -m adv_lens.app.jobs.reaper --dry-run --verbose

# Real sweep — marks rows >10min in `running` as failed.
uv run python -m adv_lens.app.jobs.reaper

Record a CCO decision on a pending report

The reviewer UI at /review is the intended path — open a run, fill the form, the decision row writes itself. The JSON endpoint stays available for scripted/operator use:

# Pipeline returns state.redline + state.report_hash + state.review_status="pending_review".
# After the CCO reviews, record the decision (writes one row to human_reviews):
curl -s -X POST http://localhost:8000/report/decision \
    -H 'content-type: application/json' \
    -d '{
      "trace_id": "advlens-abc123",
      "brochure_crd": "108000",
      "report_hash": "<64-hex from state.report_hash>",
      "reviewer": "cco@firm.example",
      "decision": "approved",
      "rationale": "Clean report; aligns with peer norms."
    }' | jq

# All decisions for a trace, oldest first:
curl -s http://localhost:8000/report/decision/advlens-abc123 | jq

See docs/adr/0010-hitl-gate.md for the marker-vs-interrupt design and audit-trail rationale, and docs/adr/0016-review-ui.md for the server-rendered UI choice.

Seed the peer corpus into Qdrant

# Bring up Qdrant
docker compose up -d qdrant

# Seed N peer brochures by running the pipeline per CRD and indexing
# each Item section as one vector (skips "Not applicable" placeholders).
cp docs/examples/peers-example.json data/peers/q2-2026.json
# Edit data/peers/q2-2026.json with real CRDs.
uv run python -m adv_lens.retrieval.cli seed-peers data/peers/q2-2026.json \
    --report-out data/peers/q2-2026.report.json

# Dense-only sanity check
uv run python -m adv_lens.retrieval.cli query \
    "tiered fee schedule" --item 5 --aum-band '$1B-$10B' -k 5

# Hybrid (dense + BM25 sparse with RRF fusion + cross-encoder rerank)
uv run python -m adv_lens.retrieval.cli query \
    "tiered fee schedule" --item 5 --aum-band '$1B-$10B' -k 5 --hybrid

# Hybrid without the reranker (raw RRF order, useful for diagnostics)
uv run python -m adv_lens.retrieval.cli query \
    "tiered fee schedule" --item 5 -k 5 --hybrid --no-rerank

bge-small-en-v1.5 (384-dim) downloads ~130 MB and the cross-encoder (ms-marco-MiniLM-L-6-v2) ~80 MB on first invocation. Point IDs are deterministic per (CRD, brochure_version_id, item_number) — re-running seed-peers upserts in place. Hybrid retrieval uses Qdrant's server-side RRF over named dense + sparse vectors; reranking happens in Python on the top 50 fused hits. See docs/adr/0004-peer-corpus-indexing.md for the schema and docs/adr/0007-hybrid-retrieval.md for the BM25/RRF/rerank choices.

Run the eval harness

uv run python -m eval.runner
# writes eval/results/<run_id>/report.{json,md}

Evaluation

Hand-labeled golden set under eval/fixtures/, one JSON per item.

section_type target labeled last F1 (run 20260426T143520Z)
segmenter 5 1 1.000
fee 20 5 0.858 (4/5 pass)
disciplinary 15 5 0.950 (5/5 pass)
conflicts 15 5 0.893 (4/5 pass)
redline 10 2 1.000 (structural)
smoke 1 1 1.000
total 66 19 17/19 pass, mean 0.921

The fee / disciplinary / conflicts directories carry two prose styles side by side: short synthetic-clean fixtures (item_001-item_003) that round-trip cleanly through the scorer and longer realism-style fixtures (item_004+) using the structural patterns common in large-RIA ADV brochures (multi-program cross-references, "in our sole discretion" hedging, BrokerCheck citations) — anonymous to avoid singling-out concerns. See eval/fixtures/README.md for the curation rationale.

Scoring strategy (per PROJECT_BRIEF.md):

  • Structured-field extraction → exact-match F1
  • Narrative redline → LLM-as-judge + second judge cross-check to catch judge drift
  • Langfuse traces on every run

CI runs the harness on every PR and uploads eval/results/ as an artifact.

Compliance posture

See docs/compliance.md for the full CCO-grade write-up — vendor disclosure, specific Advisers Act / FINRA rules engaged, audit-trail design, failure-mode acknowledgement, and a practical playbook for what to do when your firm is examined.

Short version: all data is public SEC filings, outputs are analyst aid not legal advice, every LLM call logs to an audit table, every report passes through a HITL gate before release.

Known limitations (as of 2026-04-26)

The honest catalog. None of these are hidden at runtime — each shows up either as an extraction_warnings entry, a finding in the redline, an ADR, or a callout in the user manual.

  • Multi-program brochures bundle Items together. Some Part 2A brochures (Brown Advisory is the canonical example) lack standalone Item N headers for Items 5/10/11/12 — content is bundled into per-program subsections. The regex segmenter cannot isolate them. Mitigated by the Haiku 4.5 LLM rescue (ADR 0014) that runs when the regex returns <2K-char bodies for any of the five extractor-consumed Items. Triggered selectively; regex stays primary.
  • SEC IAPD URL/UA fragility. SEC retired /search/entity and now bot-detects on files.adviserinfo.sec.gov. Patched to a polite-bot hybrid UA (mirrors Googlebot's pattern) that identifies us and passes the filter. Diagnostic playbook for the next migration in ADR 0015.
  • HITL gate is marker-style (sets review_status="pending_review" + report_hash), not a true LangGraph interrupt_before with checkpointer-backed pause/resume — see ADR 0010 for why. Audit row is written when a CCO acts via POST /report/decision.
  • Async pipeline worker is in-process (asyncio.create_task + persisted PipelineRun rows). Process restart kills in-flight jobs; the reaper (python -m adv_lens.app.jobs.reaper) sweeps stuck rows on cron. Real queue (arq / procrastinate) swap path documented in ADR 0011.
  • Redline scorer is structural-only today (4-12 findings, valid scorecard categories, severity not pathological). LLM-as-judge with dual-judge cross-check lands Week 4 (ADR 0009 pending).
  • Eval F1 has run-to-run noise of up to ~0.15 on individual fixtures because Anthropic deprecated temperature on the claude-4 family. Multi-run averaging (N=3, report median + spread) is Week-4 work.
  • retrieve_peers_node uses static per-Item query anchors; an extraction-derived query refinement is Week 4+ work.
  • state.brochure_aum_band is None until a future IARDLookupNode populates it from the bulk Part 1 CSV; until then peer queries don't filter by AUM band.
  • Hybrid retrieval default. Dense + BM25 + RRF + cross-encoder rerank is make_peer_store()'s default. Backfilling sparse vectors into a dense-only collection requires a snapshot-and-reseed (ADR 0007).
  • Segmenter LlamaParse fallback is a placeholder; scanned-PDF brochures currently error with a routing hint (ADR 0003).
  • Peer corpus is operator-curated via JSON; IARD-CSV-driven peer auto-discovery is deferred (ADR 0013 pending).
  • Ollama on-prem fallback is deferred (ADR 0012 pending).
  • Audit-trail bundle export endpoint is planned but not yet shipped; today operators join pipeline_runs / llm_calls / human_reviews on trace_id + report_hash directly.
  • Demo GIF at docs/demo.gif is operator-recorded against the live local app and embedded in the Demo section above. Captures the dashboard list view → row click → redline + decision form → HTMX panel update on submit.
  • No browser-side authentication. The reviewer UI is local-only by design (single-CCO dev tool); a real RIA pilot would add SSO + per-firm tenancy. ADR 0016 § Context spells this out.

Roadmap

  • Week 1 — scaffold (day 1) + SEC IAPD fetcher and IARD Part 1 loader (day 2) + Item 1–18 segmenter (day 3) + LangGraph fetch + segment pipeline (day 4) + dense peer-corpus retrieval (day 5). Foundations milestone — done.
  • Week 2 — fee extractor + LLMClient + audit sink (day 6); disciplinary extractor + parallel-merge reducer (day 7); conflicts extractor + three-way fan-out (day 8); hybrid retrieval (BM25 + RRF + rerank) (day 9); redline writer + structural validator + fan-in topology (day 10). Done.
  • Week 3 — retrieve_peers_node, HumanReviewGate, async pipeline worker, reaper, first live IAPD run (Brown Advisory CRD 110181), segmenter LLM rescue (ADR 0014), per-brochure HTML/PDF redline render, Langfuse trace emission per LLM call. Done.
  • Week 4 — LLM-as-judge + dual-judge cross-check for redline scoring (ADR 0009 pending); golden-set scale-up to 65 fixtures using real-brochure prose; multi-run averaging in eval; CI regression gates that block PRs on F1 drop.
  • Week 5 — reviewer UI (ADR 0016) + 60-90s demo GIF recorded against it; audit-trail bundle export endpoint; layperson docs/intro.md (5th-grade reading level for a non-technical audience); architecture diagram refresh; output-bundle layout across CLIs (--out-dir).
  • Week 6 (optional) — ADV-Diff bolt-on: scheduled quarterly change detector that reuses this project's parser + adds a change-summary agent (full design in PROJECT_BRIEF.md).

Full cadence: PROJECT_BRIEF.md. Open ideas: docs/parking-lot.md.

License

MIT.

About

Form ADV Part 2A intelligence + peer benchmarking — LangGraph, hybrid retrieval, eval harness

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors