Form ADV Part 2A intelligence + peer benchmarking. A LangGraph agent that ingests any RIA's Form ADV Part 2A brochure and produces a compliance-and-competitive scorecard: fee-structure benchmarking vs peer advisers, disciplinary disclosure flags, conflict-of-interest enumeration, and a redline against SEC plain-English expectations.
Status: Pipeline feature-complete end-to-end on real SEC filings,
plus a server-rendered review UI at /review covering both the
benchmark-a-filed-firm and the score-your-own-draft use cases. First
live IAPD run landed against Brown Advisory LLC (CRD 110181) on
2026-04-26, sample at docs/examples/sample-report.json (HTML / PDF).
Eval harness at 17/19 pass,
mean F1 0.921 on the 19-fixture golden set after the Day-14g scorer
fixes. The full operator reference is in docs/user-manual.pdf
(26 pages, includes the Brown Advisory Item 5 + Item 9 + IAPD-search
screenshots).
RIA Chief Compliance Officers spend ~40 hours a year reading peer brochures to defend their own annual ADV review. M&A diligence teams at RIA aggregators do the same work on every target firm. Both are paralegal-grade reads that should be machine-assisted.
Neither audience wants a chatbot. They want a structured scorecard they can defend on exam or in a deal memo.
- A hiring manager at F2 Strategy or a peer consultancy evaluating whether Robert Colling can ship production AI for wealth management.
- An RIA Chief Compliance Officer who wants to trust the outputs, inspect the Langfuse trace, and re-run the eval harness quarterly.
- A senior engineer evaluating the code for hire: architecture, eval discipline, structured-output contracts, and HITL design.
The first live IAPD run produced this CCO-readable redline for Brown Advisory LLC (CRD 110181):
Full artifact: HTML · PDF · JSON.
A thin server-rendered review surface ships with the app at
/review. It lists pipeline runs, opens
each one as a side-by-side redline + decision form, and writes the same
human_reviews row the JSON POST /report/decision would — so the
audit semantics carry through unchanged.
docker compose up -d postgres qdrant
uv run python -m adv_lens.app.web.seed # one-shot: load Brown Advisory samples (filed + draft)
uv run uvicorn adv_lens.app.main:app --reload
# → http://localhost:8000/review

Click any row to open the detail page: the redline iframe on the left, the reviewer decision form on the right, decision history below it. Submitting a decision posts via HTMX and swaps the decisions panel in place without a full-page reload.
The dashboard supports both audience-facing use cases:
- Filed brochure (benchmark / diligence). Enter a firm's CRD; the pipeline fetches the brochure from SEC IAPD, runs end-to-end, and the resulting redline lands in the run list ready for reviewer sign-off. For peer benchmarking, M&A diligence, or compliance review of a competitor.
- Draft brochure (pre-file self-review). Upload a PDF that hasn't
been filed yet. Bytes stay on the local machine; same pipeline runs
on a synthetic
99-prefixed CRD. For a CCO writing this year's amendment who wants to catch missing disclosures and unclear language before the SEC examiner does. See ADR 0016 § 5 for the cache-hijack trick that makes this zero-pipeline-modification.
The redline body is reused verbatim from render_redline_html
(iframed) so the bytes a CCO sees in the browser are the same bytes
the email/PDF path produces. Decision form posts via HTMX → the
decisions panel updates in place. See
ADR 0016 for the design choices (server-rendered, iframe, HTMX, no SPA).
The seed CLI inserts two demo rows: the live Brown Advisory filed run,
plus a draft-shaped companion that reuses the same brochure bytes via
the upload code path. Run it once after a fresh DB so the dashboard is
non-empty on first visit; idempotent on re-run. Pass --no-draft to
skip the draft companion.
End-to-end flow: IAPD firm-summary page → reviewer dashboard with the
two seeded runs → click into the Brown Advisory row → side-by-side
redline + decision form → submit a revise_requested decision → audit
row appears in place via HTMX. Recorded against the live local app.
A still-frame 4-panel storyboard (docs/images/demo-storyboard.png) covers the same flow for skim-readers who don't want to wait for the GIF to load. Recording playbook is at docs/demo-playbook.md.
See docs/architecture.md for the diagram and
docs/adr/0001-stack-choices.md for the
stack rationale. Operator-facing reference is the printable
docs/user-manual.pdf (26 pages).
Stack: Python 3.12 · uv · FastAPI · LangGraph · Anthropic Claude
(Haiku 4.5 / Sonnet 4.6 / Opus 4.7 per-node cost tier) · Pydantic +
Instructor · Qdrant · hybrid dense (bge-small-en-v1.5) + BM25 + RRF +
cross-encoder rerank · Langfuse · Postgres (via SQLModel) · pytest · ruff ·
Docker Compose.
- Python 3.12+
- uv 0.10+
- Docker Desktop (for Langfuse + Postgres + Qdrant)
- An Anthropic API key
cp .env.example .env
# fill in ANTHROPIC_API_KEY. Leave LANGFUSE_* blank until first compose up.
uv sync # resolve + install deps
uv run pytest # smoke + eval harness, all green
# bring up the stack
docker compose up -d postgres qdrant langfuse-web
# visit http://localhost:3000 to provision Langfuse and grab the
# public/secret keys, then paste them into .env
uv run python -m adv_lens.app.web.seed # one-shot: seed the dashboard demo rows
uv run uvicorn adv_lens.app.main:app --reload
# → http://localhost:8000/review # the reviewer UI (start here)
# → http://localhost:8000/healthz # liveness probe
# → http://localhost:8000/docs # FastAPI auto-docs

Note for Windows users (PowerShell): the bash-style VAR=value cmd
prefix doesn't work. To override the default Postgres DSN with sqlite
for a no-Docker quickstart:
$env:POSTGRES_DSN = "sqlite:///./data/adv_lens_dev.db"
uv run python -m adv_lens.app.web.seed
uv run uvicorn adv_lens.app.main:app --port 8000 --reload

# Resolve CRD via IAPD search, then fetch every current brochure PDF
uv run python -m adv_lens.ingestion.cli fetch-brochure 108000
# Or skip the search hop and fetch a specific filing version directly
uv run python -m adv_lens.ingestion.cli fetch-brochure 108000 --vid 999123
# Dry-parse an IARD bulk Part 1 CSV (first 20 rows)
uv run python -m adv_lens.ingestion.cli load-iard data/iard/ADV_Base_A_202604.csv --limit 20

Brochures land at data/brochures/<CRD>/<BRCHR_VRSN_ID>.pdf. The cache is
content-addressed and immutable — a new filing gets a new version ID.
See docs/adr/0002-data-sources.md for the
ingestion contract, rate-limit defaults, and SEC User-Agent policy.
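The write-once layout described above can be sketched in a few lines. This is a minimal illustration, not the adv_lens implementation: the function names `brochure_path` and `store_brochure` are hypothetical, and only the `data/brochures/<CRD>/<BRCHR_VRSN_ID>.pdf` layout comes from this README.

```python
from pathlib import Path


def brochure_path(root: Path, crd: str, version_id: str) -> Path:
    """Map (CRD, brochure version ID) to its cache location under root."""
    return root / crd / f"{version_id}.pdf"


def store_brochure(root: Path, crd: str, version_id: str, pdf_bytes: bytes) -> bool:
    """Write a brochure once. Returns True if written, False if already cached.

    Existing files are never overwritten, which is what keeps the cache
    immutable: a new filing arrives under a new version ID instead.
    """
    path = brochure_path(root, crd, version_id)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp name first so a crash cannot leave a half-written PDF
    # under the final name.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(pdf_bytes)
    tmp.replace(path)
    return True
```

Calling `store_brochure(Path("data/brochures"), "108000", "999001", pdf_bytes)` a second time is a no-op, so re-running a fetch is safe.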
uv run python -m adv_lens.segmenter.cli data/brochures/108000/999001.pdf
# Add --full to emit unabridged section bodies.

The primary backend is a regex on SEC-mandated Item headers — deterministic,
offline, dependency-light. A LlamaParse fallback is wired for scanned PDFs
that defeat the heuristic (placeholder; activates when a real scanned
brochure shows up in the golden set). See
docs/adr/0003-segmenter-strategy.md
for why this diverges from the brief's alphanome-ai/sec-parser default.
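To make the header-regex idea concrete, here is a rough sketch of splitting brochure text on "Item N" headers. The pattern and the `segment_items` name are assumptions for illustration; the actual regex in adv_lens.segmenter is not shown in this README and is likely stricter.

```python
import re

# Assumed header shape: a line that starts with "Item <number>".
ITEM_HEADER = re.compile(
    r"^[ \t]*Item[ \t]+(\d{1,2})\b[^\n]*$",
    re.IGNORECASE | re.MULTILINE,
)


def segment_items(text: str) -> dict[int, str]:
    """Split brochure text into {item_number: body} for Items 1-18."""
    matches = [m for m in ITEM_HEADER.finditer(text) if 1 <= int(m.group(1)) <= 18]
    sections: dict[int, str] = {}
    for i, match in enumerate(matches):
        # A section body runs from the end of its header line to the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        number = int(match.group(1))
        body = text[match.end():end].strip()
        # Table-of-contents entries repeat each header with little or no body,
        # so keep the longest body seen per Item.
        if len(body) > len(sections.get(number, "")):
            sections[number] = body
    return sections
```

A body line that happens to begin with "Item 12 ..." would be misread as a header by this sketch, which is one reason a production pattern needs to be tighter than the one shown.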
# CLI runs the pipeline synchronously and prints the final ADVState as JSON.
uv run python -m adv_lens.app.graph.cli 108000
uv run python -m adv_lens.app.graph.cli 108000 --vid 999123
# HTTP is async: POST returns 202 + a status URL; poll until complete.
curl -s -X POST http://localhost:8000/pipeline/run \
-H 'content-type: application/json' \
-d '{"crd": "108000", "brochure_version_id": "999123"}' | jq
# {"trace_id": "advlens-abc123", "status": "queued", "status_url": "/pipeline/run/advlens-abc123"}
curl -s http://localhost:8000/pipeline/run/advlens-abc123 | jq
# Returns the persisted PipelineRun row — status walks queued → running →
# (complete | failed). When complete, result.redline holds the typed
# RedlineReport and result.review_status is "pending_review".

Pipeline (when ANTHROPIC_API_KEY is set):
START → fetch_brochure → segment_brochure → [extract_fee | extract_disciplinary | extract_conflicts] → retrieve_peers → write_redline → hitl_gate → END.
Without an Anthropic key the pipeline collapses to fetch + segment only.
Langfuse traces are emitted automatically when LANGFUSE_PUBLIC_KEY and
LANGFUSE_SECRET_KEY are set, no-op otherwise. The async worker runs
in-process today (asyncio.create_task + a persisted pipeline_runs
table) — see docs/adr/0011-async-pipeline-worker.md
for the path to a real queue.
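The in-process worker pattern can be sketched as follows, together with the kind of stuck-row sweep such a design needs after a restart. This is an illustration under stated assumptions: the in-memory `RUNS` dict stands in for the persisted pipeline_runs table, and `enqueue`, `_execute`, and `reap` are hypothetical names rather than adv_lens functions.

```python
import asyncio
from datetime import datetime, timedelta, timezone

RUNS: dict[str, dict] = {}        # stand-in for the pipeline_runs table
_TASKS: set[asyncio.Task] = set()  # strong refs so tasks are not garbage-collected


async def _execute(trace_id: str, pipeline) -> None:
    """Walk one row through queued -> running -> (complete | failed)."""
    row = RUNS[trace_id]
    row["status"] = "running"
    row["started_at"] = datetime.now(timezone.utc)
    try:
        row["result"] = await pipeline(row["crd"])
        row["status"] = "complete"
    except Exception as exc:  # record the failure instead of losing it
        row["status"] = "failed"
        row["error"] = repr(exc)


def enqueue(trace_id: str, crd: str, pipeline) -> asyncio.Task:
    """Persist a queued row, then run the pipeline as a background task."""
    RUNS[trace_id] = {"crd": crd, "status": "queued", "started_at": None,
                      "result": None, "error": None}
    task = asyncio.create_task(_execute(trace_id, pipeline))
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    return task


def reap(now: datetime, max_age: timedelta = timedelta(minutes=10)) -> list[str]:
    """Mark rows stuck in 'running' longer than max_age as failed."""
    reaped = []
    for trace_id, row in RUNS.items():
        if row["status"] == "running" and now - row["started_at"] > max_age:
            row["status"] = "failed"
            row["error"] = "reaped: no worker finished this run"
            reaped.append(trace_id)
    return reaped
```

Because the task lives inside the web process, a restart leaves its row in `running` forever; the sweep is what turns that into a visible `failed` status.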
Operators run the reaper on cron to clean up rows from worker restarts:
# Dry-run to see what would be reaped (no DB mutation):
uv run python -m adv_lens.app.jobs.reaper --dry-run --verbose
# Real sweep — marks rows >10min in `running` as failed.
uv run python -m adv_lens.app.jobs.reaper

The reviewer UI at /review is the
intended path — open a run, fill the form, the decision row writes
itself. The JSON endpoint stays available for scripted/operator use:
# Pipeline returns state.redline + state.report_hash + state.review_status="pending_review".
# After the CCO reviews, record the decision (writes one row to human_reviews):
curl -s -X POST http://localhost:8000/report/decision \
-H 'content-type: application/json' \
-d '{
"trace_id": "advlens-abc123",
"brochure_crd": "108000",
"report_hash": "<64-hex from state.report_hash>",
"reviewer": "cco@firm.example",
"decision": "approved",
"rationale": "Clean report; aligns with peer norms."
}' | jq
# All decisions for a trace, oldest first:
curl -s http://localhost:8000/report/decision/advlens-abc123 | jq

See docs/adr/0010-hitl-gate.md for the marker-vs-interrupt design and audit-trail rationale, and docs/adr/0016-review-ui.md for the server-rendered UI choice.
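The decision payload ties a reviewer's sign-off to one exact report through report_hash. A sketch of how such a binding might work is below. It assumes the 64-hex hash is a SHA-256 over canonical JSON, which this README does not confirm, and the helper names are hypothetical; only the payload field names come from the curl example above.

```python
import hashlib
import json


def report_hash(redline: dict) -> str:
    """SHA-256 over canonical JSON (assumed scheme): key order does not matter."""
    canonical = json.dumps(redline, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_decision(trace_id: str, crd: str, redline: dict,
                   reviewer: str, decision: str, rationale: str) -> dict:
    """Assemble the JSON body for POST /report/decision."""
    return {
        "trace_id": trace_id,
        "brochure_crd": crd,
        "report_hash": report_hash(redline),
        "reviewer": reviewer,
        "decision": decision,
        "rationale": rationale,
    }


def decision_matches_report(decision_row: dict, redline: dict) -> bool:
    """True only if the decision was recorded against this exact report."""
    return decision_row["report_hash"] == report_hash(redline)
```

The point of the hash is that an approval cannot silently carry over to an edited report: any change to the redline changes the hash, and `decision_matches_report` returns False.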
# Bring up Qdrant
docker compose up -d qdrant
# Seed N peer brochures by running the pipeline per CRD and indexing
# each Item section as one vector (skips "Not applicable" placeholders).
cp docs/examples/peers-example.json data/peers/q2-2026.json
# Edit data/peers/q2-2026.json with real CRDs.
uv run python -m adv_lens.retrieval.cli seed-peers data/peers/q2-2026.json \
--report-out data/peers/q2-2026.report.json
# Dense-only sanity check
uv run python -m adv_lens.retrieval.cli query \
"tiered fee schedule" --item 5 --aum-band '$1B-$10B' -k 5
# Hybrid (dense + BM25 sparse with RRF fusion + cross-encoder rerank)
uv run python -m adv_lens.retrieval.cli query \
"tiered fee schedule" --item 5 --aum-band '$1B-$10B' -k 5 --hybrid
# Hybrid without the reranker (raw RRF order, useful for diagnostics)
uv run python -m adv_lens.retrieval.cli query \
"tiered fee schedule" --item 5 -k 5 --hybrid --no-rerank

bge-small-en-v1.5 (384-dim) downloads ~130 MB and the cross-encoder
(ms-marco-MiniLM-L-6-v2) ~80 MB on first invocation. Point IDs are
deterministic per (CRD, brochure_version_id, item_number) — re-running
seed-peers upserts in place. Hybrid retrieval uses Qdrant's
server-side RRF over named dense + sparse vectors; reranking happens
in Python on the top 50 fused hits. See
docs/adr/0004-peer-corpus-indexing.md
for the schema and
docs/adr/0007-hybrid-retrieval.md
for the BM25/RRF/rerank choices.
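For readers unfamiliar with reciprocal rank fusion, the scoring rule is small enough to show in pure Python. This sketch only illustrates the formula; in the project the fusion runs server-side in Qdrant, and the constant `k = 60` used here is the value from the original RRF paper, which may differ from Qdrant's.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


dense_hits = ["a", "b", "c"]   # hypothetical dense (embedding) ranking
sparse_hits = ["b", "d", "a"]  # hypothetical BM25 ranking
fused = rrf_fuse([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Documents that rank well in both lists ("b" and "a") rise above those found by only one retriever, which is why the fused list is a reasonable input for the cross-encoder rerank.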
uv run python -m eval.runner
# writes eval/results/<run_id>/report.{json,md}

Hand-labeled golden set under eval/fixtures/, one JSON per item.
| section_type | target fixtures | labeled so far | last F1 (run 20260426T143520Z) |
|---|---|---|---|
| segmenter | 5 | 1 | 1.000 |
| fee | 20 | 5 | 0.858 (4/5 pass) |
| disciplinary | 15 | 5 | 0.950 (5/5 pass) |
| conflicts | 15 | 5 | 0.893 (4/5 pass) |
| redline | 10 | 2 | 1.000 (structural) |
| smoke | 1 | 1 | 1.000 |
| total | 66 | 19 | 17/19 pass, mean 0.921 |
The fee / disciplinary / conflicts directories carry two prose styles
side by side: short synthetic-clean fixtures (item_001-item_003)
that round-trip cleanly through the scorer and longer realism-style
fixtures (item_004+) using the structural patterns common in large-RIA
ADV brochures (multi-program cross-references, "in our sole discretion"
hedging, BrokerCheck citations), anonymized so that no real firm is singled out.
See eval/fixtures/README.md for the
curation rationale.
Scoring strategy (per PROJECT_BRIEF.md):
- Structured-field extraction → exact-match F1
- Narrative redline → LLM-as-judge + second judge cross-check to catch judge drift (planned for Week 4; the redline scorer is structural-only today)
- Langfuse traces on every run
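As a rough illustration of exact-match F1 on structured fields, the sketch below treats each (field, value) pair as one item and scores the prediction against the gold label. The function name and the fee field names are hypothetical; the real scorer in eval/ may normalize values or weight fields differently.

```python
import json


def _pairs(record: dict) -> set[tuple[str, str]]:
    """Flatten a record into comparable (field, JSON value) pairs, skipping nulls."""
    return {(k, json.dumps(v, sort_keys=True)) for k, v in record.items() if v is not None}


def exact_match_f1(predicted: dict, gold: dict) -> float:
    """F1 over (field, value) pairs; a pair counts only if it matches exactly."""
    pred, ref = _pairs(predicted), _pairs(gold)
    if not pred and not ref:
        return 1.0  # nothing to extract and nothing extracted
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


gold = {"fee_type": "tiered", "min_fee": 5000, "billing": "quarterly"}
pred = {"fee_type": "tiered", "min_fee": 2500, "billing": "quarterly"}
print(round(exact_match_f1(pred, gold), 3))  # 0.667
```

One wrong field out of three costs a third of both precision and recall, which gives a sense of how a single miss moves a per-fixture score.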
CI runs the harness on every PR and uploads eval/results/ as an artifact.
See docs/compliance.md for the full CCO-grade write-up — vendor disclosure, specific Advisers Act / FINRA rules engaged, audit-trail design, failure-mode acknowledgement, and a practical playbook for what to do when your firm is examined.
Short version: all data is public SEC filings, outputs are analyst aid not legal advice, every LLM call logs to an audit table, every report passes through a HITL gate before release.
The honest catalog. None of these are hidden at runtime — each shows up
either as an extraction_warnings entry, a finding in the redline, an
ADR, or a callout in the user manual.
- Multi-program brochures bundle Items together. Some Part 2A brochures (Brown Advisory is the canonical example) lack standalone Item N headers for Items 5/10/11/12; content is bundled into per-program subsections. The regex segmenter cannot isolate them. Mitigated by the Haiku 4.5 LLM rescue (ADR 0014) that runs when the regex returns <2K-char bodies for any of the five extractor-consumed Items. Triggered selectively; regex stays primary.
- SEC IAPD URL/UA fragility. SEC retired /search/entity and now bot-detects on files.adviserinfo.sec.gov. Patched to a polite-bot hybrid UA (mirrors Googlebot's pattern) that identifies us and passes the filter. Diagnostic playbook for the next migration in ADR 0015.
- HITL gate is marker-style (sets review_status="pending_review" + report_hash), not a true LangGraph interrupt_before with checkpointer-backed pause/resume; see ADR 0010 for why. Audit row is written when a CCO acts via POST /report/decision.
- Async pipeline worker is in-process (asyncio.create_task + persisted PipelineRun rows). Process restart kills in-flight jobs; the reaper (python -m adv_lens.app.jobs.reaper) sweeps stuck rows on cron. Real queue (arq / procrastinate) swap path documented in ADR 0011.
- Redline scorer is structural-only today (4-12 findings, valid scorecard categories, severity not pathological). LLM-as-judge with dual-judge cross-check lands Week 4 (ADR 0009 pending).
- Eval F1 has run-to-run noise of up to ~0.15 on individual fixtures because Anthropic deprecated temperature on the claude-4 family. Multi-run averaging (N=3, report median + spread) is Week-4 work.
- retrieve_peers_node uses static per-Item query anchors; an extraction-derived query refinement is Week 4+ work.
- state.brochure_aum_band is None until a future IARDLookupNode populates it from the bulk Part 1 CSV; until then peer queries don't filter by AUM band.
- Hybrid retrieval default. Dense + BM25 + RRF + cross-encoder rerank is make_peer_store()'s default. Backfilling sparse vectors into a dense-only collection requires a snapshot-and-reseed (ADR 0007).
- Segmenter LlamaParse fallback is a placeholder; scanned-PDF brochures currently error with a routing hint (ADR 0003).
- Peer corpus is operator-curated via JSON; IARD-CSV-driven peer auto-discovery is deferred (ADR 0013 pending).
- Ollama on-prem fallback is deferred (ADR 0012 pending).
- Audit-trail bundle export endpoint is planned but not yet shipped; today operators join pipeline_runs / llm_calls / human_reviews on trace_id + report_hash directly.
- Demo GIF at docs/demo.gif is operator-recorded against the live local app and embedded in the Demo section above. Captures the dashboard list view → row click → redline + decision form → HTMX panel update on submit.
- No browser-side authentication. The reviewer UI is local-only by design (single-CCO dev tool); a real RIA pilot would add SSO + per-firm tenancy. ADR 0016 § Context spells this out.
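The multi-run averaging mentioned in the list above (N=3, median plus spread) could look roughly like this. It is a sketch of planned work, not shipped code: `summarize_runs` and the fixture IDs are hypothetical, and "spread" is taken here to mean max minus min.

```python
from statistics import median


def summarize_runs(f1_by_fixture: dict[str, list[float]]) -> dict[str, dict]:
    """Collapse repeated eval runs into a median and a max-min spread per fixture."""
    summary = {}
    for fixture, scores in f1_by_fixture.items():
        if not scores:
            continue  # a fixture with no runs has nothing to summarize
        summary[fixture] = {
            "n": len(scores),
            "median": median(scores),
            "spread": max(scores) - min(scores),
        }
    return summary


runs = {
    "fee/item_004": [0.80, 0.95, 0.90],  # noisy fixture
    "fee/item_001": [1.0, 1.0, 1.0],     # stable fixture
}
for fixture, stats in summarize_runs(runs).items():
    print(fixture, stats["median"], round(stats["spread"], 2))
```

Reporting the median makes a single unlucky run less likely to flip a fixture from pass to fail, while the spread keeps the noise visible instead of averaging it away.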
- Week 1 — scaffold (day 1) + SEC IAPD fetcher and IARD Part 1 loader (day 2) + Item 1–18 segmenter (day 3) + LangGraph fetch + segment pipeline (day 4) + dense peer-corpus retrieval (day 5). Foundations milestone — done.
- Week 2 — fee extractor + LLMClient + audit sink (day 6); disciplinary extractor + parallel-merge reducer (day 7); conflicts extractor + three-way fan-out (day 8); hybrid retrieval (BM25 + RRF + rerank) (day 9); redline writer + structural validator + fan-in topology (day 10). Done.
- Week 3 — retrieve_peers_node, HumanReviewGate, async pipeline worker, reaper, first live IAPD run (Brown Advisory CRD 110181), segmenter LLM rescue (ADR 0014), per-brochure HTML/PDF redline render, Langfuse trace emission per LLM call. Done.
- Week 4 — LLM-as-judge + dual-judge cross-check for redline scoring (ADR 0009 pending); golden-set scale-up to 65 fixtures using real-brochure prose; multi-run averaging in eval; CI regression gates that block PRs on F1 drop.
- Week 5 — reviewer UI (ADR 0016) + 60-90s demo GIF recorded against it; audit-trail bundle export endpoint; layperson docs/intro.md (5th-grade reading level for a non-technical audience); architecture diagram refresh; output-bundle layout across CLIs (--out-dir).
- Week 6 (optional) — ADV-Diff bolt-on: scheduled quarterly change detector that reuses this project's parser + adds a change-summary agent (full design in PROJECT_BRIEF.md).
Full cadence: PROJECT_BRIEF.md. Open ideas: docs/parking-lot.md.
MIT.



