PodGraph — TODO

Source of truth: podgraph-roadmap-revised.md

Phase 0: Pipeline Scripts (COMPLETE)

Phase 0.5: Discovery & Aggregation (COMPLETE)

Podcast Discovery

add-podcast.ts — register podcast RSS feeds via iTunes Search API
discover.ts — scan RSS feeds for guest appearances, skip host's own podcast
data/podcasts.json — podcast registry with host info for exclusion logic
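A minimal sketch of the host-exclusion rule, for reference. The `PodcastEntry` and `EpisodeItem` shapes and the `findGuestAppearances` helper are assumptions made for illustration; the real schema of data/podcasts.json and the matching logic in discover.ts may differ.

```typescript
// Hypothetical registry entry; the real data/podcasts.json schema may differ.
interface PodcastEntry {
  id: string;
  title: string;
  hosts: string[];
}

// Hypothetical, already-parsed RSS item.
interface EpisodeItem {
  podcastId: string;
  title: string;
  description: string;
}

const normalize = (s: string): string => s.trim().toLowerCase();

// Returns episodes that mention the person on podcasts they do not host.
function findGuestAppearances(
  person: string,
  registry: PodcastEntry[],
  episodes: EpisodeItem[],
): EpisodeItem[] {
  const name = normalize(person);
  // Collect the ids of every podcast where the person is listed as a host.
  const hosted = new Set(
    registry
      .filter((p) => p.hosts.some((h) => normalize(h) === name))
      .map((p) => p.id),
  );
  // Keep only name matches that come from podcasts outside that set.
  return episodes.filter(
    (e) =>
      !hosted.has(e.podcastId) &&
      normalize(`${e.title} ${e.description}`).includes(name),
  );
}
```

Plain name matching on title and description is the simplest possible signal; it is used here only to keep the sketch self-contained.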

Person Aggregation Pipeline (`aggregate.ts`)

Cross-episode data merging — themes, books, tools, people, companies deduplicated
Semantic theme merging — Claude groups related themes across episodes (~$0.015)
Conviction extraction & ranking — strength scored 1-10, evolution detection (~$0.053)
Worldview synthesis — 2-3 paragraph narrative, not a bio (~$0.013)
Deep-on badge identification — 2-4 signature deep-dive topics (~$0.011)
Taste clustering — recommendations grouped by thematic pattern (~$0.027)
Role deduplication — synonym groups + substring merging
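The role deduplication step (synonym groups plus substring merging) could look roughly like the sketch below. The synonym table, the `canonicalize` and `dedupeRoles` names, and the choice to keep the longer label when one role contains another are all assumptions for illustration, not the behavior of aggregate.ts.

```typescript
// Hypothetical synonym table: variant label -> canonical label.
const ROLE_SYNONYMS = new Map<string, string>([
  ["neurobiologist", "neuroscientist"],
  ["brain scientist", "neuroscientist"],
  ["podcast host", "podcaster"],
]);

// Lowercase, trim, and map known synonyms to their canonical label.
function canonicalize(role: string): string {
  const r = role.trim().toLowerCase();
  return ROLE_SYNONYMS.get(r) ?? r;
}

function dedupeRoles(roles: string[]): string[] {
  // Step 1: synonym merging, then drop exact duplicates (order preserved).
  const unique = Array.from(new Set(roles.map(canonicalize)));
  // Step 2: substring merging. A role is dropped when another, longer role
  // contains it (assumption: the more specific label is the one to keep).
  return unique.filter(
    (role) => !unique.some((other) => other !== role && other.includes(role)),
  );
}
```

For example, `dedupeRoles(["Neuroscientist", "neurobiologist", "Professor", "Stanford professor", "podcast host"])` yields `["neuroscientist", "stanford professor", "podcaster"]`. Raw substring checks can over-merge short labels, so a word-boundary check may be worth considering.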

Multi-Pass Extraction & Quality

extract-multipass.ts — Pass 1 (Haiku) entities, Pass 2 (Sonnet) themes with entity context. Now the default pipeline extraction.
correct-quotes.ts — post-extraction quote correction against transcript utterances (no API calls)
validate-extraction.ts — checks quote accuracy, entity cross-refs, entity classification
extract-gemini.ts — Gemini A/B testing for extraction quality comparison
export-prompt.ts — export resolved prompts for manual model testing in claude.ai
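One way to correct quotes against transcript utterances without any API calls is a local sliding-window word match, sketched below. This is an illustrative approach under stated assumptions, not necessarily what correct-quotes.ts does: the `Utterance` shape, the `correctQuote` name, and the 0.8 threshold are all hypothetical.

```typescript
// Hypothetical transcript unit.
interface Utterance {
  speaker: string;
  text: string;
}

// Lowercase and strip punctuation so near-identical wording still matches.
const tokenize = (s: string): string[] =>
  s.toLowerCase().replace(/[^a-z0-9\s']/g, " ").split(/\s+/).filter(Boolean);

// Fraction of quote tokens that also occur in the window (multiset overlap).
function overlap(quoteTokens: string[], windowTokens: string[]): number {
  const counts = new Map<string, number>();
  for (const t of quoteTokens) counts.set(t, (counts.get(t) ?? 0) + 1);
  let hits = 0;
  for (const t of windowTokens) {
    const c = counts.get(t) ?? 0;
    if (c > 0) {
      hits++;
      counts.set(t, c - 1);
    }
  }
  return hits / Math.max(quoteTokens.length, 1);
}

// Returns the verbatim transcript span that best matches the extracted quote,
// or null when nothing scores at or above the threshold.
function correctQuote(
  quote: string,
  utterances: Utterance[],
  threshold = 0.8,
): string | null {
  const q = tokenize(quote);
  if (q.length === 0) return null;
  let best: { score: number; text: string } | null = null;
  for (const u of utterances) {
    const words = u.text.split(/\s+/).filter(Boolean);
    if (words.length === 0) continue;
    const size = Math.min(q.length, words.length);
    // Slide a quote-sized window across the utterance's original words.
    for (let i = 0; i + size <= words.length; i++) {
      const text = words.slice(i, i + size).join(" ");
      const score = overlap(q, tokenize(text));
      if (best === null || score > best.score) best = { score, text };
    }
  }
  return best !== null && best.score >= threshold ? best.text : null;
}
```

A `null` result is a natural signal to flag the quote for review instead of publishing it.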

Monitoring & Cost Tracking

status.ts — per-person extraction/aggregation status
costs.ts — cost summary across all episodes and profiles
scripts/lib/costs.ts — per-step cost ledger (model, tokens, USD per pipeline step)
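A per-step ledger of this kind (model, tokens, USD per pipeline step) can be sketched as follows. The field names, the `recordStep` / `totalsByStep` / `totalUsd` helpers, and the per-million-token price shape are assumptions for illustration; the actual scripts/lib/costs.ts interface is not shown in this file, and any prices passed in are supplied by the caller.

```typescript
// Hypothetical ledger entry; field names are illustrative.
interface CostEntry {
  step: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  usd: number;
}

// Assumed pricing shape: USD per million input and output tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Prices one model call and appends it to the ledger.
function recordStep(
  ledger: CostEntry[],
  step: string,
  model: string,
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice,
): CostEntry {
  const usd =
    (inputTokens / 1e6) * price.inputPerMTok +
    (outputTokens / 1e6) * price.outputPerMTok;
  const entry: CostEntry = { step, model, inputTokens, outputTokens, usd };
  ledger.push(entry);
  return entry;
}

// Sums USD per pipeline step.
function totalsByStep(ledger: CostEntry[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of ledger) {
    totals.set(e.step, (totals.get(e.step) ?? 0) + e.usd);
  }
  return totals;
}

// Sums USD across the whole ledger.
function totalUsd(ledger: CostEntry[]): number {
  return ledger.reduce((sum, e) => sum + e.usd, 0);
}
```

As a sanity check on the aggregation figures listed earlier in this file, the five approximate per-step costs (0.015 + 0.053 + 0.013 + 0.011 + 0.027) add up to about $0.119 per aggregated profile.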

Person Profile Page (`build-profile.ts`)

Documentation

PODGRAPH_PIPELINE_GUIDE.md — full usage guide for all scripts, prompts, data files

Immediate Next Steps

Re-extract with 4-pass pipeline

Re-extract all 12 Huberman episodes using 4-pass extraction (segmentation and entity extraction in parallel, theme synthesis from summaries, quote selection on targeted segments)
Run `npm run correct-quotes` after re-extraction to verify quote accuracy
Run `npm run validate` on each episode to check entity references

Re-aggregate and rebuild

Re-aggregate Huberman profile with 4-pass extractions: 12 episodes, 81 themes (31 cross-episode), 31 convictions, 130 tools
Review the updated profile page for quality

Update documentation

Update CASE_STUDY.md — add multi-pass extraction, quote correction, validation, model optimization
Update PODGRAPH_PIPELINE_GUIDE.md — multi-pass is now default, add new scripts (correct-quotes, validate, status, extract-multipass, extract-gemini)
Update TODO.md Phase 0.5 to reflect latest changes (multi-pass, quote correction, validation, cost tracking, Gemini A/B test)

Process more people

Profile a second person to test the pipeline beyond Huberman
Add more podcasts to the registry (Modern Wisdom, Rich Roll, All-In, etc.)

Phase 1: Foundation

Goal: Move from scripts to a real application with persistent storage.

Scaffolding (P0)

Next.js 14 app scaffolding — TypeScript, Tailwind, shadcn/ui
PostgreSQL setup — Supabase or Neon
Prisma schema — Podcast, Episode (internal, no public route), Person, PersonConnection, EntityRegistry
BullMQ + Redis setup for background job queue
tRPC setup for type-safe API routes

Pipeline Migration (P0)

Migrate scripts/transcribe.ts → src/lib/pipeline/transcription.ts
Migrate scripts/correct-transcript.ts → src/lib/ai/correction.ts
Migrate scripts/identify-speakers.ts → src/lib/pipeline/speaker-id.ts
Migrate scripts/extract.ts → src/lib/ai/extraction.ts
Migrate scripts/update-registry.ts → src/lib/pipeline/registry.ts
Create BullMQ workers: transcription.worker.ts, extraction.worker.ts, aggregation.worker.ts

Admin & Ingestion (P0)

Admin form to submit episode URL → triggers pipeline
Pipeline status monitoring in admin

Basic Frontend (P0/P1)

Basic person list page — name, roles, appearance count
Basic full-text search over Person names and themes (P1)

Phase 2: Person Page MVP

Goal: Build the person page as a React/Next.js page (currently a static HTML prototype).

Remaining Aggregation Work

Connection card generation — also-spoke-about, disagrees-on, recommended-by
Process more episodes per person to stress-test aggregation at scale

Person Profile Page — React (P0)

Port static HTML profile to Next.js React components
Header — name, aggregated self-described roles, appearance count, date range
Worldview summary section
Positions & beliefs — merged convictions + theme context
Inline contextual connection cards
Taste profile — clustered recommendations
People they mention — compact grid with expandable contexts
Podcast appearances — chronological with theme highlights

Phase 3: Discovery & Polish

Goal: Add browsing, category exploration, and frontend polish.

Category browsing pages — by occupation, interest, hobby, recommended books (P0)
Home / Explore page — featured people, trending topics, recent episodes (P0)
Podcast page — podcast info, all processed episodes, guest profile links (P1)
Responsive design — full mobile responsiveness across all pages (P1)
Search enhancements — search within quotes, filter by topic, date range, person (P1)

Phase 4: Scale & Automation

Goal: Automate ingestion and prepare for growth.

RSS feed auto-ingestion — scheduled jobs to check RSS feeds and process new episodes (P0)
Admin dashboard — manage podcasts, monitor pipeline status, review flagged IDs (P0)
Auth + user accounts — registration, saved favorites, custom collections (P1)
Performance optimization — caching, ISR for profile pages, lazy loading, pagination (P1)
