camptodata/research-agent


research-agent

A multi-agent research engine. Give it a natural-language research task, and an LLM coordinator decomposes it into a typed step plan with explicit dependencies. The engine orders the steps topologically and dispatches specialized workers (scout, analyst, deep-researcher, writer) in parallel layers. It enforces hard budget caps on LLM calls, SERP queries, and scraping credits, and runs an automatic gap-detection feedback loop that re-scouts when results are insufficient.

Why use it

  • Typed task plans validated with Zod-style schemas — the coordinator outputs structured JSON, not freeform text
  • Topological scheduling — steps with no dependencies run in parallel; later steps wait only on their direct inputs
  • Budget guards — hard caps on LLM calls, SERP queries, Firecrawl credits, and LinkedIn requests enforced at runtime
  • Gap-fill feedback loop — after scoring, a gap detector identifies missing coverage and triggers a targeted re-scout before final output
  • Deterministic candidate scoring with source-type priors and domain-authority tiers (no LLM needed for ranking)
  • Swappable LLM backend — use Claude via CLI subprocess (LLM_BACKEND=cli) or directly via Anthropic SDK (LLM_BACKEND=api)
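To make the "deterministic candidate scoring" point concrete, here is a minimal sketch of how a ranking can combine a source-type prior, a domain-authority tier, and a relevance signal without calling an LLM. Every name and number in it (Candidate, SOURCE_PRIOR, AUTHORITY_TIERS, the 0.4/0.3/0.3 weights) is a hypothetical illustration and is not taken from the repository's scoring block.

```typescript
// Hypothetical shapes and weights, for illustration only.
type SourceType = "paper" | "official_docs" | "news" | "forum";

interface Candidate {
  url: string;
  domain: string;      // hostname of the source, e.g. "cs.example.edu"
  sourceType: SourceType;
  keywordHits: number; // number of task keywords matched in the snippet
}

// Prior trust per source type (assumed values).
const SOURCE_PRIOR: Record<SourceType, number> = {
  paper: 0.9,
  official_docs: 0.8,
  news: 0.5,
  forum: 0.3,
};

// Domain-authority tiers matched by hostname suffix (assumed values).
const AUTHORITY_TIERS: Array<{ suffix: string; weight: number }> = [
  { suffix: ".edu", weight: 1.0 },
  { suffix: ".gov", weight: 1.0 },
  { suffix: "github.com", weight: 0.8 },
];
const DEFAULT_AUTHORITY = 0.5;

function authorityFor(domain: string): number {
  const tier = AUTHORITY_TIERS.find((t) => domain.endsWith(t.suffix));
  return tier ? tier.weight : DEFAULT_AUTHORITY;
}

// Weighted sum in [0, 1]; the same input always yields the same score.
function scoreCandidate(c: Candidate): number {
  const relevance = Math.min(c.keywordHits, 5) / 5;
  return (
    0.4 * SOURCE_PRIOR[c.sourceType] +
    0.3 * authorityFor(c.domain) +
    0.3 * relevance
  );
}

// Sort descending by score; ties are broken by URL so the order is stable.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (a, b) => scoreCandidate(b) - scoreCandidate(a) || a.url.localeCompare(b.url),
  );
}
```

Because the score is a pure function of the candidate, ranking costs nothing against the LLM budget and is reproducible between runs.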

Architecture

┌──────────────────────────────────────────────────────┐
│  Coordinator                                         │
│  Receives task string, calls LLM to decompose into   │
│  a typed TaskPlan, then dispatches workers in        │
│  topological layer order.                            │
└────────────────────┬─────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        v            v            v
   ┌─────────┐ ┌─────────┐ ┌───────────┐
   │  scout  │ │ analyst │ │  writer   │  Workers
   │         │ │         │ │           │  (compose blocks
   │deepRe-  │ │profile  │ │           │   + LLM reasoning)
   │searcher │ │Builder  │ │           │
   └────┬────┘ └────┬────┘ └─────┬─────┘
        │           │             │
        v           v             v
┌──────────────────────────────────────────────────────┐
│  Blocks (deterministic, no LLM)                      │
│  sources/  — SERP, Firecrawl, Specter, LinkedIn      │
│  processing/ — scoring, reconciler, gapDetector,     │
│               crossEntityAnalyst, htmlExtract        │
│  outputs/ — markdown, CSV, Notion (stub)             │
└──────────────────────────────────────────────────────┘
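The "topological layer order" in the Coordinator box can be sketched as follows. This is an illustration under assumptions, not the repository's scheduler: the Step shape, the dependsOn field, and the toLayers and runPlan names are hypothetical.

```typescript
// Hypothetical step shape; the real TaskPlan type may differ.
interface Step {
  id: string;
  worker: "scout" | "analyst" | "deepResearcher" | "writer";
  dependsOn: string[]; // ids of steps whose output this step needs
}

// Group steps into layers: every step in a layer depends only on steps
// from earlier layers, so each layer can run in parallel.
function toLayers(steps: Step[]): Step[][] {
  const ids = new Set<string>();
  for (const s of steps) {
    if (ids.has(s.id)) throw new Error(`duplicate step id: ${s.id}`);
    ids.add(s.id);
  }
  for (const s of steps) {
    for (const dep of s.dependsOn) {
      if (!ids.has(dep)) throw new Error(`step ${s.id} depends on unknown step ${dep}`);
    }
  }

  const done = new Set<string>();
  const layers: Step[][] = [];
  let remaining = steps;
  while (remaining.length > 0) {
    const ready = remaining.filter((s) => s.dependsOn.every((d) => done.has(d)));
    // If nothing is ready but steps remain, the dependencies form a cycle.
    if (ready.length === 0) throw new Error("cycle detected in plan");
    for (const s of ready) done.add(s.id);
    layers.push(ready);
    remaining = remaining.filter((s) => !done.has(s.id));
  }
  return layers;
}

// Run layers one after another; steps inside a layer run concurrently.
async function runPlan(
  steps: Step[],
  runStep: (step: Step) => Promise<void>,
): Promise<void> {
  for (const layer of toLayers(steps)) {
    await Promise.all(layer.map(runStep));
  }
}
```

Layering is the simplest form of this scheduling: a step waits for its whole preceding layer, which is slightly more conservative than starting each step the moment its own inputs finish.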

Quickstart

npm install
cp .env.example .env
# Fill in API keys (see .env.example)
npx ts-node scripts/run.ts

You will be prompted to enter a research task at the terminal. The coordinator will ask clarifying questions if needed (max 3 rounds), then execute the plan and write results to results/<task-slug>/.

Example tasks

Research the top 10 open-source vector databases comparing licensing,
performance benchmarks, and ecosystem maturity

Find recent academic literature on multi-agent reinforcement learning
with budget constraints — summarize key papers and authors

Profile 5 active projects working on local-first software, including
founding team, funding status, and community traction

Environment variables

# Required
ANTHROPIC_API_KEY=          # Anthropic API key (required only when LLM_BACKEND=api)
SERPAPI_KEY=                # SerpAPI key for web search

# Optional — enable additional source adapters
FIRECRAWL_API_KEY=          # Firecrawl for JS-rendered page scraping
SPECTER_API_KEY=            # Specter for structured company/person data
LINKEDIN_LI_AT=             # LinkedIn Voyager session cookie
LINKEDIN_JSESSIONID=        # LinkedIn Voyager session cookie
NOTION_API_KEY=             # Notion integration (output stub — see outputs/notion.ts)

# Backend selection
LLM_BACKEND=cli             # "cli" (claude --print subprocess) or "api" (Anthropic SDK)
OUTPUT_BACKEND=markdown     # "markdown" | "csv" | "notion"
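The rules implied by the variables above (SERPAPI_KEY is always needed, ANTHROPIC_API_KEY only for the api backend, and two enumerated backend switches) could be checked at startup roughly like this. The loadConfig function and RuntimeConfig type are hypothetical names, and treating cli as the default LLM backend is an assumption based on the example value shown; markdown is documented as the default output.

```typescript
const LLM_BACKENDS = ["cli", "api"] as const;
const OUTPUT_BACKENDS = ["markdown", "csv", "notion"] as const;
type LlmBackend = (typeof LLM_BACKENDS)[number];
type OutputBackend = (typeof OUTPUT_BACKENDS)[number];

// Hypothetical config shape, for illustration only.
interface RuntimeConfig {
  llmBackend: LlmBackend;
  outputBackend: OutputBackend;
}

function isOneOf<T extends string>(allowed: readonly T[], value: string): value is T {
  return (allowed as readonly string[]).indexOf(value) !== -1;
}

// Treat unset and empty values the same, since ".env" files often contain "KEY=".
function read(env: Record<string, string | undefined>, key: string): string | undefined {
  const value = (env[key] ?? "").trim();
  return value === "" ? undefined : value;
}

// Pass process.env in a Node program; a plain object works in tests.
function loadConfig(env: Record<string, string | undefined>): RuntimeConfig {
  const problems: string[] = [];

  const llmBackend = read(env, "LLM_BACKEND") ?? "cli"; // assumed default
  const outputBackend = read(env, "OUTPUT_BACKEND") ?? "markdown";

  if (!isOneOf(LLM_BACKENDS, llmBackend)) {
    problems.push(`LLM_BACKEND must be "cli" or "api", got "${llmBackend}"`);
  }
  if (!isOneOf(OUTPUT_BACKENDS, outputBackend)) {
    problems.push(`OUTPUT_BACKEND must be "markdown", "csv" or "notion", got "${outputBackend}"`);
  }
  if (!read(env, "SERPAPI_KEY")) {
    problems.push("SERPAPI_KEY is required");
  }
  if (llmBackend === "api" && !read(env, "ANTHROPIC_API_KEY")) {
    problems.push("ANTHROPIC_API_KEY is required when LLM_BACKEND=api");
  }

  if (
    problems.length > 0 ||
    !isOneOf(LLM_BACKENDS, llmBackend) ||
    !isOneOf(OUTPUT_BACKENDS, outputBackend)
  ) {
    throw new Error("invalid configuration:\n" + problems.join("\n"));
  }
  return { llmBackend, outputBackend };
}
```

Collecting every problem before throwing means a misconfigured .env is reported in one pass rather than one variable at a time.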

Configuration

Budget caps are set in src/config.ts:

export const BUDGET_DEFAULTS = {
  llmCalls: 20,
  serpQueries: 50,
  firecrawlCredits: 40,
  linkedinRequests: 10,
  gapLoops: 2,
};
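One way such caps can be enforced at runtime is a small counter that every worker must go through before spending a resource. The sketch below reuses the cap values shown above; the BudgetGuard class and its consume and remaining methods are hypothetical and may not match how the repository implements its guards.

```typescript
// Cap values copied from the defaults shown above.
const BUDGET_DEFAULTS = {
  llmCalls: 20,
  serpQueries: 50,
  firecrawlCredits: 40,
  linkedinRequests: 10,
  gapLoops: 2,
};

type BudgetKey = keyof typeof BUDGET_DEFAULTS;

// Hypothetical guard: tracks usage per resource and refuses to exceed the cap.
class BudgetGuard {
  private readonly caps: Record<BudgetKey, number>;
  private readonly used: Record<BudgetKey, number> = {
    llmCalls: 0,
    serpQueries: 0,
    firecrawlCredits: 0,
    linkedinRequests: 0,
    gapLoops: 0,
  };

  constructor(caps: Record<BudgetKey, number> = BUDGET_DEFAULTS) {
    this.caps = { ...caps };
  }

  remaining(key: BudgetKey): number {
    return this.caps[key] - this.used[key];
  }

  // Call before the expensive operation; throws instead of overspending.
  consume(key: BudgetKey, amount: number = 1): void {
    if (amount > this.remaining(key)) {
      throw new Error(`budget exceeded for ${key} (cap ${this.caps[key]})`);
    }
    this.used[key] += amount;
  }
}

// Usage: charge the budget first, then perform the call.
const guard = new BudgetGuard();
guard.consume("serpQueries");      // one SERP query
guard.consume("firecrawlCredits", 3); // a scrape that costs three credits
```

The same counter can bound the gap-fill loop: charging gapLoops once per re-scout round stops the feedback loop after two rounds with the defaults above.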

Output adapters live in src/blocks/outputs/:

  • markdown.ts — default; writes report.md, data.json, sources.md
  • csv.ts — flat CSV of enriched profiles
  • notion.ts — stub; falls back to markdown until NOTION_API_KEY is set and the adapter is implemented

Output

Results are written to results/<task-slug>/:

  • report.md — main narrative report (LLM-synthesized)
  • data.json — raw structured data for all enriched profiles
  • sources.md — full source attribution with URLs

Status

Early stage: extracted and generalized from a production system. The core pipeline (coordinator, workers, blocks) is functional. The Specter and LinkedIn adapters require paid API access. The Notion output adapter is a stub.

License

MIT
