research-agent

A multi-agent research engine. Give it a natural-language research task and an LLM coordinator decomposes it into a typed step plan with explicit dependencies. The engine then dispatches specialized workers (scout, analyst, deep-researcher, writer) in parallel, one topological layer at a time. It enforces hard budget caps on LLM calls, SERP queries, and scraping credits, and it runs an automatic gap-detection feedback loop that re-scouts when results are insufficient.

Why use it

Typed task plans validated with Zod-style schemas — the coordinator outputs structured JSON, not freeform text
Topological scheduling — steps with no dependencies run in parallel; later steps wait only on their direct inputs
Budget guards — hard caps on LLM calls, SERP queries, Firecrawl credits, and LinkedIn requests enforced at runtime
Gap-fill feedback loop — after scoring, a gap detector identifies missing coverage and triggers a targeted re-scout before final output
Deterministic candidate scoring with source-type priors and domain-authority tiers (no LLM needed for ranking)
Swappable LLM backend — use Claude via CLI subprocess (LLM_BACKEND=cli) or directly via Anthropic SDK (LLM_BACKEND=api)
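
The typed plan and topological scheduling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the `PlanStep` shape, the `dependsOn` field, and the `toLayers` helper are hypothetical names, and a hand-written dependency check stands in for the Zod-style schema validation.

```typescript
// Hypothetical step shape; the real TaskPlan schema may differ.
type WorkerKind = "scout" | "analyst" | "deepResearcher" | "writer";

interface PlanStep {
  id: string;
  worker: WorkerKind;
  dependsOn: string[]; // ids of steps whose output this step consumes
}

// Group steps into layers. Every step in a layer has all of its
// dependencies satisfied by earlier layers, so a layer can run in parallel.
function toLayers(steps: PlanStep[]): PlanStep[][] {
  const ids = new Set(steps.map((s) => s.id));
  for (const step of steps) {
    for (const dep of step.dependsOn) {
      if (!ids.has(dep)) {
        throw new Error(`Step "${step.id}" depends on unknown step "${dep}"`);
      }
    }
  }

  const done = new Set<string>();
  const layers: PlanStep[][] = [];
  let remaining = steps.slice();

  while (remaining.length > 0) {
    const ready = remaining.filter((s) => s.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can make progress, so the remaining steps form a cycle.
      throw new Error("Plan contains a dependency cycle");
    }
    for (const s of ready) done.add(s.id);
    layers.push(ready);
    remaining = remaining.filter((s) => !done.has(s.id));
  }
  return layers;
}

const examplePlan: PlanStep[] = [
  { id: "scout-web", worker: "scout", dependsOn: [] },
  { id: "scout-papers", worker: "scout", dependsOn: [] },
  { id: "analyze", worker: "analyst", dependsOn: ["scout-web", "scout-papers"] },
  { id: "write", worker: "writer", dependsOn: ["analyze"] },
];

const layerIds = toLayers(examplePlan).map((layer) => layer.map((s) => s.id));
console.log(JSON.stringify(layerIds));
// prints [["scout-web","scout-papers"],["analyze"],["write"]]
```

Strict layering is slightly more conservative than letting each step start the moment its own inputs finish. A scheduler that awaits per-step promises would match the "direct inputs only" behaviour more closely, at the cost of a little more bookkeeping.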

Architecture

┌──────────────────────────────────────────────────────┐
│  Coordinator                                         │
│  Receives task string, calls LLM to decompose into   │
│  a typed TaskPlan, then dispatches workers in        │
│  topological layer order.                            │
└────────────────────┬─────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        v            v            v
   ┌─────────┐ ┌─────────┐ ┌───────────┐
   │  scout  │ │ analyst │ │  writer   │  Workers
   │         │ │         │ │           │  (compose blocks
   │deepRe-  │ │profile  │ │           │   + LLM reasoning)
   │searcher │ │Builder  │ │           │
   └────┬────┘ └────┬────┘ └─────┬─────┘
        │           │             │
        v           v             v
┌──────────────────────────────────────────────────────┐
│  Blocks (deterministic, no LLM)                      │
│  sources/  — SERP, Firecrawl, Specter, LinkedIn      │
│  processing/ — scoring, reconciler, gapDetector,     │
│               crossEntityAnalyst, htmlExtract        │
│  outputs/ — markdown, CSV, Notion (stub)             │
└──────────────────────────────────────────────────────┘
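
To make the "deterministic, no LLM" property of the processing blocks concrete, here is a rough sketch of how a scoring block could combine source-type priors with domain-authority tiers. Every name, weight, and tier below is an illustrative assumption and is not taken from the project.

```typescript
// All names, weights, and tiers here are hypothetical.
type SourceType = "paper" | "official" | "news" | "forum";

interface Candidate {
  host: string; // hostname of the source, e.g. "arxiv.org"
  sourceType: SourceType;
  keywordHits: number; // matches against the task's key terms
}

// Prior trust per source type.
const SOURCE_PRIOR: Record<SourceType, number> = {
  paper: 0.9,
  official: 0.8,
  news: 0.6,
  forum: 0.4,
};

// Domain-authority tiers keyed by hostname suffix; first match wins.
const AUTHORITY_TIERS: Array<{ suffix: string; weight: number }> = [
  { suffix: ".edu", weight: 1.0 },
  { suffix: ".gov", weight: 1.0 },
  { suffix: ".org", weight: 0.8 },
];
const DEFAULT_AUTHORITY = 0.5;

function scoreCandidate(c: Candidate): number {
  const tier = AUTHORITY_TIERS.find(
    (t) => c.host.slice(-t.suffix.length) === t.suffix,
  );
  const authority = tier ? tier.weight : DEFAULT_AUTHORITY;
  const relevance = Math.min(c.keywordHits, 5) / 5; // capped at 1.0
  return 0.5 * SOURCE_PRIOR[c.sourceType] + 0.3 * authority + 0.2 * relevance;
}

// Sort by score, highest first; ties break on hostname so the order is stable.
function rank(candidates: Candidate[]): Candidate[] {
  return candidates
    .map((c) => ({ c, score: scoreCandidate(c) }))
    .sort((a, b) => b.score - a.score || a.c.host.localeCompare(b.c.host))
    .map((x) => x.c);
}

const ranked = rank([
  { host: "example.com", sourceType: "forum", keywordHits: 5 },
  { host: "mit.edu", sourceType: "official", keywordHits: 1 },
  { host: "arxiv.org", sourceType: "paper", keywordHits: 3 },
]);
console.log(ranked.map((c) => c.host).join(", "));
// prints arxiv.org, mit.edu, example.com
```

Because the score is a pure function of the candidate, the same inputs always rank the same way and ranking consumes none of the LLM call budget.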

Quickstart

npm install
cp .env.example .env
# Fill in API keys (see .env.example)
npx ts-node scripts/run.ts

The script prompts you for a research task at the terminal. The coordinator asks clarifying questions if needed (at most 3 rounds), then executes the plan and writes the results to results/<task-slug>/.

Example tasks

Research the top 10 open-source vector databases comparing licensing,
performance benchmarks, and ecosystem maturity

Find recent academic literature on multi-agent reinforcement learning
with budget constraints — summarize key papers and authors

Profile 5 active projects working on local-first software, including
founding team, funding status, and community traction

Environment variables

# Required
ANTHROPIC_API_KEY=          # Anthropic API key (required only when LLM_BACKEND=api)
SERPAPI_KEY=                # SerpAPI key for web search

# Optional — enable additional source adapters
FIRECRAWL_API_KEY=          # Firecrawl for JS-rendered page scraping
SPECTER_API_KEY=            # Specter for structured company/person data
LINKEDIN_LI_AT=             # LinkedIn Voyager session cookie
LINKEDIN_JSESSIONID=        # LinkedIn Voyager session cookie
NOTION_API_KEY=             # Notion integration (output stub — see outputs/notion.ts)

# Backend selection
LLM_BACKEND=cli             # "cli" (claude --print subprocess) or "api" (Anthropic SDK)
OUTPUT_BACKEND=markdown     # "markdown" | "csv" | "notion"
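
The variables above could be validated at startup along these lines. This is a hedged sketch: `loadConfig` and `RuntimeConfig` are hypothetical names, and treating `cli` and `markdown` as defaults when the variables are unset is an assumption based on the sample values shown.

```typescript
type LlmBackend = "cli" | "api";
type OutputBackend = "markdown" | "csv" | "notion";

// Hypothetical config shape; not the project's actual type.
interface RuntimeConfig {
  llmBackend: LlmBackend;
  outputBackend: OutputBackend;
  serpApiKey: string;
  anthropicApiKey: string | undefined;
}

type Env = Record<string, string | undefined>;

function isLlmBackend(v: string): v is LlmBackend {
  return v === "cli" || v === "api";
}

function isOutputBackend(v: string): v is OutputBackend {
  return v === "markdown" || v === "csv" || v === "notion";
}

// Read and validate settings from an env-like object.
function loadConfig(env: Env): RuntimeConfig {
  // Assumed defaults: "cli" and "markdown" when the variable is unset or empty.
  const llmBackend = env["LLM_BACKEND"] || "cli";
  const outputBackend = env["OUTPUT_BACKEND"] || "markdown";

  if (!isLlmBackend(llmBackend)) {
    throw new Error(`LLM_BACKEND must be "cli" or "api", got "${llmBackend}"`);
  }
  if (!isOutputBackend(outputBackend)) {
    throw new Error(`Unknown OUTPUT_BACKEND "${outputBackend}"`);
  }

  const serpApiKey = env["SERPAPI_KEY"];
  if (!serpApiKey) {
    throw new Error("SERPAPI_KEY is required");
  }

  // The Anthropic key only matters when calling the SDK directly.
  const anthropicApiKey = env["ANTHROPIC_API_KEY"] || undefined;
  if (llmBackend === "api" && !anthropicApiKey) {
    throw new Error("ANTHROPIC_API_KEY is required when LLM_BACKEND=api");
  }

  return { llmBackend, outputBackend, serpApiKey, anthropicApiKey };
}

const exampleConfig = loadConfig({ SERPAPI_KEY: "example-key" });
console.log(exampleConfig.llmBackend, exampleConfig.outputBackend);
// prints cli markdown
```

In a Node entry point this would typically be called as `loadConfig(process.env)`. Passing the environment in as a parameter keeps the function easy to test.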

Configuration

Budget caps are set in src/config.ts:

export const BUDGET_DEFAULTS = {
  llmCalls: 20,
  serpQueries: 50,
  firecrawlCredits: 40,
  linkedinRequests: 10,
  gapLoops: 2,
};
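
As an illustration of how these caps might be enforced at runtime, the sketch below pairs a small budget guard with a gap-fill loop bounded by `gapLoops`. The `BudgetGuard` class and `runGapLoop` function are hypothetical names and not the project's implementation. Only the default values are copied from the config above.

```typescript
// Values copied from BUDGET_DEFAULTS; everything else is a hypothetical sketch.
const BUDGET_DEFAULTS = {
  llmCalls: 20,
  serpQueries: 50,
  firecrawlCredits: 40,
  linkedinRequests: 10,
  gapLoops: 2,
};

type BudgetKey = keyof typeof BUDGET_DEFAULTS;

class BudgetGuard {
  private limits: Record<BudgetKey, number>;
  private used: Record<BudgetKey, number> = {
    llmCalls: 0,
    serpQueries: 0,
    firecrawlCredits: 0,
    linkedinRequests: 0,
    gapLoops: 0,
  };

  constructor(limits: Record<BudgetKey, number> = BUDGET_DEFAULTS) {
    this.limits = limits;
  }

  // Call before the external request; throws instead of overspending.
  spend(key: BudgetKey, amount = 1): void {
    if (this.used[key] + amount > this.limits[key]) {
      throw new Error(`Budget exceeded for ${key}: limit is ${this.limits[key]}`);
    }
    this.used[key] += amount;
  }

  remaining(key: BudgetKey): number {
    return this.limits[key] - this.used[key];
  }
}

// Re-scout while the detector still reports gaps and the gapLoops
// budget allows. Returns the number of re-scout rounds performed.
function runGapLoop(
  guard: BudgetGuard,
  detectGaps: () => string[],
  rescout: (gaps: string[]) => void,
): number {
  let rounds = 0;
  let gaps = detectGaps();
  while (gaps.length > 0 && guard.remaining("gapLoops") > 0) {
    guard.spend("gapLoops");
    rescout(gaps);
    rounds += 1;
    gaps = detectGaps();
  }
  return rounds;
}

// A detector that never stops reporting gaps is still cut off by the cap.
const demoGuard = new BudgetGuard();
const demoRounds = runGapLoop(demoGuard, () => ["pricing data"], () => {});
console.log(demoRounds);
// prints 2
```

Checking the budget before the call, rather than after, is what makes the cap hard: a request that would exceed the limit is never sent.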

Output adapters live in src/blocks/outputs/:

markdown.ts — default; writes report.md, data.json, sources.md
csv.ts — flat CSV of enriched profiles
notion.ts — stub; falls back to markdown until NOTION_API_KEY is set and the adapter is implemented
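
The adapter selection and the Notion fallback could look roughly like this. The `OutputAdapter` interface, the `implemented` flag, `selectAdapter`, and the `profiles.csv` file name are all assumptions made for the sketch. The markdown renderer is a placeholder that only concatenates fields, whereas the project's report.md is LLM-synthesized.

```typescript
// Hypothetical adapter contract; the real modules in src/blocks/outputs/ may differ.
interface Profile {
  name: string;
  url: string;
  summary: string;
}

// Each adapter returns file name -> file content; writing to disk is left to the caller.
type RenderedFiles = Record<string, string>;

interface OutputAdapter {
  implemented: boolean;
  render(profiles: Profile[]): RenderedFiles;
}

// Quote a CSV field when it contains a comma, a double quote, or a line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

const markdownAdapter: OutputAdapter = {
  implemented: true,
  render: (profiles) => ({
    "report.md": profiles.map((p) => `## ${p.name}\n\n${p.summary}\n`).join("\n"),
    "data.json": JSON.stringify(profiles, null, 2),
    "sources.md": profiles.map((p) => `- ${p.url}`).join("\n"),
  }),
};

const csvAdapter: OutputAdapter = {
  implemented: true,
  render: (profiles) => ({
    "profiles.csv": [
      "name,url,summary",
      ...profiles.map((p) => [p.name, p.url, p.summary].map(csvField).join(",")),
    ].join("\n"),
  }),
};

// Stub: registered, but flagged as not implemented.
const notionAdapter: OutputAdapter = {
  implemented: false,
  render: () => {
    throw new Error("Notion adapter is not implemented");
  },
};

const ADAPTERS: Record<string, OutputAdapter> = {
  markdown: markdownAdapter,
  csv: csvAdapter,
  notion: notionAdapter,
};

// Unknown backends are an error; unimplemented ones fall back to markdown.
function selectAdapter(backend: string): OutputAdapter {
  const adapter = ADAPTERS[backend];
  if (!adapter) {
    throw new Error(`Unknown output backend "${backend}"`);
  }
  return adapter.implemented ? adapter : markdownAdapter;
}

const files = selectAdapter("notion").render([
  { name: "Acme", url: "https://example.com", summary: "Example profile." },
]);
console.log(Object.keys(files).join(", "));
// prints report.md, data.json, sources.md
```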

Output

Results are written to results/<task-slug>/:

report.md — main narrative report (LLM-synthesized)
data.json — raw structured data for all enriched profiles
sources.md — full source attribution with URLs

Status

Early stage. Extracted and generalized from a production system. The core pipeline (coordinator, workers, blocks) is functional. The Specter and LinkedIn adapters require paid API access. The Notion output adapter is a stub.

License

MIT
