
AutoOptimize

AI agent teams race to approximate a mystery function, each using a different embedding model to search 1,468 ArXiv papers for numerical methods techniques.

Built with Mastra, the Vercel AI SDK, and Next.js.


What It Does

Three teams of AI agents compete to approximate an unknown function f(x) on [-5, 5]. Each team gets 64 sample points and must build the best approximation — scored on accuracy x speed. The teams search the same corpus of 1,468 numerical methods papers, but each uses a different embedding model for retrieval:

Team         Embedding Model                     Max Iterations   Agents
ZeroEntropy  zembed-1 (2560-dim)                 64               3
OpenAI       text-embedding-3-small (1536-dim)   64               3
Cohere       embed-english-v3.0 (1024-dim)       64               3

Same corpus, same LLM, same iteration budget, same scoring — the only variable is which embedding model powers the search.

The dashboard shows the race live with score charts and function approximation plots updating in real time.

The Mystery Function

f(x) = sin(x^2 * 2 + 3x) * exp(-0.08x^2)    // chirp — frequency sweeps with x
     + 2/(1 + 100*(x-1.7)^2)                // sharp spike — Runge-like pole
     + 0.4*|sin(3x)|                        // periodic kinks — non-differentiable
     + 1.5*exp(-8*(x+3)^2)*cos(25x)         // localized burst — high-freq near x=-3
     + 0.6*tanh(15*(x-3.5))                 // steep step — sigmoid transition

This function is designed to be hard: it combines features that break naive polynomial interpolation. Agents need to discover techniques like Chebyshev node placement, barycentric interpolation, and piecewise/rational methods from the retrieved papers.
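The formula transcribes term-by-term into code. A minimal TypeScript sketch for reference (the repo's actual implementation lives in src/lib/eval.ts and may differ; the function name here is mine):

```typescript
// The mystery function f(x), transcribed term-by-term from the formula above.
function mysteryF(x: number): number {
  const chirp = Math.sin(x * x * 2 + 3 * x) * Math.exp(-0.08 * x * x);
  const spike = 2 / (1 + 100 * (x - 1.7) ** 2);
  const kinks = 0.4 * Math.abs(Math.sin(3 * x));
  const burst = 1.5 * Math.exp(-8 * (x + 3) ** 2) * Math.cos(25 * x);
  const step = 0.6 * Math.tanh(15 * (x - 3.5));
  return chirp + spike + kinks + burst + step;
}
```

Evaluating it on a 64-point grid makes the difficulty visible: the spike near x = 1.7 and the burst near x = -3 each fall between only a handful of sample points.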

Scoring

digits = -log10(mean_absolute_error)    // averaged over 10,001 test points, capped [0, 15]
speed  = ops_per_second / baseline      // relative to naive 64-point Lagrange interpolation
score  = digits * sqrt(speed)
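In code, the three formulas above compose into one scoring function (a direct transcription; parameter names are mine):

```typescript
// Scoring from the formulas above: capped "correct digits" of accuracy,
// scaled by the square root of throughput relative to the baseline.
function score(
  meanAbsoluteError: number,
  opsPerSecond: number,
  baselineOpsPerSecond: number
): number {
  const digits = Math.min(15, Math.max(0, -Math.log10(meanAbsoluteError)));
  const speed = opsPerSecond / baselineOpsPerSecond;
  return digits * Math.sqrt(speed);
}
```

The square root dampens the speed term: quadrupling throughput only doubles the score, so accuracy dominates, as the FAQ notes below.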

How It Works

Data Pipeline (run ahead of time)

  1. download_arxiv.py — Fetches metadata for 1,468 papers on numerical methods from the ArXiv API
  2. download_full_papers.py — Downloads full LaTeX source for each paper (1,345 succeeded)
  3. src/scripts/build-chunks.ts — Chunks papers at newline boundaries into ~1500-char segments (69,602 chunks)
  4. src/scripts/pre-embed.ts / embed_ze_modal.py — Embeds all chunks with all 3 providers, saves to binary files on disk
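Step 3's newline-boundary chunking amounts to greedy packing of whole lines up to the target size. A sketch of that idea (the real build-chunks.ts may handle LaTeX structure differently):

```typescript
// Greedily pack whole lines into chunks of at most `target` characters.
// A single line longer than `target` becomes its own oversized chunk,
// since splits only happen at newline boundaries.
function chunkText(text: string, target = 1500): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const line of text.split("\n")) {
    if (current.length > 0 && current.length + line.length + 1 > target) {
      chunks.push(current);
      current = line;
    } else {
      current = current.length > 0 ? current + "\n" + line : line;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Joining the chunks back with newlines reconstructs the original text, so no content is lost at chunk boundaries.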

Runtime (during the demo)

  1. Server loads 69,602 pre-computed embeddings per provider into memory (~1.4 GB total)
  2. User clicks "Start Race"
  3. For each of 3 providers, 3 Mastra agents launch in parallel (9 agents total)
  4. Each agent runs a loop: generate() -> LLM calls tools (maxSteps=5) -> searchPapers (embed the query, compute cosine similarity, return top-5 chunks) -> evalCode (run the candidate function, measure accuracy + speed) -> repeat
  5. Agents within a team share a notebook: best code so far, findings from searches, scores
  6. Dashboard polls /api/status every 1s and /api/plot every 3s
  7. Score chart shows best-so-far (solid lines) and raw iteration scores (dotted lines) per team
  8. Function plots show ground truth vs. each team's best approximation
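The retrieval step inside searchPapers (step 4) is a brute-force cosine-similarity scan over the in-memory embedding matrix. A sketch, assuming row-major Float32Array storage as described above (function and field names are mine):

```typescript
// Brute-force cosine-similarity top-k: `matrix` holds
// matrix.length / dims vectors of `dims` floats each, row-major.
function topK(
  query: Float32Array,
  matrix: Float32Array,
  dims: number,
  k: number
): { index: number; score: number }[] {
  let qNorm = 0;
  for (let j = 0; j < dims; j++) qNorm += query[j] * query[j];
  qNorm = Math.sqrt(qNorm);

  const count = matrix.length / dims;
  const scored: { index: number; score: number }[] = [];
  for (let i = 0; i < count; i++) {
    let dot = 0;
    let norm = 0;
    for (let j = 0; j < dims; j++) {
      const v = matrix[i * dims + j];
      dot += query[j] * v;
      norm += v * v;
    }
    scored.push({ index: i, score: dot / (qNorm * Math.sqrt(norm)) });
  }
  // A full sort is fine at this scale (~70k rows); a bounded heap
  // would avoid the O(n log n) if the corpus grew much larger.
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}
```

Because every chunk is scored on each query, no approximate-nearest-neighbor index is needed; the whole scan over 69,602 vectors stays fast enough for an interactive demo.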

Project Structure

auto-optimize/
├── corpus/
│   ├── arxiv_papers.jsonl           # 1,468 paper metadata (title, abstract, authors)
│   ├── papers/                      # 1,345 full LaTeX source files
│   ├── chunks_medium.json           # 69,602 text chunks
│   └── embeddings/
│       ├── zeroentropy_embeddings.bin
│       ├── openai_embeddings.bin
│       └── cohere_embeddings.bin
│
├── src/
│   ├── app/                         # Next.js app
│   │   ├── page.tsx                 # Server wrapper
│   │   ├── dashboard.tsx            # Main dashboard UI (client component)
│   │   ├── layout.tsx               # HTML shell, fonts, theme
│   │   └── api/
│   │       ├── race/route.ts        # POST = start race, DELETE = reset
│   │       ├── status/route.ts      # GET = poll race state
│   │       ├── eval/route.ts        # POST = evaluate code standalone
│   │       └── plot/route.ts        # GET = function plot data
│   │
│   ├── lib/                         # Core logic
│   │   ├── eval.ts                  # Evaluation harness — mystery function, scoring
│   │   ├── corpus.ts                # Loads chunks from chunks_medium.json
│   │   ├── vectorstore.ts           # Embedding + cosine similarity search
│   │   ├── race-state.ts            # In-memory race state management
│   │   └── race-runner.ts           # Orchestrator: 3 teams x 3 agents
│   │
│   ├── mastra/                      # Mastra agent framework
│   │   ├── index.ts                 # Mastra instance with 3 agents
│   │   ├── agents/
│   │   │   └── optimizer.ts         # Agent definition + system prompt
│   │   └── tools/
│   │       ├── search-papers.ts     # Tool: semantic search over paper corpus
│   │       └── eval-code.ts         # Tool: evaluate an approximation function
│   │
│   └── scripts/
│       ├── build-chunks.ts          # Chunk papers into segments
│       ├── pre-embed.ts             # Embed chunks with all 3 providers
│       └── test-eval.ts             # Quick test of the evaluation harness
│
├── download_arxiv.py                # Fetch paper metadata from ArXiv
├── download_full_papers.py          # Download full LaTeX/PDF source
├── embed_ze_modal.py                # ZeroEntropy embedding via Modal
├── server.ts                        # Standalone Express server (alternative to Next.js)
├── next.config.js
├── tsconfig.json
└── package.json

Running It

Prerequisites

  • Node.js 20+
  • API keys in .env:
    • GOOGLE_GENERATIVE_AI_API_KEY — Gemini (used by all agents)
    • ZEROENTROPY_API_KEY
    • OPENAI_API_KEY
    • COHERE_API_KEY

Quick Start

# Download corpus data and pre-computed embeddings (~1.5 GB)
./download.sh

npm install
npm run build
NODE_OPTIONS="--max-old-space-size=8192" npx next start -p 3000 -H 0.0.0.0

Then open http://localhost:3000 and click "Start Race".

Rebuilding Embeddings

If you need to re-embed (e.g., after changing chunks):

# ZeroEntropy (via Modal for parallelism)
python3 embed_ze_modal.py

# OpenAI + Cohere (via Vercel AI SDK)
npx tsx src/scripts/pre-embed.ts

The embedding scripts are idempotent — they resume from where they left off.
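One simple way to get that resume behavior (an assumed pattern; the actual scripts may track progress differently): since each embedding is a fixed-width row of 4-byte float32 values, the byte length of the existing .bin file tells you how many chunks are already done.

```typescript
// Assumed resume pattern: each embedding is dims * 4 bytes (float32),
// so the file size determines the next chunk index to embed.
// Flooring means a torn final row from an interrupted write is redone.
function resumeIndex(fileBytes: number, dims: number): number {
  const rowBytes = dims * 4; // 4 bytes per float32 dimension
  return Math.floor(fileBytes / rowBytes);
}
```

Before appending, the writer would truncate the file back to resumeIndex * rowBytes so any partial row is overwritten cleanly.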


Tech Stack

  • Mastra — TypeScript AI agent framework (agent definitions, tool calling)
  • Vercel AI SDK — Model routing (Gemini via @ai-sdk/google), embedding provider pattern
  • Next.js 16 — Server + frontend (API routes + React dashboard)
  • Gemini 3 Flash Preview — LLM for all agents (same model across all teams)

FAQ

What LLM are the agents using? Gemini 3 Flash Preview, same for all teams. The LLM is not the variable — retrieval is.

Why 64 sample points? Enough for a good approximation with the right technique, but not enough to brute-force it. Agents need to discover optimal node placement and stable evaluation methods from the papers.

How is scoring done? score = digits * sqrt(speed), where digits = -log10(mean_absolute_error) capped at [0, 15], and speed = ops/sec relative to a naive 64-point Lagrange baseline. Both accuracy and efficiency matter, but accuracy dominates.

License

Apache-2.0
