AI agent teams race to approximate a mystery function, each using a different embedding model to search 1,468 ArXiv papers for numerical methods techniques.
Built with Mastra, the Vercel AI SDK, and Next.js.
Three teams of AI agents compete to approximate an unknown function f(x) on [-5, 5]. Each team gets 64 sample points and must build the best approximation — scored on accuracy x speed. The teams search the same corpus of 1,468 numerical methods papers, but each uses a different embedding model for retrieval:
| Team | Embedding Model | Max Iterations | Agents |
|---|---|---|---|
| ZeroEntropy | zembed-1 (2560-dim) | 64 | 3 |
| OpenAI | text-embedding-3-small (1536-dim) | 64 | 3 |
| Cohere | embed-english-v3.0 (1024-dim) | 64 | 3 |
Same corpus, same LLM, same iteration budget, same scoring — the only variable is which embedding model powers the search.
The dashboard shows the race live with score charts and function approximation plots updating in real time.
```
f(x) = sin(2x^2 + 3x) * exp(-0.08x^2)   // chirp — frequency sweeps with x
     + 2/(1 + 100*(x-1.7)^2)            // sharp spike — Runge-like pole
     + 0.4*|sin(3x)|                    // periodic kinks — non-differentiable
     + 1.5*exp(-8*(x+3)^2)*cos(25x)     // localized burst — high-freq near x=-3
     + 0.6*tanh(15*(x-3.5))             // steep step — sigmoid transition
```
This function is designed to be hard: it combines features that break naive polynomial interpolation. Agents need to discover techniques like Chebyshev node placement, barycentric interpolation, and piecewise/rational methods from the retrieved papers.
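For illustration, here is a minimal sketch of one such technique: barycentric interpolation at Chebyshev nodes of the second kind. The function names and structure are illustrative only, not the repo's actual code.

```typescript
// Chebyshev points of the second kind mapped from [-1, 1] to [a, b].
function chebyshevNodes(n: number, a: number, b: number): number[] {
  const xs: number[] = [];
  for (let j = 0; j < n; j++) {
    const t = Math.cos((Math.PI * j) / (n - 1)); // node in [-1, 1]
    xs.push(a + ((t + 1) / 2) * (b - a));
  }
  return xs;
}

// Barycentric weights for Chebyshev points of the second kind:
// w_j = (-1)^j, halved at the two endpoints.
function barycentricWeights(n: number): number[] {
  return Array.from({ length: n }, (_, j) => {
    let w = j % 2 === 0 ? 1 : -1;
    if (j === 0 || j === n - 1) w /= 2;
    return w;
  });
}

// Second (true) form of the barycentric formula: numerically stable,
// O(n) per evaluation once the weights are precomputed.
function barycentricEval(
  x: number,
  xs: number[],
  ys: number[],
  ws: number[]
): number {
  let num = 0;
  let den = 0;
  for (let j = 0; j < xs.length; j++) {
    const diff = x - xs[j];
    if (diff === 0) return ys[j]; // x coincides with a node
    const q = ws[j] / diff;
    num += q * ys[j];
    den += q;
  }
  return num / den;
}
```

With 64 Chebyshev nodes, smooth components of f(x) converge geometrically; the kinks and the Runge-like spike still need piecewise or rational treatment, which is why retrieval quality matters.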
```
digits = -log10(mean_absolute_error)   // averaged over 10,001 test points, capped to [0, 15]
speed  = ops_per_second / baseline     // relative to naive 64-point Lagrange interpolation
score  = digits * sqrt(speed)
```
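The scoring formula can be sketched as follows (function names are illustrative, not the repo's eval.ts):

```typescript
// digits = -log10(MAE), clamped to [0, 15].
function digitsOfAccuracy(meanAbsError: number): number {
  const d = -Math.log10(meanAbsError);
  return Math.min(15, Math.max(0, d));
}

// score = digits * sqrt(speed), where speed is relative to the baseline.
function raceScore(
  meanAbsError: number,
  opsPerSecond: number,
  baselineOps: number
): number {
  const speed = opsPerSecond / baselineOps;
  return digitsOfAccuracy(meanAbsError) * Math.sqrt(speed);
}
```

The square root dampens the speed term, so a 4x speedup only doubles the score while each extra digit of accuracy adds linearly; accuracy dominates.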
- `download_arxiv.py` — Fetches metadata for 1,468 papers on numerical methods from the ArXiv API
- `download_full_papers.py` — Downloads full LaTeX source for each paper (1,345 succeeded)
- `src/scripts/build-chunks.ts` — Chunks papers at newline boundaries into ~1500-char segments (69,602 chunks)
- `src/scripts/pre-embed.ts` / `embed_ze_modal.py` — Embeds all chunks with all 3 providers, saves to binary files on disk
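The chunking step can be sketched as a simple newline-boundary accumulator (this is an assumption about the approach; the actual build-chunks.ts may differ in detail):

```typescript
// Accumulate whole lines until a chunk approaches the target size,
// so chunks only ever break at newline boundaries.
function chunkText(text: string, target = 1500): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const line of text.split("\n")) {
    // Flush the current chunk before it would exceed the target.
    if (current.length + line.length + 1 > target && current.length > 0) {
      chunks.push(current);
      current = "";
    }
    current += (current ? "\n" : "") + line;
  }
  if (current) chunks.push(current);
  return chunks;
}
```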
- Server loads 69,602 pre-computed embeddings per provider into memory (~1.4 GB total)
- User clicks "Start Race"
- For each of 3 providers, 3 Mastra agents launch in parallel (9 agents total)
- Each agent loop: `generate()` -> LLM calls tools (maxSteps=5) -> `searchPapers` (embed query, cosine similarity, return top-5 chunks) -> `evalCode` (run candidate function, measure accuracy + speed) -> repeat
- Agents within a team share a notebook: best code so far, findings from searches, scores
- Dashboard polls `/api/status` every 1s and `/api/plot` every 3s
- Score chart shows best-so-far (solid lines) and raw iteration scores (dotted lines) per team
- Function plots show ground truth vs. each team's best approximation
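The retrieval step in the loop above (embed the query, rank by cosine similarity, return the top chunks) can be sketched as a brute-force in-memory search; names are illustrative and the repo's vectorstore.ts may differ:

```typescript
// Cosine similarity between two dense vectors.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exhaustive top-k search over pre-computed chunk embeddings,
// returning the indices of the k most similar chunks.
function topK(query: Float32Array, chunks: Float32Array[], k: number): number[] {
  return chunks
    .map((emb, i) => ({ i, sim: cosine(query, emb) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, k)
    .map((r) => r.i);
}
```

At 69,602 chunks a brute-force scan per query is still fast enough that no approximate-nearest-neighbor index is needed.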
```
auto-optimize/
├── corpus/
│   ├── arxiv_papers.jsonl        # 1,468 paper metadata (title, abstract, authors)
│   ├── papers/                   # 1,345 full LaTeX source files
│   ├── chunks_medium.json        # 69,602 text chunks
│   └── embeddings/
│       ├── zeroentropy_embeddings.bin
│       ├── openai_embeddings.bin
│       └── cohere_embeddings.bin
│
├── src/
│   ├── app/                      # Next.js app
│   │   ├── page.tsx              # Server wrapper
│   │   ├── dashboard.tsx         # Main dashboard UI (client component)
│   │   ├── layout.tsx            # HTML shell, fonts, theme
│   │   └── api/
│   │       ├── race/route.ts     # POST = start race, DELETE = reset
│   │       ├── status/route.ts   # GET = poll race state
│   │       ├── eval/route.ts     # POST = evaluate code standalone
│   │       └── plot/route.ts     # GET = function plot data
│   │
│   ├── lib/                      # Core logic
│   │   ├── eval.ts               # Evaluation harness — mystery function, scoring
│   │   ├── corpus.ts             # Loads chunks from chunks_medium.json
│   │   ├── vectorstore.ts        # Embedding + cosine similarity search
│   │   ├── race-state.ts         # In-memory race state management
│   │   └── race-runner.ts        # Orchestrator: 3 teams x 3 agents
│   │
│   ├── mastra/                   # Mastra agent framework
│   │   ├── index.ts              # Mastra instance with 3 agents
│   │   ├── agents/
│   │   │   └── optimizer.ts      # Agent definition + system prompt
│   │   └── tools/
│   │       ├── search-papers.ts  # Tool: semantic search over paper corpus
│   │       └── eval-code.ts      # Tool: evaluate an approximation function
│   │
│   └── scripts/
│       ├── build-chunks.ts       # Chunk papers into segments
│       ├── pre-embed.ts          # Embed chunks with all 3 providers
│       └── test-eval.ts          # Quick test of the evaluation harness
│
├── download_arxiv.py             # Fetch paper metadata from ArXiv
├── download_full_papers.py       # Download full LaTeX/PDF source
├── embed_ze_modal.py             # ZeroEntropy embedding via Modal
├── server.ts                     # Standalone Express server (alternative to Next.js)
├── next.config.js
├── tsconfig.json
└── package.json
```
- Node.js 20+
- API keys in `.env`:
  - `GOOGLE_GENERATIVE_AI_API_KEY` — Gemini (used by all agents)
  - `ZEROENTROPY_API_KEY`
  - `OPENAI_API_KEY`
  - `COHERE_API_KEY`
```bash
# Download corpus data and pre-computed embeddings (~1.5 GB)
./download.sh
npm install
npm run build
NODE_OPTIONS="--max-old-space-size=8192" npx next start -p 3000 -H 0.0.0.0
```

Then open http://localhost:3000 and click "Start Race".
If you need to re-embed (e.g., after changing chunks):
```bash
# ZeroEntropy (via Modal for parallelism)
python3 embed_ze_modal.py

# OpenAI + Cohere (via Vercel AI SDK)
npx tsx src/scripts/pre-embed.ts
```

The embedding scripts are idempotent — they resume from where they left off.
- Mastra — TypeScript AI agent framework (agent definitions, tool calling)
- Vercel AI SDK — Model routing (Gemini via `@ai-sdk/google`), embedding provider pattern
- Next.js 16 — Server + frontend (API routes + React dashboard)
- Gemini 3 Flash Preview — LLM for all agents (same model across all teams)
**What LLM are the agents using?** Gemini 3 Flash Preview, the same for all teams. The LLM is not the variable — retrieval is.

**Why 64 sample points?** Enough for a good approximation with the right technique, but not enough to brute-force it. Agents need to discover optimal node placement and stable evaluation methods from the papers.

**How is scoring done?** `score = digits * sqrt(speed)`, where `digits = -log10(mean_absolute_error)` capped at [0, 15], and `speed` = ops/sec relative to a naive 64-point Lagrange baseline. Both accuracy and efficiency matter, but accuracy dominates.
Apache-2.0