An Open Source Typescript implementation of Programmatic Tool Calling for AI Agents.
Codecall changes how agents interact with tools by letting them write and execute code (in sandboxes) that orchestrates multiple tool calls programmatically to accomplish a task, instead of making the individual tool calls that bloat context and inflate token usage in traditional agents.
This works with both MCP servers (http streaming, stdio) and standard tool definitions.
Note
Before reading :)
Please keep in mind all of this is the future plan for Codecall and how it will work. Codecall is still a WIP and not production ready.
If you're interested in contributing or following the project, check back soon or open an issue to discuss ideas!
Traditional tool calling has fundamental architectural issues that get worse at scale:
Traditional agents send EVERY tool definition with every request, so for 20 tools that's about 10k+ tokens of schema definitions in every inference call, even for questions like "what can you do?" or "update the date for this" where they are not needed. The cost scales with tool count and gets multiplied by every step within a turn AND every turn in the conversation.
Every tool operation requires a full inference round-trip, so "Delete all completed tasks" becomes: the LLM calls findTasks, waits, calls deleteTask for task 1, waits, calls it for task 2... Each call resends the entire conversation history, including all previous tool results, so the tokens compound: those 20 steps become 20+ inference calls with rapidly growing context.
Similar to #2, traditional agents execute tools sequentially even when the operations are independent. Ten API calls that could run simultaneously instead happen one at a time, with the agent reasoning between each of them, which wastes time and tokens and needlessly pushes intermediate results through the context window and the API provider's infrastructure.
Benchmarks show models have a 10-50% failure rate when searching through large datasets in context. They hallucinate field names, miss entries, and get confused by similar data.
But doing this programmatically fixes the problem: the model just writes code, and since the code is deterministic, the search itself has a 0% failure rate.
users.filter((u) => u.role === "admin");
Let models do what they're good at: writing code.
LLMs have enormous amounts of real-world TypeScript in their training data. They're significantly better at writing code to call APIs than they are at the arbitrary JSON matching that tool calling requires.
Codecall ALSO exposes only 2 tools, readFile and executeCode, plus an SDK file tree of your tools (generated ahead of time) in context. The agent reads only the context it needs, when it needs it, based on the task, so a 30-tool setup effectively has the same base context as a 5-tool setup (only the file tree gets larger).
// Instead of 20+ inference passes and 90k+ tokens:
const allUsers = await tools.users.listAllUsers();
const adminUsers = allUsers.filter((u) => u.role === "admin");
const resources = await tools.resources.getSensitiveResources();
progress({
step: "Data loaded",
admins: adminUsers.length,
resources: resources.length,
});
const revokedAccesses = [];
const failedAccesses = [];
for (const admin of adminUsers) {
for (const resource of resources) {
try {
const result = await tools.permissions.revokeAccess({
userId: admin.id,
resourceId: resource.id,
});
if (result.success) {
revokedAccesses.push({ admin: admin.name, resource: resource.name });
}
} catch (err) {
failedAccesses.push({
admin: admin.name,
resource: resource.name,
error: err.message,
});
}
}
}
return {
totalAdmins: adminUsers.length,
resourcesAffected: resources.length,
accessesRevoked: revokedAccesses.length,
accessesFailed: failedAccesses.length,
};
Two inference passes. The code runs in a sandbox, calling all 20 updates programmatically with step-by-step progress updates, pulling in only the relevant context when it is needed, and returning only what is necessary. That saves tens of thousands of tokens and does everything more efficiently.
Codecall gives the model 2 tools plus a file tree to work with, but the model still controls the entire flow: it decides what to read, what code to write, when to execute, and how to respond, so everything stays fully agentic.
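That outer loop is just a standard agent loop over two tools. A minimal sketch of it, assuming a generic chat-completions client (callModel, runReadFile, and runExecuteCode are illustrative stand-ins, not Codecall's actual exports):

type ToolCall = { id: string; name: "readFile" | "executeCode"; args: { path?: string; code?: string } };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string; toolCallId?: string };

// Stand-ins for the LLM client and the two internal tools.
declare function callModel(messages: ChatMessage[]): Promise<ModelReply>;
declare function runReadFile(path: string): Promise<string>;
declare function runExecuteCode(code: string): Promise<unknown>;

async function agentLoop(messages: ChatMessage[]): Promise<string> {
  while (true) {
    const reply = await callModel(messages);
    // No tool calls means the model has produced its final response.
    if (reply.toolCalls.length === 0) return reply.text;

    for (const call of reply.toolCalls) {
      const result =
        call.name === "readFile"
          ? await runReadFile(call.args.path!)
          : await runExecuteCode(call.args.code!);
      messages.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(result) });
    }
    // The model sees the results and decides whether to read more, execute again, or respond.
  }
}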
Instead of exposing every tool directly to the LLM for it to call, Codecall:
- Converts your MCP definitions into TypeScript SDK files (types + function signatures)
- Shows the model a directory tree of available files
- Allows the model to selectively read SDK files to understand types and APIs
- Lets the model write code to accomplish the task
- Executes that code in a Deno sandbox with access to your actual tools as functions
- Returns the execution result back (success/error)
- Lets the model produce a response or continue
By default the system message contains an SDK file tree showing all available tools as files. It shows only the file tree, not the actual contents of each file, so the model can progressively discover tools as it needs them for a given task.
Example:
tools/
├─ Database
│ ├─ checkEmailExists.ts
│ ├─ cloneUser.ts
│ ├─ createUser.ts
│ ├─ deactivateUsersByDomain.ts
│ ├─ deleteUser.ts
│ ├─ getUsersByFavoriteColor.ts
│ ├─ getUsersCreatedAfter.ts
│ ├─ getUserStats.ts
│ ├─ searchUsers.ts
│ ├─ setUserActiveStatus.ts
│ ├─ setUserFavoriteColor.ts
│ ├─ updateUser.ts
│ └─ validateEmailFormat.ts
└─ todoist
├─ addComments.ts
├─ addProjects.ts
├─ addSections.ts
├─ addTasks.ts
├─ manageAssignments.ts
├─ search.ts
├─ updateComments.ts
├─ updateProjects.ts
├─ updateSections.ts
├─ updateTasks.ts
└─ userInfo.ts
Returns the full contents of a specific SDK file, including type definitions, function signatures, and schemas.
Example:
readFile({ path: "tools/users/listAllUsers.ts" }); ->
/**
* HOW TO CALL THIS TOOL:
* await tools.users.listAllUsers({ limit: 100, offset: 0 })
*
* This is the ONLY way to invoke this tool in your code.
*/
export interface ListAllUsersInput {
limit?: number;
offset?: number;
}
export interface User {
id: string;
name: string;
email: string;
role: "admin" | "user" | "guest";
department: string;
createdAt: string;
}
export async function listAllUsers(input: ListAllUsersInput): Promise<User[]>;
Executes TypeScript code in a Deno sandbox. Returns either the successful output or an error with the execution trace.
Example:
executeCode(`
const users = await tools.users.listAllUsers({ limit: 100 });
return users.filter(u => u.role === "admin");
`);
Success returns:
{
status: "success",
output: [
{ id: "1", name: "Alice", role: "admin", ... },
{ id: "2", name: "Bob", role: "admin", ... }
],
progressLogs: [{ step: "Loading users..." }]
}
Error returns:
{
status: "error",
error: `=== ERROR ===
Type: Error
Message: Undefined value at 'result[0]'. This usually means you accessed a property that doesn't exist.
=== STACK TRACE ===
Error: Undefined value at 'result[0]'...
at validateResult (file:///.../sandbox.ts:68:11)
at file:///.../sandbox.ts:99:5`,
progressLogs: [{ step: "Loading users..." }]
}
The error includes the full stack trace, giving the model maximum context to fix the issue, try again, and then update the SDK file once fixed.
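For illustration, a result check like the validateResult step in that stack trace could be as simple as walking the returned value and flagging undefined leaves. This sketch is an assumption about the implementation, not Codecall's actual code:

function validateResult(value: unknown, path = "result"): void {
  if (value === undefined) {
    // Mirrors the error message shown above so the model knows exactly which access failed.
    throw new Error(
      `Undefined value at '${path}'. This usually means you accessed a property that doesn't exist.`,
    );
  }
  if (Array.isArray(value)) {
    value.forEach((item, i) => validateResult(item, `${path}[${i}]`));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      validateResult(child, `${path}.${key}`);
    }
  }
}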
When the model calls executeCode(), Codecall runs that code inside a fresh, short-lived Deno sandbox. Each sandbox is spun up using Deno and runs the code in isolation. Deno’s security model blocks access to sensitive capabilities unless explicitly allowed.
By default, the sandboxed code has no access to the filesystem, network, environment variables, or system processes. The only way it can interact with the outside world is by calling the tool functions exposed through tools (which are forwarded by Codecall to the MCP server).
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ SPAWN │────--▶│ INJECT │────▶│ EXECUTE │────▶│ CAPTURE │────▶│ DESTROY │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ Fresh Deno tools proxy Run generated Collect return Terminate │
│ process with + progress() TypeScript value or error process, │
│ deny-by-default injected code + progress logs cleanup │
│ (Deno 2) │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
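On the host side, spawning such a sandbox can be as simple as launching a Deno subprocess with no permission flags. A minimal sketch (the exact flags and entrypoint Codecall uses may differ):

import { spawn } from "node:child_process";

function spawnSandbox(entrypointPath: string) {
  // No --allow-* flags: Deno denies filesystem, network, env, and subprocess
  // access by default, so the generated code can only talk to the host over
  // stdin/stdout (the IPC tool bridge shown below).
  return spawn("deno", ["run", "--quiet", entrypointPath], {
    stdio: ["pipe", "pipe", "inherit"],
  });
}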
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ │
│ SANDBOX TOOL BRIDGE MCP SERVER │
│ │ │ │ │
│ │ tools.users.listAllUsers() │ │ │
│ │ ─────────────────────────────▶│ │ │
│ │ │ │ │
│ │ │ tools/call: listAllUsers │ │
│ │ │ ──────────────────────────────────▶│ │
│ │ │ │ │
│ │ │ [{ id, name, role }, ...] │ │
│ │ │ ◀──────────────────────────────────│ │
│ │ │ │ │
│ │ Promise<User[]> resolved │ │ │
│ │ ◀─────────────────────────────│ │ │
│ │ │ │ │
│ │ (code continues execution) │ │ │
│ │ │ │ │
│ │ progress({ step: "Done" }) │ │ │
│ │ ─────────────────────────────▶│ │ │
│ │ │ │ │
│ │ Streams to UI │ │
│ │ │ │ │
│ │ return { success: true } │ │ │
│ │ ─────────────────────────────▶│ │ │
│ │ │ │ │
│ │ Result sent to Model │ │
│ │ for response generation │ │
│ │ │ │ │
│ ▼ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
When the generated code runs, Codecall injects a tools Proxy into the sandbox.
- tools is not a set of local functions, but a Proxy that intercepts all property access
- Each call like tools.namespace.method(args) sends a JSON message via IPC to the host
- The host's ToolRegistry routes the call to the correct handler (MCP server or internal function)
- Results are sent back via IPC, and the Promise resolves in the sandbox
So when the model calls executeCode() with tools:
const result = await tools.permissions.revokeAccess({
userId: admin.id,
resourceId: resource.id,
reason: "security-audit",
});
What actually happens is:
- The sandbox's tools Proxy intercepts the call and sends a JSON message to stdout: { type: "call", tool: "permissions.revokeAccess", args: {...} }
- The host process (Node.js) receives this via IPC and routes it through the ToolRegistry
- The ToolRegistry looks up the handler (MCP connection or internal function) and executes it
- The result is sent back to the sandbox via stdin: { id: 1, result: {...} }
- The sandbox resolves the Promise and code continues running
From the code's perspective this behaves exactly like calling a normal async function.
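A minimal sketch of the sandbox-side bridge, assuming one JSON message per stdout line (the message shape and bookkeeping here are illustrative, not Codecall's exact protocol):

let nextId = 0;
const pendingCalls = new Map<number, (value: unknown) => void>();

// tools.users.listAllUsers(args) -> { type: "call", id, tool: "users.listAllUsers", args }
const tools: any = new Proxy({}, {
  get: (_target, namespace) =>
    new Proxy({}, {
      get: (_inner, method) => (args: unknown) =>
        new Promise((resolve) => {
          const id = ++nextId;
          pendingCalls.set(id, resolve);
          // Each stdout line is one protocol message for the host.
          console.log(JSON.stringify({ type: "call", id, tool: `${String(namespace)}.${String(method)}`, args }));
        }),
    }),
});

// Called when the host replies on stdin with { id, result }.
function handleHostReply(msg: { id: number; result: unknown }) {
  pendingCalls.get(msg.id)?.(msg.result);
  pendingCalls.delete(msg.id);
}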
As mentioned above, Codecall converts MCP tool definitions into TypeScript SDK files so they are easier to reference when writing code. So when you connect an MCP server, Codecall:
- Extracts tool definitions - Reads all tools from the MCP server, including their inputSchema, outputSchema, descriptions, annotations (readOnly, destructive, idempotent), and execution hints
- Generates TypeScript SDK files - Uses Gemini 3 Flash to convert the JSON Schema definitions into well-typed TypeScript files with:
  - Complete type definitions for inputs and outputs
  - JSDoc comments with descriptions, defaults, and validation constraints
  - Proper handling of enums, optional fields, and nested objects
  - Success/error response types when output schemas are provided
- Organizes by namespace - Groups tools into folders (e.g., tools/database/, tools/todoist/) based on the MCP server name
- Writes to disk - Saves all SDK files to generatedSdks/tools/{namespace}/ for the agent to discover and read on-demand
This approach ensures the agent sees clean, well-typed TypeScript interfaces and schemas instead of raw JSON Schema, making it easier for the model to write correct code. The SDK files are also self-documenting, with JSDoc comments that capture all the metadata from the original tool definitions, and they can be edited later on.
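As a rough sketch, the extraction step can use the official MCP TypeScript SDK to list tools and their schemas. The server command below is a placeholder, and the prompt that turns these schemas into SDK files is omitted:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "codecall-sdk-gen", version: "0.0.1" }, { capabilities: {} });
await client.connect(
  new StdioClientTransport({ command: "npx", args: ["-y", "your-mcp-server"] }),
);

// Each tool carries a name, description, and JSON Schema input definition,
// which feed the TypeScript SDK file generator.
const { tools } = await client.listTools();
for (const tool of tools) {
  console.log(tool.name, tool.description, tool.inputSchema);
}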
Codecall also has a self-learning system that automatically improves the generated SDK documentation when agents recover from tool call errors. Unlike traditional agents that repeat the same mistakes across sessions, Codecall builds a memory of what not to do that compounds over time by updating the actual SDK files.
Because Codecall writes code that strings multiple tool calls together into a single script to do the user's task, it becomes a lot more important for the tools not to fail: even a small input schema issue, a wrong assumption about the output shape, or a guess at the wrong semantics can cause the entire script to fail and force the agent to rewrite and fix it.
The first agent makes a mistake:
// Agent calls: tools.todoist.findTasks({ })
// Error: "At least one filter must be provided..."After fixing the issue in that run, the agent automatically updates the SDK file that was unclear (which led to the issue) with this at the top:
/**
* ╔════════════════════════════════════════════════════════════════════════════╗
* ║ @CC LEARNED CONSTRAINT ║
* ║ The `findTasks` tool requires "At least one filter must be provided: ║
* ║ searchText, projectId, sectionId, parentId, responsibleUser, or ║
* ║ labels" - you cannot call it with only `responsibleUserFiltering` or ║
* ║ `limit`. To get all tasks, you must iterate through all projects ║
* ║ using `projectId` as the required filter, or provide a non-empty ║
* ║ `searchText`. ║
* ╚════════════════════════════════════════════════════════════════════════════╝
 */
The next agent reads that same SDK file and sees the banner immediately:
// Agent reads: tools/todoist/findTasks.ts
// Sees the @CC LEARNED CONSTRAINT banner at the top
// Writes correct code from the start:
const tasks = await tools.todoist.findTasks({ projectId: "12345" });
So there's no error, no retry, and no wasted inference. The learned constraint prevents the same mistake entirely.
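The SDK update itself can be as small as prepending a banner to the generated file. A hypothetical helper (the real banner format is the @CC LEARNED CONSTRAINT box shown above):

import { readFile, writeFile } from "node:fs/promises";

// Prepend a learned-constraint note to an SDK file under generatedSdks/tools/{namespace}/.
async function recordLearnedConstraint(sdkFilePath: string, lesson: string) {
  const original = await readFile(sdkFilePath, "utf8");
  const banner = ["/**", " * @CC LEARNED CONSTRAINT", ` * ${lesson}`, " */", ""].join("\n");
  await writeFile(sdkFilePath, banner + original, "utf8");
}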
The model uses progress() to provide real-time feedback while a script is being executed. This gives users visibility into what's happening without requiring multiple executeCode() calls the way normal tool calls would.
The sandbox uses stdout as an IPC channel rather than a log stream, so each line is parsed as JSON and routed based on its type field. A plain console.log("hi") isn't valid protocol JSON, so the sandbox ignores it.
progress(data) wraps your data in the correct format ({ type: "progress", data }) so it gets captured, stored in progressLogs, and forwarded to the onProgress callback.
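Inside the sandbox the helper itself is tiny; a sketch of what it amounts to (the exact envelope Codecall emits may differ slightly):

function progress(data: unknown): void {
  // Wrap the payload in the protocol envelope so the host captures it,
  // stores it in progressLogs, and forwards it to the onProgress callback.
  console.log(JSON.stringify({ type: "progress", data }));
}

In the generated script it's then just called between steps, as in the example below: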
const users = await tools.users.listAllUsers();
progress({ step: "Loaded users", count: users.length });
const admins = users.filter((u) => u.role === "admin");
progress({ step: "Filtered admins", count: admins.length });
for (let i = 0; i < admins.length; i++) {
await tools.permissions.revokeAccess({ userId: admins[i].id });
if ((i + 1) % 10 === 0) {
progress({ step: "Revoking", processed: i + 1, total: admins.length });
}
}
progress({ step: "Complete", revoked: admins.length });
return { adminsProcessed: admins.length };
This keeps the UX of a step-by-step agent with user-facing updates while still getting the cost and speed benefits of single-pass execution.
Benchmarks show Claude Opus 4.1 scoring:
- 42.3% on Python
- 47.7% on TypeScript
That's roughly a 12% relative improvement just from language choice, and other models show the same pattern.
TypeScript also gives you:
- Full type inference for SDK generation
- Compile-time validation of tool schemas
- The model sees types and can use them correctly
- MCP Client - connect to MCP servers via stdio/HTTP
- Tool Registry - unified routing for MCP + internal tools
- Generate SDK Files - For every tool, generate a well-typed SDK file with the input & output schemas
- Deno Sandbox - isolated TypeScript execution with an IPC tool bridge
- Tools Proxy - intercept tools.namespace.method() calls in the Deno sandbox
- Progress Streaming - real-time onProgress callback support is working
- Handling Errors - full stack traces + numbered code are returned on failure
- Result Validation - Catch undefined values (property access errors) similar to the above
- Add internal tools - Expose the readFile and executeCode tools to the agent
- Normal agent loop - handle LLM messages and tool calls with streaming, just a normal agent loop with OpenRouter
- System prompt - guide the LLM to explore SDK files, write code, and so on
- [ ] Add a warning for destructive tools in code scripts, so the user can type y/n to decide whether to continue
- Side By Side - Using the same set of tools and a task, do a direct comparison between Codecall and a traditional agent
- Documentation - docs and usage examples
- NPM Package - an npm package for codecall (down the road)
We welcome contributions! Please feel free to:
- Open issues for bugs or feature requests
- Submit PRs for improvements
- Share your use cases and feedback
This project builds on ideas from the community and is directly inspired by:
- Yannic Kilcher – What Cloudflare's code mode misses about MCP and tool calling
- Theo – Anthropic admits that MCP sucks & Anthropic is trying SO hard to fix MCP...
- Boundary – Using MCP server with 10000+ tools: 🦄 Ep #7
- Cloudflare – Code mode: the better way to use MCP
- Anthropic – Code execution with MCP: building more efficient AI agents & Introducing advanced tool use
- Medium – Your Agent Is Wasting Money On Tools. Code Execution With MCP Fixes It.
MIT