Codecall

An Open Source TypeScript implementation of Programmatic Tool Calling for AI Agents (based on Code Mode).

Codecall changes how agents interact with tools by letting them write and execute code (in sandboxes) that orchestrates multiple tool calls programmatically, rather than making the individual tool calls that bloat context and inflate token usage in traditional agents.

This works with both MCP servers (HTTP streaming, stdio) and standard tool definitions.

Note

Before reading :)

Please keep in mind that all of this is the future plan for Codecall and how it will work. Codecall is still a WIP and not production-ready.

If you're interested in contributing or following the project, check back soon or open an issue to discuss ideas!

The Problem

Traditional tool calling has fundamental architectural issues that get worse at scale:

1. Context Bloat & Wasted Tokens

Traditional agents send EVERY tool definition with every request. For 20 tools, that's 10k+ tokens of schema definitions in every inference call, even for questions like "what can you do?" or "update the date for this" where the schemas aren't needed at all. This cost scales with tool count and gets multiplied by every step within a turn AND every turn in the conversation.

2. N Inference Calls for N Tool Operations

Every tool operation requires a full inference round-trip, so "Delete all completed tasks" becomes: the LLM calls findTasks, waits, calls deleteTask for task 1, waits, calls it for task 2... Each call resends the entire conversation history, including all previous tool results, so the tokens compound: those 20 steps mean 20+ inference calls with rapidly growing context.

3. No Parallel Execution

Similar to #2, but traditional agents execute tools sequentially even when operations are independent. Ten API calls that could run simultaneously instead happen one at a time, with the agent reasoning between each of them. That wastes time and tokens, and needlessly pushes intermediate data through the context window and the API provider's infrastructure.
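
Generated code, by contrast, can batch independent operations into a single concurrent pass. A minimal sketch (the tool names here are illustrative, not a real Codecall setup):

// Independent calls run concurrently instead of costing one
// inference round-trip each.
const [users, tasks] = await Promise.all([
  tools.users.listAllUsers(),
  tools.todoist.search({ searchText: "urgent" }),
]);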

4. Models are not great at Lookup

Benchmarks show models have a 10-50% failure rate when searching through large datasets in context. They hallucinate field names, miss entries, and get confused by similar data.

But doing this programmatically fixes the problem: the model can just write code, and because code is deterministic, the lookup itself has a 0% failure rate:

users.filter((u) => u.role === "admin");

The Solution

Let models do what they're good at: writing code.

LLMs have enormous amounts of real-world TypeScript in their training data. They're significantly better at writing code to call APIs than they are at the arbitrary JSON matching that tool calling requires.

Codecall ALSO only has 2 tools: readFile and executeCode, plus the SDK file tree of your tools (generated ahead of time) in context. The agent reads and pulls in only the context it needs, as it needs it, based on the task. This makes a 30-tool setup effectively have the same base context as a 5-tool setup (only the file tree gets larger).

// Instead of 20+ inference passes and 90k+ tokens:
const allUsers = await tools.users.listAllUsers();
const adminUsers = allUsers.filter((u) => u.role === "admin");
const resources = await tools.resources.getSensitiveResources();

progress({
  step: "Data loaded",
  admins: adminUsers.length,
  resources: resources.length,
});

const revokedAccesses = [];
const failedAccesses = [];

for (const admin of adminUsers) {
  for (const resource of resources) {
    try {
      const result = await tools.permissions.revokeAccess({
        userId: admin.id,
        resourceId: resource.id,
      });
      if (result.success) {
        revokedAccesses.push({ admin: admin.name, resource: resource.name });
      }
    } catch (err) {
      failedAccesses.push({
        admin: admin.name,
        resource: resource.name,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
}

return {
  totalAdmins: adminUsers.length,
  resourcesAffected: resources.length,
  accessesRevoked: revokedAccesses.length,
  accessesFailed: failedAccesses.length,
};

Two inference passes. The code runs in a sandbox, calling all 20 updates programmatically with step-by-step progress updates, pulling in relevant context only when it is needed, and returning only what is necessary. That saves tens of thousands of tokens and does everything more efficiently.

How Codecall Works

Codecall gives the model 2 tools + a file tree to work with, and the model still controls the entire flow: it decides what to read, what code to write, when to execute, and how to respond... so everything stays fully agentic.

Instead of exposing every tool directly to the LLM for it to call, Codecall:

  • Converts your MCP definitions into TypeScript SDK files (types + function signatures)
  • Shows the model a directory tree of available files
  • Allows the model to selectively read SDK files to understand types and APIs
  • Lets the model write code to accomplish the task
  • Executes that code in a deno sandbox with access to your actual tools as functions
  • Returns the execution result back (success/error)
  • Lets the model produce a response or continue

By default, the system message contains an SDK file tree showing all available tools as files. The model sees only the tree, not the actual contents of each file, so it can progressively discover tools as it needs them for a given task.

Example:

tools/
├─ Database
│  ├─ checkEmailExists.ts
│  ├─ cloneUser.ts
│  ├─ createUser.ts
│  ├─ deactivateUsersByDomain.ts
│  ├─ deleteUser.ts
│  ├─ getUsersByFavoriteColor.ts
│  ├─ getUsersCreatedAfter.ts
│  ├─ getUserStats.ts
│  ├─ searchUsers.ts
│  ├─ setUserActiveStatus.ts
│  ├─ setUserFavoriteColor.ts
│  ├─ updateUser.ts
│  └─ validateEmailFormat.ts
└─ todoist
   ├─ addComments.ts
   ├─ addProjects.ts
   ├─ addSections.ts
   ├─ addTasks.ts
   ├─ manageAssignments.ts
   ├─ search.ts
   ├─ updateComments.ts
   ├─ updateProjects.ts
   ├─ updateSections.ts
   ├─ updateTasks.ts
   └─ userInfo.ts

1. readFile(path: string)

Returns the full contents of a specific SDK file, including type definitions, function signatures, and schemas.

Example:

readFile({ path: "tools/users/listAllUsers.ts" }); ->

/**
 * HOW TO CALL THIS TOOL:
 * await tools.users.listAllUsers({ limit: 100, offset: 0 })
 *
 * This is the ONLY way to invoke this tool in your code.
 */

export interface ListAllUsersInput {
  limit?: number;
  offset?: number;
}

export interface User {
  id: string;
  name: string;
  email: string;
  role: "admin" | "user" | "guest";
  department: string;
  createdAt: string;
}

export async function listAllUsers(input: ListAllUsersInput): Promise<User[]>;

2. executeCode(code: string)

Executes TypeScript code in a Deno sandbox. Returns either the successful output or an error with the execution trace.

Example:

executeCode(`
  const users = await tools.users.listAllUsers({ limit: 100 });
  return users.filter(u => u.role === "admin");
`);

Success returns:

{
  status: "success",
  output: [
    { id: "1", name: "Alice", role: "admin", ... },
    { id: "2", name: "Bob", role: "admin", ... }
  ],
  progressLogs: [{ step: "Loading users..." }]
}

Error returns:

{
  status: "error",
  error: `=== ERROR ===
Type: Error
Message: Undefined value at 'result[0]'. This usually means you accessed a property that doesn't exist.

=== STACK TRACE ===
Error: Undefined value at 'result[0]'...
    at validateResult (file:///.../sandbox.ts:68:11)
    at file:///.../sandbox.ts:99:5`,
  progressLogs: [{ step: "Loading users..." }]
}

The error includes the full stack trace, giving the model maximum context to fix the issue and try again, then update the SDK file once it's fixed.

Code Execution & Sandboxing

When the model calls executeCode(), Codecall runs that code inside a fresh, short-lived Deno sandbox. Each sandbox is spun up using Deno and runs the code in isolation. Deno’s security model blocks access to sensitive capabilities unless explicitly allowed.

By default, the sandboxed code has no access to the filesystem, network, environment variables, or system processes. The only way it can interact with the outside world is by calling the tool functions exposed through tools (which are forwarded by Codecall to the MCP server).
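
For illustration, spawning such a sandbox from a Node.js host could look like this (a sketch, not Codecall's exact code):

import { spawn } from "node:child_process";

// Deny-by-default: no --allow-read/--allow-net/--allow-env flags are
// passed, and --no-prompt turns permission prompts into hard denials,
// so the sandboxed code can only talk to the host over stdin/stdout.
function spawnSandbox(entrypoint: string) {
  return spawn("deno", ["run", "--quiet", "--no-prompt", entrypoint], {
    stdio: ["pipe", "pipe", "inherit"],
  });
}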

Sandbox Lifecycle

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│   ┌─────────┐       ┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐         │
│   │  SPAWN  │──────▶│  INJECT │────▶│ EXECUTE │────▶│ CAPTURE │────▶│ DESTROY │         │
│   └─────────┘       └─────────┘     └─────────┘     └─────────┘     └─────────┘         │
│        │                 │               │               │               │              │
│        ▼                 ▼               ▼               ▼               ▼              │
│   Fresh Deno       tools proxy     Run generated    Collect return   Terminate          │
│   process with     + progress()    TypeScript       value or error   process,           │
│   deny-by-default  injected        code             + progress logs  cleanup            │
│   (Deno 2)                                                                              │
│                                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────┘

Data Flow

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                         │
│    SANDBOX                        TOOL BRIDGE                         MCP SERVER        │
│       │                               │                                    │            │
│       │  tools.users.listAllUsers()   │                                    │            │
│       │ ─────────────────────────────▶│                                    │            │
│       │                               │                                    │            │
│       │                               │   tools/call: listAllUsers         │            │
│       │                               │ ──────────────────────────────────▶│            │
│       │                               │                                    │            │
│       │                               │          [{ id, name, role }, ...] │            │
│       │                               │ ◀──────────────────────────────────│            │
│       │                               │                                    │            │
│       │   Promise<User[]> resolved    │                                    │            │
│       │ ◀─────────────────────────────│                                    │            │
│       │                               │                                    │            │
│       │  (code continues execution)   │                                    │            │
│       │                               │                                    │            │
│       │  progress({ step: "Done" })   │                                    │            │
│       │ ─────────────────────────────▶│                                    │            │
│       │                               │                                    │            │
│       │                          Streams to UI                             │            │
│       │                               │                                    │            │
│       │  return { success: true }     │                                    │            │
│       │ ─────────────────────────────▶│                                    │            │
│       │                               │                                    │            │
│       │                     Result sent to Model                           │            │
│       │                     for response generation                        │            │
│       │                               │                                    │            │
│                                       ▼                                                 │
└─────────────────────────────────────────────────────────────────────────────────────────┘

How Tool Calls Work at Runtime

When the generated code runs, Codecall injects a tools Proxy into the sandbox.

  • tools is not a set of local functions, but a Proxy that intercepts all property access
  • Each call like tools.namespace.method(args) sends a JSON message via IPC to the host
  • The host's ToolRegistry routes the call to the correct handler (MCP server or internal function)
  • Results are sent back via IPC, and the Promise resolves in the sandbox

So when the model calls executeCode() with tools:

const result = await tools.permissions.revokeAccess({
  userId: admin.id,
  resourceId: resource.id,
  reason: "security-audit",
});

What actually happens is:

  • The sandbox's tools Proxy intercepts the call and sends a JSON message to stdout: { type: "call", tool: "permissions.revokeAccess", args: {...} }
  • The host process (Node.js) receives this via IPC and routes it through the ToolRegistry
  • The ToolRegistry looks up the handler (MCP connection or internal function) and executes it
  • The result is sent back to the sandbox via stdin: { id: 1, result: {...} }
  • The sandbox resolves the Promise and code continues running

From the code's perspective this behaves exactly like calling a normal async function.
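
A minimal sketch of the sandbox side of that bridge, assuming the line-delimited JSON protocol described above (helper names like sendToHost and pending are illustrative, not Codecall's actual internals):

// A recursive Proxy: tools.namespace.method(args) becomes a JSON line
// on stdout, and the returned Promise resolves when the host replies.
type Pending = {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
};

let nextId = 0;
const pending = new Map<number, Pending>();

function sendToHost(message: object) {
  // stdout is the IPC channel: one JSON message per line.
  console.log(JSON.stringify(message));
}

const tools: any = new Proxy({}, {
  get: (_target, namespace) =>
    new Proxy({}, {
      get: (_ns, method) => (args: unknown) =>
        new Promise((resolve, reject) => {
          const id = ++nextId;
          pending.set(id, { resolve, reject });
          sendToHost({
            type: "call",
            id,
            tool: `${String(namespace)}.${String(method)}`,
            args,
          });
        }),
    }),
});

// When the host writes { id, result } (or { id, error }) back on stdin,
// the matching Promise settles and the generated code continues.
function handleHostReply(line: string) {
  const { id, result, error } = JSON.parse(line);
  const entry = pending.get(id);
  if (!entry) return;
  pending.delete(id);
  error ? entry.reject(new Error(error)) : entry.resolve(result);
}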

Generating SDKs from Tool Definitions

As mentioned above, Codecall converts MCP tool definitions into TypeScript SDK files so they are clearer to reference when writing code. When you connect an MCP server, Codecall:

  1. Extracts tool definitions - Reads all tools from the MCP server, including their inputSchema, outputSchema, descriptions, annotations (readOnly, destructive, idempotent), and execution hints
  2. Generates TypeScript SDK files - Uses Gemini 3 Flash to convert the JSON Schema definitions into well-typed TypeScript files with:
    • Complete type definitions for inputs and outputs
    • JSDoc comments with descriptions, defaults, and validation constraints
    • Proper handling of enums, optional fields, and nested objects
    • Success/error response types when output schemas are provided
  3. Organizes by namespace - Groups tools into folders (e.g., tools/database/, tools/todoist/) based on the MCP server name
  4. Writes to disk - Saves all SDK files to generatedSdks/tools/{namespace}/ for the agent to discover and read on-demand

This approach makes sure the agent sees clean, well-typed TypeScript interfaces and schemas instead of raw JSON Schema, making it easier for the model to write correct code. The SDK files are also self-documenting, with JSDoc comments that capture all the metadata from the original tool definitions, and they can be edited in the future.
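
For illustration, here is a hypothetical MCP tool definition of the kind step 1 extracts; step 2 would turn its JSON Schema into the typed listAllUsers.ts SDK file shown earlier:

// Hypothetical tool definition (illustrative, not from a real server).
const toolDefinition = {
  name: "listAllUsers",
  description: "List all users, with optional pagination.",
  annotations: { readOnlyHint: true, idempotentHint: true },
  inputSchema: {
    type: "object",
    properties: {
      limit: { type: "number", description: "Maximum users to return." },
      offset: { type: "number", description: "Pagination offset." },
    },
  },
};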

Progressive SDK Learning

Codecall also has a self-learning system that automatically improves the generated SDK documentation when the agent recovers from tool-call errors. Unlike traditional agents that repeat the same mistakes across sessions, Codecall builds a memory of what not to do that compounds over time by updating the actual SDK files.

Because Codecall writes code that strings together multiple tool calls into a single script to do the user's task, it becomes much more important for the tools not to fail: even a small input-schema issue, a wrong assumption about the output shape, or a guess at the wrong semantics can cause the entire script to fail, forcing the agent to rewrite and fix it.

The Flow

First, the agent makes a mistake:

// Agent calls: tools.todoist.findTasks({ })
// Error: "At least one filter must be provided..."

After fixing the issue in that run, the agent automatically updates the SDK file that was unclear (which led to the issue) with this at the top:

/**
 * ╔════════════════════════════════════════════════════════════════════════════╗
 * ║  @CC LEARNED CONSTRAINT                                                    ║
 * ║  The `findTasks` tool requires "At least one filter must be provided:      ║
 * ║  searchText, projectId, sectionId, parentId, responsibleUser, or           ║
 * ║  labels" - you cannot call it with only `responsibleUserFiltering` or      ║
 * ║  `limit`. To get all tasks, you must iterate through all projects          ║
 * ║  using `projectId` as the required filter, or provide a non-empty          ║
 * ║  `searchText`.                                                             ║
 * ╚════════════════════════════════════════════════════════════════════════════╝
 */

The next agent reads that same SDK file and sees the banner immediately:

// Agent reads: tools/todoist/findTasks.ts
// Sees the @CC LEARNED CONSTRAINT banner at the top
// Writes correct code from the start:
const tasks = await tools.todoist.findTasks({ projectId: "12345" });

So no error, no retry, and no wasted inference. The learned constraint prevents the same mistake entirely.
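
Mechanically, the update itself is simple. A sketch of what it could look like as a host-side helper (hypothetical; in Codecall the agent edits the SDK file itself):

import { readFile, writeFile } from "node:fs/promises";

// Prepend a learned-constraint banner to an SDK file so the next agent
// that reads the file sees the lesson before writing any code.
async function prependLearnedConstraint(sdkPath: string, lesson: string) {
  const banner = `/**\n * @CC LEARNED CONSTRAINT\n * ${lesson}\n */\n\n`;
  const current = await readFile(sdkPath, "utf8");
  await writeFile(sdkPath, banner + current);
}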

Progress Updates

The model uses progress() to provide real-time feedback while a script is being executed. This gives users visibility into what's happening without requiring multiple executeCode() calls the way normal tool calls would.

The sandbox uses stdout as an IPC channel, not a log stream, so each line is parsed as JSON and routed based on its type field. A normal console.log("hi") isn't valid protocol JSON, so it is simply ignored.

progress(data) wraps your data in the correct format ({ type: "progress", data }) so it gets captured, stored in progressLogs, and forwarded to the onProgress callback.
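
A minimal sketch of both sides of that protocol (illustrative, not Codecall's exact code):

// Sandbox side: wrap arbitrary data in the progress envelope.
function progress(data: unknown) {
  console.log(JSON.stringify({ type: "progress", data }));
}

// Host side: parse each stdout line and route on its type field.
// Lines that aren't valid protocol JSON (plain console.log) are ignored.
function handleStdoutLine(line: string, onProgress: (data: unknown) => void) {
  let message: { type?: string; data?: unknown };
  try {
    message = JSON.parse(line);
  } catch {
    return; // not protocol JSON; ignore
  }
  if (message.type === "progress") onProgress(message.data);
}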

Example

const users = await tools.users.listAllUsers();
progress({ step: "Loaded users", count: users.length });

const admins = users.filter((u) => u.role === "admin");
progress({ step: "Filtered admins", count: admins.length });

for (let i = 0; i < admins.length; i++) {
  await tools.permissions.revokeAccess({ userId: admins[i].id });
  if ((i + 1) % 10 === 0) {
    progress({ step: "Revoking", processed: i + 1, total: admins.length });
  }
}

progress({ step: "Complete", revoked: admins.length });
return { adminsProcessed: admins.length };

This keeps the UX of a step-by-step agent with user-facing updates while still getting the cost and speed benefits of single-pass execution.

Why TypeScript?

Benchmarks show Claude Opus 4.1 performs:

  • 42.3% on Python
  • 47.7% on TypeScript

That's a 5.4-point (roughly 13% relative) improvement just from language choice, and various other models show the same pattern.

TypeScript also gives you:

  • Full type inference for SDK generation
  • Compile-time validation of tool schemas
  • Types the model can see and use correctly

Roadmap

  • MCP Client - connect to MCP servers via stdio/HTTP
  • Tool Registry - unified routing for MCP + internal tools
  • Generate SDK Files - for every tool, generate a well-typed SDK file with the input & output schemas
  • Deno Sandbox - isolated TypeScript execution with an IPC tool bridge
  • Tools Proxy - intercept tools.namespace.method() calls in the Deno sandbox
  • Progress Streaming - real-time onProgress callback support
  • Handling Errors - return the full stack trace + numbered code on failure
  • Result Validation - catch undefined values (property access errors), similar to the above

Agent

  • Add internal tools - expose the readFile and executeCode tools to the agent
  • Normal agent loop - handle LLM messages and tool calls with streaming, a normal agent loop with OpenRouter
  • System prompt - guide the LLM to explore SDK files, write code, etc.
  • Destructive tool warnings - warn before running destructive tools in code scripts so the user can type y/n to continue

More stuff

  • Side By Side - using the same set of tools and a task, run a direct comparison of Codecall vs. a traditional agent
  • Documentation - docs and usage examples
  • NPM Package - an npm package for codecall (down the road)

Contributing

We welcome contributions! Please feel free to:

  • Open issues for bugs or feature requests
  • Submit PRs for improvements
  • Share your use cases and feedback

Acknowledgements

This project builds on ideas from the community and is directly inspired by:

Videos

Articles

License

MIT
