
footprintjs/agentfootprint

agentfootprint mascot composing context flavors (Skills, Steering, Guardrails, RAG, Tool APIs, Memory) into three structured LLM slots (system, messages, tools) — the central abstraction, visualized.

Agentfootprint

We abstract context engineering — and hand back the trace.
Live to develop · offline to monitor · detailed to improve.



1. What we abstract

When you build an Agentic Application, you collect domain-specific data and instructions, then wire them up based on what your system receives.

That data and those instructions wear many names — Skills · Steering · Guardrails · RAG · Tool APIs · Memory — with more on the way. But they all do the same thing: they inject into one of three slots in the LLM call (system, messages, tools).

So we abstracted the injection itself.

agentfootprint — Every LLM call has 3 fixed slots (system, messages, tools). Every flavor lands in one slot under one of 4 fixed triggers (always · rule · on-tool-return · llm-activated). Sparkle streams flow from each trigger lane down to a specific pill inside its destination slot — same slot can hold pills from different triggers (RAG via rule, Instruction via on-tool-return), and the same flavor (Skill) can land in different slots.

The abstraction is three rules:

  1. Three slots are fixed. system, messages, tools — the LLM API surface.
  2. N flavors are open. You declare what you have. Tomorrow's flavor (few-shot, reflection, persona, A2A handoff…) plugs in the same way.
  3. Rules decide where and when. You provide the rules. We collect your data, fire the right one, land it in the right slot at the right iteration.

That's the whole model: Injection = slot × trigger × cache.

  • Slot — which of the 3 LLM API regions the content lands in (system / messages / tools).
  • Trigger — when the content fires (see below).
  • Cache — how stable the content is across iterations. The framework places provider cache markers for you, so stable prefixes can come out 80–90% cheaper, depending on the provider's cached-input pricing.
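
The cache dimension is easier to picture with a small model. Providers generally reuse a prompt prefix only when it is identical from one call to the next, so stable content has to sit ahead of content that changes. The sketch below assumes that behaviour; `SlotEntry`, `orderForCache`, and `cacheBoundary` are hypothetical names for illustration, not agentfootprint APIs.

```typescript
// Hypothetical model, not the agentfootprint API.
type Stability = 'stable' | 'volatile';

interface SlotEntry {
  id: string;
  text: string;
  stability: Stability;
}

// Put stable entries first, preserving their relative order, so the longest
// possible prefix stays identical from one iteration to the next.
function orderForCache(entries: SlotEntry[]): SlotEntry[] {
  const stableFirst = entries.filter(e => e.stability === 'stable');
  const changingLast = entries.filter(e => e.stability === 'volatile');
  return [...stableFirst, ...changingLast];
}

// Index of the first volatile entry. A cache marker would sit just before it.
function cacheBoundary(ordered: SlotEntry[]): number {
  for (let i = 0; i < ordered.length; i++) {
    if (ordered[i].stability === 'volatile') return i;
  }
  return ordered.length;
}

const systemSlot: SlotEntry[] = [
  { id: 'steering', text: 'You are a triage agent.', stability: 'stable' },
  { id: 'rag-hit', text: 'Retrieved passage for this query.', stability: 'volatile' },
  { id: 'guardrail', text: 'Never reveal internal IDs.', stability: 'stable' },
];

const ordered = orderForCache(systemSlot);
const boundary = cacheBoundary(ordered);
```

Here the two stable entries form the cacheable prefix and the retrieved passage follows the boundary, so a change in retrieval results does not invalidate the prefix.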

The 4 triggers

| Trigger | Flavor | Fires when | Default slot |
| --- | --- | --- | --- |
| always | static | Every iteration | system |
| rule | runtime — predicate | Your rule returns true | messages |
| on-tool-return | runtime — lifecycle | After a specific tool returns | messages |
| llm-activated | runtime — agent-driven | LLM calls read_skill('id') | messages (body) |

Builder examples:

  • always: .steering('You are a triage agent…')
  • rule: .rag({ when: s => /price|refund/.test(s.userQuery) })
  • on-tool-return: .instruction({ after: 'search_db', text: 'Cite source IDs.' })
  • llm-activated: .skill({ id: 'refund-policy', activatedBy: 'read_skill' })

Note

Slot is a default, not a coupling — the same Skill can live in tools (schema only, discovered via read_skill), messages (body injected on activation), or system (baked into the prompt as steering).

3 slots × 4 triggers × N flavors = the entire context-engineering surface.
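
To make the slot × trigger idea concrete, here is a minimal standalone sketch of how the four triggers could be evaluated against the current turn and routed into the three slots. It is a model under stated assumptions, not the library's implementation: `TurnState`, `Trigger`, `Injection`, `fires`, and `compose` are hypothetical names, and only the trigger kinds and slot names come from the text above.

```typescript
// Hypothetical model of "slot × trigger"; names are illustrative only.
type Slot = 'system' | 'messages' | 'tools';

interface TurnState {
  userQuery: string;
  lastTool?: string;          // tool that returned just before this iteration
  activatedSkills: string[];  // skill ids the LLM requested via read_skill
}

type Trigger =
  | { kind: 'always' }
  | { kind: 'rule'; when: (s: TurnState) => boolean }
  | { kind: 'on-tool-return'; after: string }
  | { kind: 'llm-activated'; skillId: string };

interface Injection {
  id: string;
  slot: Slot;
  trigger: Trigger;
}

// Decide whether one trigger fires for the current turn.
function fires(t: Trigger, s: TurnState): boolean {
  switch (t.kind) {
    case 'always': return true;
    case 'rule': return t.when(s);
    case 'on-tool-return': return s.lastTool === t.after;
    case 'llm-activated': return s.activatedSkills.indexOf(t.skillId) !== -1;
  }
}

// Collect the ids of every injection that fires, grouped by destination slot.
function compose(injections: Injection[], s: TurnState): Record<Slot, string[]> {
  const slots: Record<Slot, string[]> = { system: [], messages: [], tools: [] };
  for (const inj of injections) {
    if (fires(inj.trigger, s)) slots[inj.slot].push(inj.id);
  }
  return slots;
}

const injections: Injection[] = [
  { id: 'steering', slot: 'system', trigger: { kind: 'always' } },
  { id: 'pricing-rag', slot: 'messages', trigger: { kind: 'rule', when: s => /price|refund/.test(s.userQuery) } },
  { id: 'cite-sources', slot: 'messages', trigger: { kind: 'on-tool-return', after: 'search_db' } },
  { id: 'refund-policy', slot: 'messages', trigger: { kind: 'llm-activated', skillId: 'refund-policy' } },
];

const firstTurn = compose(injections, { userQuery: 'Can I get a refund?', activatedSkills: [] });
const laterTurn = compose(injections, {
  userQuery: 'hello',
  lastTool: 'search_db',
  activatedSkills: ['refund-policy'],
});
```

On the first turn only the steering text and the rule-driven RAG entry land. On the later turn the rule stays quiet while the tool-return instruction and the activated skill body land in messages.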


2. Why we chose this abstraction

The agent space has many credible primary abstractions:

| Framework | What it abstracts |
| --- | --- |
| LangChain | Pipelines of composable components |
| LangGraph | State machines of nodes and edges |
| CrewAI · AutoGen | Crews of role-playing agents |
| Mastra · Genkit · Pydantic AI | Typed full-stack bundles |
| DSPy | Compiled prompts |
| Inngest AgentKit | Durable workflows |

We didn't have to choose between them.

agentfootprint is built on footprintjs — the flowchart pattern for backend code. footprintjs gives us every one of those abstractions out of the box:

| Capability | What footprintjs hands us |
| --- | --- |
| Composition | Sequence · Parallel · Conditional · Loop |
| State machines | The ReAct loop is a flowchart |
| Multi-agent crews | Compose Agents through control flow — no special class needed |
| Durable workflows | pauseHere() plus JSON-portable resume() |
| Typed observation | 60+ events for free, because the framework owns the loop |

So we used the budget those abstractions would have cost us to invest deeply in something they all leave to the developer: the injection loop.

The reason — agents have a new class of bug

For fifty years, software bugs have been logic errors. A wrong condition, a missed edge case, an off-by-one. You step through the code until you find the bad branch.

LLM-powered apps add a second class of bug: contextual errors. The code is correct. The model is correct. The answer is wrong because the LLM's decision rests on context that was ambiguous, confusing, or misleading at the moment of inference.

Tracking which content the model actually saw, and why, is the entire debugging job. Without it, the failure mode is invisible:

| What got injected wrong | What the model did |
| --- | --- |
| Wrong instruction landed in the system slot | Followed the wrong rule |
| Predicate fired one iteration too early | Reasoned with stale assumptions |
| Skill body missing when the LLM called read_skill | Invented its own |
| Cache prefix invalidated mid-iteration | Saw a silently rewritten stale version |
| Tool returned but the on-tool-return injection didn't fire | Couldn't interpret the result |

Important

The model doesn't tell you which of these went wrong. It just gives you the wrong answer.

You can't step through that with a debugger. By the time you read the response, the context that produced it is gone unless something recorded it.

That's the gap agentfootprint fills. A framework that owns the control flow can debug logic errors. A framework that owns the injection can debug contextual errors — because every injection is a typed event with a where, when, why, and how-it-cached.

What that buys you

Because we own the injection, every LLM call backtracks to four typed answers:

  • What was injected
  • Who triggered it (which rule)
  • When it fired
  • How it landed — slot, position, cache
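
The four answers map naturally onto a typed record per injection. The sketch below assumes one possible shape for such a record and two small queries over a recorded trace. The field names and the `InjectionEvent`, `seenAt`, and `whyInjected` identifiers are assumptions for illustration, not the event schema agentfootprint actually emits.

```typescript
// Hypothetical event shape; the real agentfootprint event schema may differ.
interface InjectionEvent {
  injectionId: string;                     // what was injected
  triggeredBy: string;                     // who triggered it (which rule)
  iteration: number;                       // when it fired
  slot: 'system' | 'messages' | 'tools';   // how it landed: slot
  position: number;                        // how it landed: position in the slot
  cached: boolean;                         // how it landed: cache hit or miss
}

// Everything the model saw at a given iteration, in recorded order.
function seenAt(trace: InjectionEvent[], iteration: number): InjectionEvent[] {
  return trace.filter(e => e.iteration === iteration);
}

// One-line answer to "why was this in the prompt at that iteration?"
function whyInjected(trace: InjectionEvent[], iteration: number, id: string): string | undefined {
  for (const e of seenAt(trace, iteration)) {
    if (e.injectionId === id) {
      const cache = e.cached ? 'cache hit' : 'cache miss';
      return `${e.injectionId} -> ${e.slot}[${e.position}] at iteration ${e.iteration}, fired by ${e.triggeredBy}, ${cache}`;
    }
  }
  return undefined;
}

const trace: InjectionEvent[] = [
  { injectionId: 'steering', triggeredBy: 'always', iteration: 1, slot: 'system', position: 0, cached: false },
  { injectionId: 'steering', triggeredBy: 'always', iteration: 2, slot: 'system', position: 0, cached: true },
  { injectionId: 'cite-sources', triggeredBy: 'on-tool-return:search_db', iteration: 2, slot: 'messages', position: 3, cached: false },
];

const seenInSecond = seenAt(trace, 2).map(e => e.injectionId);
const why = whyInjected(trace, 2, 'cite-sources');
```

Because the record is plain data, the same query works on a live run and on a trace loaded from storage.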

Same trace, three workflows:

  • Live — debug as you build. See exactly which injection produced which token, which predicate fired this iteration, which prefix actually got cached.
  • Offline — monitor what shipped. Replay any past run from its trace. Alert on drift. Attribute cost per injection.
  • Detailed — improve via export. Every successful trajectory is already a labeled example, so SFT data needs no separate collection phase; DPO and RL still need preference or reward signals layered on top (see section 4).

And a fourth, novel workflow: the agent can read its own trace. Six months after it rejected loan #42, "why did you reject it?" is answered from the recorded evidence (creditScore=580, threshold=620), not from a rerun. Causal memory turns the trace into the agent's working memory.


3. How do I design my agent or system of agents?

Two scales — same alphabet. Four control flows are the entire vocabulary.

Sequence — linear chain A → B → C.

import { Sequence } from 'agentfootprint';

const flow = Sequence.create()
  .step('a', stageA)
  .step('b', stageB)
  .step('c', stageC)
  .build();

Parallel — fan-out then fan-in across N agents.

import { Parallel } from 'agentfootprint';

const fan = Parallel.create()
  .branch('web', searchWeb)
  .branch('docs', searchDocs)
  .mergeWithFn(synthesizer)
  .build();

Conditional — diamond gate routes to one of N branches based on a predicate.

import { Conditional } from 'agentfootprint';

const router = Conditional.create()
  .when('billing', s => s.intent === 'billing', billingAgent)
  .when('tech',    s => s.intent === 'tech',    techAgent)
  .otherwise('default', defaultAgent)
  .build();

Loop — body cycles back from end to start until a condition is met.

import { Loop } from 'agentfootprint';

const reflexion = Loop.create()
  .repeat(thinkAgent)
  .until(s => s.satisfied)
  .build();

Inside one agent — Dynamic vs Classic ReAct

Classic ReAct vs Dynamic ReAct loop topology: the same 5 stages (SystemPrompt, Messages, Tools, CallLLM, Route → ExecuteTools/Finalize), but the loop edge differs. Classic returns to CallLLM only (slots frozen at 12 tools every iteration); Dynamic returns to SystemPrompt (slots recompose, and the tool list grows from 1 to 5 as skills activate).

Same five stages on both sides. Only one thing differs — where the loop returns. Classic ReAct loops back to CallLLM and slots stay frozen. Dynamic ReAct (agentfootprint) loops back to SystemPrompt, so injections that fired on the previous tool result recompose the next prompt. Per-iteration recomposition is also the structural prerequisite for the cache layer.

| Iteration | Classic ReAct | Dynamic ReAct (agentfootprint) |
| --- | --- | --- |
| 1 | 12 tools shown | 1 tool (read_skill) |
| 2 | 12 tools shown | 5 tools (skill activated) |
| 3 | 12 tools shown | 5 tools |
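
The difference in the loop edge can be simulated in a few lines. This is a scripted toy, not the agentfootprint loop: the tool names are invented, and it assumes the 5 tools in the Dynamic column are read_skill plus four skill-scoped tools, which the source does not spell out.

```typescript
// Hypothetical simulation; tool names and the 1 + 4 split are assumptions.
const catalog: string[] = [];
for (let i = 1; i <= 12; i++) catalog.push(`tool_${i}`);

const skillTools: Record<string, string[]> = {
  'refund-policy': ['lookup_order', 'check_refund_window', 'issue_refund', 'notify_customer'],
};

// Compose the tools slot from whichever skills are active right now.
function toolsSlot(activeSkills: string[]): string[] {
  const tools = ['read_skill'];
  for (const id of activeSkills) tools.push(...(skillTools[id] || []));
  return tools;
}

// Returns how many tools the model sees at each iteration.
function run(mode: 'classic' | 'dynamic', iterations: number): number[] {
  const counts: number[] = [];
  const active: string[] = [];
  // Classic: the slot is composed once, before the loop starts.
  let slot = mode === 'classic' ? catalog : toolsSlot(active);
  for (let i = 1; i <= iterations; i++) {
    // Dynamic: the loop edge returns to slot composition on every iteration.
    if (mode === 'dynamic') slot = toolsSlot(active);
    counts.push(slot.length);
    // Scripted stand-in for the LLM calling read_skill during iteration 1.
    if (i === 1) active.push('refund-policy');
  }
  return counts;
}

const classicCounts = run('classic', 3);
const dynamicCounts = run('dynamic', 3);
```

Under these assumptions the counts reproduce the table: the classic loop shows 12 tools on every iteration, while the dynamic loop shows 1, then 5, then 5.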

📖 Dynamic ReAct guide · Key concepts

Multi-agent — compose with the alphabet

A custom research agent built from the same 4 control flows: input flows into a Conditional gate (plan more research?), which fans out to a Parallel block (search_web, search_docs, search_kb), then chains into a Sequence (synthesize → critique), and a Loop arrow returns from the end back to the Conditional gate so the agent iterates until satisfied. Formula: Loop( Conditional(plan?) → Parallel(search_web, search_docs, search_kb) → Sequence(synth → critique) ).

Pick the flows that match your problem. Chain them. That's your Agentic Application.

const research = Loop.create()
  .repeat(Sequence.create().step('plan', plan).step('search', searchAll).build())
  .until(s => s.satisfied).build();

Same .create().method().build() shape as the four rows above — just composed.

Named patterns — also compositions of the same 4

Figure: six named multi-agent patterns (Swarm, Tree-of-Thoughts, Reflexion, Debate, Router, Hierarchical) drawn as compositions of the same 4 control flows.

The patterns the field knows reduce to the same alphabet:

| Pattern | Composition |
| --- | --- |
| Swarm | Loop( Parallel( Agent×N ) → merge ) |
| Tree-of-Thoughts | Loop( Parallel( Agent×N ) → Conditional(score) ) |
| Reflexion | Loop( Agent → Conditional(critique) → Agent ) |
| Debate | Parallel( Agent_pro, Agent_con ) → Agent_judge |
| Router | Conditional → Agent_A \| Agent_B \| Agent_C |
| Hierarchical | Agent_planner → Sequence( Agent_worker×N ) → synth |
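
To see why the named patterns reduce to four building blocks, it helps to model the control flows as plain function combinators. The sketch below is a standalone toy and not the agentfootprint implementation: stages are synchronous state transformers, and `sequence`, `parallel`, `conditional`, and `loop` are hypothetical helpers that only mirror the vocabulary of the builders shown earlier.

```typescript
// Standalone model of the four control flows; not the agentfootprint API.
type Stage<S> = (s: S) => S;

function sequence<S>(...stages: Stage<S>[]): Stage<S> {
  return s => stages.reduce((acc, stage) => stage(acc), s);
}

function parallel<S>(branches: Stage<S>[], merge: (results: S[]) => S): Stage<S> {
  return s => merge(branches.map(branch => branch(s)));
}

function conditional<S>(routes: Array<[(s: S) => boolean, Stage<S>]>, otherwise: Stage<S>): Stage<S> {
  return s => {
    for (const [when, stage] of routes) {
      if (when(s)) return stage(s);
    }
    return otherwise(s);
  };
}

// Runs the body at least once, then repeats until the predicate holds.
function loop<S>(body: Stage<S>, until: (s: S) => boolean, maxIterations = 10): Stage<S> {
  return s => {
    let state = body(s);
    for (let i = 1; i < maxIterations && !until(state); i++) state = body(state);
    return state;
  };
}

// Debate = Parallel(Agent_pro, Agent_con) → Agent_judge
interface Debate { topic: string; args: string[]; verdict?: string }
const pro: Stage<Debate> = s => ({ ...s, args: [...s.args, `pro: ${s.topic}`] });
const con: Stage<Debate> = s => ({ ...s, args: [...s.args, `con: ${s.topic}`] });
const judge: Stage<Debate> = s => ({ ...s, verdict: `${s.args.length} arguments heard` });
const mergeArgs = (results: Debate[]): Debate => ({
  ...results[0],
  args: results.reduce<string[]>((all, r) => all.concat(r.args), []),
});
const debate = sequence<Debate>(parallel<Debate>([pro, con], mergeArgs), judge);
const debated = debate({ topic: 'ship it', args: [] });

// Reflexion-style loop: attempt, then a critique gate decides whether to revise.
interface Draft { score: number; revisions: number }
const attempt: Stage<Draft> = s => ({ ...s, score: s.score + 3 });
const revise: Stage<Draft> = s => ({ ...s, revisions: s.revisions + 1 });
const keep: Stage<Draft> = s => s;
const reflexion = loop<Draft>(
  sequence<Draft>(attempt, conditional<Draft>([[(s: Draft) => s.score < 8, revise]], keep)),
  (s: Draft) => s.score >= 8,
);
const finalDraft = reflexion({ score: 0, revisions: 0 });
```

Both patterns are built only from the four combinators, which is the claim the table makes. A real agent stage would be asynchronous and would call an LLM, but the composition shape stays the same.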

Same trick as section 1: instead of N libraries for N patterns, we found the 4 building blocks all N patterns are made of.

📖 Compare: hand-rolled vs declarative · migration from LangChain / CrewAI / LangGraph


4. How do I see what my agent did?

Because we own the loop (section 2), every decision and execution is captured during traversal rather than bolted on afterwards. The default capture is the causal trace: every stage, read, write, and piece of decision evidence, as a JSON-portable, scrubbable, queryable, exportable artifact. Beyond the default, you can wire custom recorders for cost, latency, or quality scoring; any observation hook fires on the same stream.

agentfootprint causal memory: each agent run produces a JSON-portable causal trace, a scrubbable timeline of every stage with reads, writes, and captured decision evidence. The trace card shows a time-travel slider (Step 5 of 17, Live), an execution timeline with stage-duration bars, and the captured decision evidence pill (riskTier eq high → reject). Two built-in lenses view it: Lens (agent-centric) and Explainable Trace (structural). Three programmatic consumers fan out from it: audit replay (GDPR Article 22 adverse-action notice answered from the chain, no LLM call), cheap-model triage (Sonnet trace fed to Haiku for follow-ups, $15/1M to $0.25/1M tokens), and training data export (every chain is a labeled trajectory, the substrate for SFT/DPO/process-RL). One recording, two lenses, three consumers, zero extra instrumentation. Powered by footprintjs causalChain().

The same trace serves three downstream consumers — no extra instrumentation:

  1. Audit / compliance. Six months later, "why was loan #42 rejected?" answers from the chain (creditScore=580 < 620 ∧ dti=0.6 > 0.43 → riskTier=high → REJECTED). No LLM call. GDPR Art. 22, ECOA, and EU AI Act adverse-action notices write themselves from the captured decision evidence.

  2. Cheap-model triage. A Sonnet trace becomes good input for Haiku to answer follow-ups: ~200 tokens at a small model ($0.25/1M) vs ~2,500 tokens at a reasoning model ($15/1M), or roughly $0.00005 vs $0.0375 per follow-up. Memoization for agent thinking, with no agent rerun.

  3. Training data — the substrate is already there. Every successful chain is a labeled trajectory. SFT pairs ({prompt, completion}) fall out of the snapshot's history field; the export wrapper is roadmap work tracked in GitHub issues. DPO and process-RL need additional collection layers (preference feedback, per-step reward annotation) that don't ship today.
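
The audit consumer in item 1 can be pictured as a pure function over stored evidence. The sketch below uses the loan #42 values from that item and a JSON round trip as a stand-in for loading the trace months later. The `Evidence` and `Decision` shapes and the `explain` helper are assumptions for illustration; the real trace schema and footprintjs causalChain() output may look different.

```typescript
// Hypothetical evidence record; the real trace schema may differ.
interface Evidence {
  field: string;
  op: '<' | '>';
  observed: number;
  threshold: number;
}

interface Decision {
  subject: string;
  evidence: Evidence[];
  derived: string;   // intermediate conclusion, e.g. riskTier=high
  outcome: string;
}

// Re-check a recorded comparison without calling any model.
function holds(e: Evidence): boolean {
  return e.op === '<' ? e.observed < e.threshold : e.observed > e.threshold;
}

// Render the recorded chain as a human-readable explanation.
function explain(d: Decision): string {
  const clauses = d.evidence
    .filter(holds)
    .map(e => `${e.field}=${e.observed} ${e.op} ${e.threshold}`);
  return `${d.subject}: ${clauses.join(' and ')} -> ${d.derived} -> ${d.outcome}`;
}

const loan42: Decision = {
  subject: 'loan #42',
  evidence: [
    { field: 'creditScore', op: '<', observed: 580, threshold: 620 },
    { field: 'dti', op: '>', observed: 0.6, threshold: 0.43 },
  ],
  derived: 'riskTier=high',
  outcome: 'REJECTED',
};

// The JSON round trip stands in for persisting the trace and reading it back later.
const restored: Decision = JSON.parse(JSON.stringify(loan42));
const answer = explain(restored);
```

The explanation is rebuilt entirely from recorded values, which is why no LLM call or agent rerun is needed to answer the question.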

Two built-in lenses view the same trace:

| Lens | View | When to use |
| --- | --- | --- |
| Lens | Agent-centric — User/Agent[3 slots]/Tool flowchart with iteration scrubber and round commentary | Live debugging, "what did Neo see at step 5?" |
| Explainable Trace | Structural — subflow tree, full flowchart, memory inspector, per-stage execution timeline | Architecture review, root-cause analysis |

📖 Powered by footprintjs causalChain() — backward thin-slicing on the commit log. Causal memory deep dive · Explainability & compliance

One recording. Two lenses. Three consumers. Zero extra instrumentation.


Quick start — runs offline, no API key

npm install agentfootprint footprintjs

import { Agent, defineTool, mock } from 'agentfootprint';

const weather = defineTool({
  name: 'weather',
  description: 'Get current weather for a city.',
  inputSchema: {
    type: 'object',
    properties: { city: { type: 'string' } },
    required: ['city'],
  },
  execute: async ({ city }: { city: string }) => `${city}: 72°F, sunny`,
});

const agent = Agent.create({
  provider: mock({ reply: 'I checked: it is 72°F and sunny.' }),
  model: 'mock',
})
  .system('You answer weather questions using the weather tool.')
  .tool(weather)
  .build();

const result = await agent.run({ message: 'Weather in Paris?' });
console.log(result);  // → "I checked: it is 72°F and sunny."

Swap mock(...) for anthropic(...) / openai(...) / bedrock(...) / ollama(...) for production. Nothing else changes.


Mocks first, production second

Build the entire app against in-memory mocks with zero API cost, then swap real infrastructure one boundary at a time.

| Boundary | Dev | Prod |
| --- | --- | --- |
| LLM provider | mock(...) | anthropic() · openai() · bedrock() · ollama() |
| Memory store | InMemoryStore | RedisStore · AgentCoreStore |
| MCP | mockMcpClient(...) | mcpClient({ transport }) |
| Cache strategy | NoOpCacheStrategy | auto-selected per provider |

The flowchart, recorders, and tests don't change between dev and prod.
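
The boundary-swap idea rests on application code depending on an interface rather than a concrete backend. Here is a minimal sketch of that pattern for the memory-store row. `NoteStore`, `objectStore`, and `rememberCity` are hypothetical names, and the interface is synchronous for brevity. The real agentfootprint store interface is not shown in this README and is likely asynchronous.

```typescript
// Hypothetical boundary; the real agentfootprint store interface may differ.
interface NoteStore {
  put(key: string, value: string): void;
  get(key: string): string | undefined;
}

// Dev implementation: everything lives in a plain object, no network, no cost.
function objectStore(): NoteStore {
  const data: Record<string, string> = {};
  return {
    put: (key, value) => { data[key] = value; },
    get: key => data[key],
  };
}

// Application code sees only the interface, so a Redis-backed NoteStore
// could replace objectStore() without changing this function or its tests.
function rememberCity(store: NoteStore, userId: string, city: string): string {
  store.put(`city:${userId}`, city);
  return store.get(`city:${userId}`) || 'unknown';
}

const recalled = rememberCity(objectStore(), 'u1', 'Paris');
```

Swapping the backend then means passing a different `NoteStore` at the call site, which is the one-boundary-at-a-time migration the table describes.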


What ships today

Core

  • 2 primitives — LLMCall, Agent (the ReAct loop)
  • 4 control flows — Sequence, Parallel, Conditional, Loop
  • 1 Injection primitive — defineSkill / defineSteering / defineInstruction / defineFact
  • 1 reliability gate — .reliability({ preCheck, postDecide, providers, circuitBreaker, fallback })
  • 1 tool dispatch primitive — ToolProvider (sync OR async) — staticTools · gatedTools · skillScopedTools · custom discoveryProvider over hubs / MCP / per-tenant catalogs

LLM providers (7)

| Factory | Use for |
| --- | --- |
| anthropic | Claude (Sonnet, Opus, Haiku) via @anthropic-ai/sdk |
| openai | GPT-4o, GPT-4-turbo via openai SDK |
| bedrock | Claude / Titan / Mistral via AWS Bedrock runtime |
| ollama | Local models (OpenAI-compatible endpoint) |
| browserAnthropic | Browser-side Claude calls (no proxy server) |
| browserOpenai | Browser-side OpenAI calls (no proxy server) |
| mock | Deterministic dev/test (zero API cost) |

Memory + adapters

  • Memory factory — 4 types (episodic / semantic / narrative / causal) × 7 strategies (window / budget / summarize / topK / extract / decay / hybrid)
  • Memory stores — InMemoryStore, RedisStore (peer-dep ioredis), AgentCoreStore (peer-dep AWS SDK)
  • RAG · MCP adapters — mockMcpClient(...) / mcpClient({ transport })

Operability

  • Provider-agnostic prompt caching — declarative per-injection, per-iteration marker recomputation
  • Pause / resume — JSON-serializable checkpoints; resume hours later on a different server
  • Resilience primitives — withRetry, withFallback, withCircuitBreaker, .outputFallback, agent.resumeOnError
  • 60+ typed observability events — agent · composition · context · stream · tools · skill · memory · cache · cost · permission · eval · embedding · pause · error · fallback · resilience · reliability · risk

Tooling

  • Lens · Explainable Trace — two visual replays of the causal trace (separate agentfootprint-lens package)
  • AI-coding-tool support — Claude Code · Cursor · Windsurf · Cline · Kiro · Copilot

📖 Agent API reference · CHANGELOG


Where to next

| If you are... | Go here |
| --- | --- |
| New to agents | 5-minute quick start |
| Coming from LangChain / CrewAI / LangGraph | Migration guide |
| Architecting an enterprise rollout | Production guide |
| Doing due diligence | Architecture overview |
| Researcher / academic background | Citations & prior art |
| Curious about design | Inspiration docs |
Curious about design Inspiration docs

Or jump into the examples gallery — every example is also an end-to-end CI test.


Built on

footprintjs — the flowchart pattern for backend code. agentfootprint's decision-evidence capture, narrative recording, and time-travel checkpointing are footprintjs primitives at the runtime layer.

You don't need to learn footprintjs to use agentfootprint — but if you want to build your own primitives at this depth, start there.


License

MIT © Sanjay Krishna Anbalagan
