We abstract context engineering — and hand back the trace.
Live to develop · offline to monitor · detailed to improve.
When you build an Agentic Application, you collect domain-specific data and instructions, then wire them up based on what your system receives.
That data and those instructions wear many names — Skills · Steering · Guardrails · RAG · Tool APIs · Memory — with more on the way. But they all do the same thing: they inject into one of three slots in the LLM call (system, messages, tools).
So we abstracted the injection itself.
The abstraction is three rules:
- Three slots are fixed. `system`, `messages`, `tools`: the LLM API surface.
- N flavors are open. You declare what you have. Tomorrow's flavor (few-shot, reflection, persona, A2A handoff…) plugs in the same way.
- Rules decide where and when. You provide the rules. We collect your data, fire the right one, and land it in the right slot at the right iteration.
That's the whole model: Injection = slot × trigger × cache.
- Slot: which of the 3 LLM API regions the content lands in (`system` / `messages` / `tools`).
- Trigger: when the content fires (see below).
- Cache: how stable the content is across iterations. The framework places provider cache markers for you, so prefixes built from stable content can be 80–90% cheaper.
| Trigger | Flavor | Fires when | Builder example | Default slot |
|---|---|---|---|---|
| `always` | static | Every iteration | `.steering('You are a triage agent…')` | `system` |
| `rule` | runtime: predicate | Your rule returns true | `.rag({ when: s => /price\|refund/.test(s.userQuery) })` | `messages` |
| `on-tool-return` | runtime: lifecycle | After a specific tool returns | `.instruction({ after: 'search_db', text: 'Cite source IDs.' })` | `messages` |
| `llm-activated` | runtime: agent-driven | LLM calls `read_skill('id')` | `.skill({ id: 'refund-policy', activatedBy: 'read_skill' })` | `messages` (body) |
> [!NOTE]
> Slot is a default, not a coupling: the same Skill can live in `tools` (schema only, discovered via `read_skill`), `messages` (body injected on activation), or `system` (baked into the prompt as steering).
3 slots × 4 triggers × N flavors = the entire context-engineering surface.
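To make the model concrete, here is a minimal, self-contained sketch of how a slot × trigger × cache declaration could be resolved on one iteration. Every name in it (`Slot`, `LoopState`, `Trigger`, `Injection`, `fires`, `compose`) is hypothetical and chosen for illustration; it is not agentfootprint's actual API, only one way the four triggers in the table might be evaluated.

```typescript
// Hypothetical types for illustration only; not agentfootprint's real API.
type Slot = 'system' | 'messages' | 'tools';

interface LoopState {
  iteration: number;
  userQuery: string;
  lastTool?: string;         // tool that just returned, if any
  activatedSkills: string[]; // skill ids the LLM requested via read_skill
}

type Trigger =
  | { kind: 'always' }
  | { kind: 'rule'; when: (s: LoopState) => boolean }
  | { kind: 'on-tool-return'; after: string }
  | { kind: 'llm-activated'; skillId: string };

interface Injection {
  id: string;
  slot: Slot;
  trigger: Trigger;
  stable: boolean; // stable content is a candidate for a provider cache marker
}

// Decide whether one trigger fires for the current loop state.
function fires(t: Trigger, s: LoopState): boolean {
  switch (t.kind) {
    case 'always': return true;
    case 'rule': return t.when(s);
    case 'on-tool-return': return s.lastTool === t.after;
    case 'llm-activated': return s.activatedSkills.indexOf(t.skillId) !== -1;
  }
}

// Recompose the three slots from scratch for this iteration.
function compose(injections: Injection[], s: LoopState): Record<Slot, string[]> {
  const slots: Record<Slot, string[]> = { system: [], messages: [], tools: [] };
  for (const inj of injections) {
    if (fires(inj.trigger, s)) slots[inj.slot].push(inj.id);
  }
  return slots;
}

const injections: Injection[] = [
  { id: 'triage-steering', slot: 'system', trigger: { kind: 'always' }, stable: true },
  { id: 'pricing-rag', slot: 'messages', stable: false,
    trigger: { kind: 'rule', when: (s) => /price|refund/.test(s.userQuery) } },
  { id: 'cite-sources', slot: 'messages', stable: true,
    trigger: { kind: 'on-tool-return', after: 'search_db' } },
  { id: 'refund-policy', slot: 'messages', stable: true,
    trigger: { kind: 'llm-activated', skillId: 'refund-policy' } },
];

const firstPass = compose(injections, {
  iteration: 1, userQuery: 'Can I get a refund?', activatedSkills: [],
});
const secondPass = compose(injections, {
  iteration: 2, userQuery: 'Can I get a refund?',
  lastTool: 'search_db', activatedSkills: ['refund-policy'],
});
```

In this sketch the first pass lands only the steering and the predicate-matched RAG entry; the second pass also picks up the tool-return instruction and the activated skill, because the slots are recomputed from the new state.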
The agent space has many credible primary abstractions:
| Framework | What it abstracts |
|---|---|
| LangChain | Pipelines of composable components |
| LangGraph | State machines of nodes and edges |
| CrewAI · AutoGen | Crews of role-playing agents |
| Mastra · Genkit · Pydantic AI | Typed full-stack bundles |
| DSPy | Compiled prompts |
| Inngest AgentKit | Durable workflows |
We didn't have to choose between them.
agentfootprint is built on footprintjs — the flowchart pattern for backend code. footprintjs gives us every one of those abstractions out of the box:
| Capability | What footprintjs hands us |
|---|---|
| Composition | Sequence · Parallel · Conditional · Loop |
| State machines | The ReAct loop is a flowchart |
| Multi-agent crews | Compose Agents through control flow — no special class needed |
| Durable workflows | pauseHere() plus JSON-portable resume() |
| Typed observation | 60+ events for free, because the framework owns the loop |
So we used the budget those abstractions would have cost us to invest deeply in something they all leave to the developer: the injection loop.
For fifty years, software bugs have been logic errors. A wrong condition, a missed edge case, an off-by-one. You step through the code until you find the bad branch.
LLM-powered apps add a second class of bug: contextual errors. The code is correct. The model is correct. The answer is wrong because the LLM's decision rests on context that was ambiguous, confusing, or misleading at the moment of inference.
Tracking which content the model actually saw, and why, is the entire debugging job. Without it, the failure mode is invisible:
| What got injected wrong | What the model did |
|---|---|
| Wrong instruction landed in the `system` slot | Followed the wrong rule |
| Predicate fired one iteration too early | Reasoned with stale assumptions |
| Skill body missing when the LLM called `read_skill` | Invented its own |
| Cache prefix invalidated mid-iteration | Saw a silently rewritten stale version |
| Tool returned but the `on-tool-return` injection didn't fire | Couldn't interpret the result |
> [!IMPORTANT]
> The model doesn't tell you which of these went wrong. It just gives you the wrong answer.
You can't step through that with a debugger. By the time you read the response, the context that produced it is gone unless something recorded it.
That's the gap agentfootprint fills. A framework that owns the control flow can debug logic errors. A framework that owns the injection can debug contextual errors — because every injection is a typed event with a where, when, why, and how-it-cached.
Because we own the injection, every LLM call backtracks to four typed answers:
- What was injected
- Who triggered it (which rule)
- When it fired
- How it landed — slot, position, cache
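As a rough illustration of those four answers, the sketch below models one injection record and a query over a trace. The `InjectionEvent` shape, its field names, and `contextSeenAt` are assumptions made for this example; the real event types may differ.

```typescript
// Hypothetical record shape for illustration; the real typed events may differ.
type SlotName = 'system' | 'messages' | 'tools';

interface InjectionEvent {
  what: string; // id of the injected content
  who: string;  // rule or trigger that fired it
  when: number; // loop iteration in which it fired
  how: { slot: SlotName; position: number; cacheHit: boolean };
}

// Answer "what did the model see at iteration N, and why?" from the trace alone.
function contextSeenAt(trace: InjectionEvent[], iteration: number): string[] {
  return trace
    .filter((e) => e.when === iteration)
    .sort((a, b) => a.how.position - b.how.position)
    .map((e) =>
      `${e.how.slot}[${e.how.position}] ${e.what} ` +
      `(by ${e.who}, cache ${e.how.cacheHit ? 'hit' : 'miss'})`);
}

const sampleTrace: InjectionEvent[] = [
  { what: 'triage-steering', who: 'always', when: 1,
    how: { slot: 'system', position: 0, cacheHit: false } },
  { what: 'triage-steering', who: 'always', when: 2,
    how: { slot: 'system', position: 0, cacheHit: true } },
  { what: 'cite-sources', who: 'on-tool-return:search_db', when: 2,
    how: { slot: 'messages', position: 1, cacheHit: false } },
];

const seenAtTwo = contextSeenAt(sampleTrace, 2);
```

Because each record carries its own what, who, when, and how, the question can be answered after the fact without re-running the agent.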
Same trace, three workflows:
- Live — debug as you build. See exactly which injection produced which token, which predicate fired this iteration, which prefix actually got cached.
- Offline — monitor what shipped. Replay any past run from its trace. Alert on drift. Attribute cost per injection.
- Detailed — improve via export. Every successful trajectory is labeled training data for SFT, DPO, or RL — no separate data-collection phase.
And a fourth, novel workflow: the agent can read its own trace. Six months after the agent rejected loan #42, the question "why did you reject it?" is answered from the recorded evidence (creditScore=580, threshold=620), not from a rerun. Causal memory turns the trace into the agent's working memory.
Two scales — same alphabet. Four control flows are the entire vocabulary.
```typescript
import { Sequence } from 'agentfootprint';

// Sequence: run stages a, b, c in order.
const flow = Sequence.create()
  .step('a', stageA)
  .step('b', stageB)
  .step('c', stageC)
  .build();
```

```typescript
import { Parallel } from 'agentfootprint';

// Parallel: fan out to both searches, then merge with synthesizer.
const fan = Parallel.create()
  .branch('web', searchWeb)
  .branch('docs', searchDocs)
  .mergeWithFn(synthesizer)
  .build();
```

```typescript
import { Conditional } from 'agentfootprint';

// Conditional: route on s.intent, falling back to defaultAgent.
const router = Conditional.create()
  .when('billing', s => s.intent === 'billing', billingAgent)
  .when('tech', s => s.intent === 'tech', techAgent)
  .otherwise('default', defaultAgent)
  .build();
```

```typescript
import { Loop } from 'agentfootprint';

// Loop: repeat thinkAgent until s.satisfied is true.
const reflexion = Loop.create()
  .repeat(thinkAgent)
  .until(s => s.satisfied)
  .build();
```
Same five stages on both sides. Only one thing differs — where the loop returns. Classic ReAct loops back to CallLLM and slots stay frozen. Dynamic ReAct (agentfootprint) loops back to SystemPrompt, so injections that fired on the previous tool result recompose the next prompt. Per-iteration recomposition is also the structural prerequisite for the cache layer.
| Iteration | Classic ReAct | Dynamic ReAct (agentfootprint) |
|---|---|---|
| 1 | 12 tools shown | 1 tool (read_skill) |
| 2 | 12 tools shown | 5 tools (skill activated) |
| 3 | 12 tools shown | 5 tools |
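The table can be reproduced with a small simulation. This is a sketch under stated assumptions: the tool names are invented, only the counts (12, 1, 5) come from the table, and it assumes the model calls `read_skill('refund-policy')` during iteration 1 and that the activated skill scopes the `tools` slot to five entries.

```typescript
// Hypothetical tool names; only the counts (12, 1, 5) come from the table.
const allTools: string[] = [
  'read_skill', 'lookup_order', 'issue_refund', 'check_policy', 'send_email',
  'search_db', 'create_ticket', 'escalate', 'get_invoice', 'update_address',
  'track_shipment', 'cancel_order',
];

// Assumed skill-to-tools scoping for the example.
const skillTools: { [skillId: string]: string[] } = {
  'refund-policy': ['read_skill', 'lookup_order', 'issue_refund', 'check_policy', 'send_email'],
};

// Classic ReAct: the tools slot is frozen, so every tool is shown every time.
function classicToolsSlot(): string[] {
  return allTools;
}

// Dynamic ReAct: the tools slot is recomposed from the current state.
function dynamicToolsSlot(activeSkill: string | null): string[] {
  if (activeSkill === null) return ['read_skill'];
  const scoped = skillTools[activeSkill];
  return scoped !== undefined ? scoped : ['read_skill'];
}

const classicCounts: number[] = [];
const dynamicCounts: number[] = [];
let activeSkill: string | null = null;

for (let iteration = 1; iteration <= 3; iteration++) {
  classicCounts.push(classicToolsSlot().length);
  dynamicCounts.push(dynamicToolsSlot(activeSkill).length);
  // Assumption: the model activates the skill during iteration 1,
  // so the recomposed slot changes from iteration 2 onward.
  if (iteration === 1) activeSkill = 'refund-policy';
}
```

The only structural difference between the two functions is that the dynamic one takes loop state as input, which is what looping back to prompt composition makes possible.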
Pick the flows that match your problem. Chain them. That's your Agentic Application.
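```typescript
import { Loop, Sequence } from 'agentfootprint';

// A Loop whose body is itself a Sequence: plan, then search, until satisfied.
const research = Loop.create()
  .repeat(Sequence.create().step('plan', plan).step('search', searchAll).build())
  .until(s => s.satisfied)
  .build();
```

Same `.create().method().build()` shape as the four control flows above, just composed.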
The patterns the field knows reduce to the same alphabet:
| Pattern | Composition |
|---|---|
| Swarm | Loop( Parallel( Agent×N ) → merge ) |
| Tree-of-Thoughts | Loop( Parallel( Agent×N ) → Conditional(score) ) |
| Reflexion | Loop( Agent → Conditional(critique) → Agent ) |
| Debate | Parallel( Agent_pro, Agent_con ) → Agent_judge |
| Router | Conditional → Agent_A \| Agent_B \| Agent_C |
| Hierarchical | Agent_planner → Sequence( Agent_worker×N ) → synth |
Same trick as Beat 1: instead of N libraries for N patterns, we found the M building blocks all N patterns are made of.
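To show why four blocks are enough, here is a toy version of the alphabet written as plain synchronous functions. The helper names (`sequence`, `parallel`, `conditional`, `loop`) and their signatures are hypothetical and far simpler than agentfootprint's builders; in particular, this `loop` assumes the predicate is checked before each pass and adds an iteration cap.

```typescript
// Toy combinators for illustration; not agentfootprint's builders.
type Stage<S> = (s: S) => S;

function sequence<S>(...stages: Stage<S>[]): Stage<S> {
  return (s) => stages.reduce((acc, stage) => stage(acc), s);
}

function parallel<S>(branches: Stage<S>[], merge: (base: S, results: S[]) => S): Stage<S> {
  // Each branch sees the same input; merge folds the results back together.
  return (s) => merge(s, branches.map((branch) => branch(s)));
}

function conditional<S>(
  cases: { when: (s: S) => boolean; run: Stage<S> }[],
  otherwise: Stage<S>,
): Stage<S> {
  return (s) => {
    for (const c of cases) {
      if (c.when(s)) return c.run(s);
    }
    return otherwise(s);
  };
}

function loop<S>(body: Stage<S>, until: (s: S) => boolean, maxIterations = 10): Stage<S> {
  return (s) => {
    let current = s;
    for (let i = 0; i < maxIterations && !until(current); i++) current = body(current);
    return current;
  };
}

interface ResearchState { round: number; hits: string[]; notes: string[]; satisfied: boolean }

const plan: Stage<ResearchState> = (s) => ({ ...s, round: s.round + 1, hits: [] });
const web: Stage<ResearchState> = (s) => ({ ...s, hits: ['web#' + s.round] });
const docs: Stage<ResearchState> = (s) => ({ ...s, hits: ['docs#' + s.round] });
const searchAll = parallel<ResearchState>([web, docs], (base, results) => ({
  ...base,
  hits: results.reduce<string[]>((all, r) => all.concat(r.hits), []),
}));
const record: Stage<ResearchState> = (s) => {
  const notes = s.notes.concat(s.hits);
  return { ...s, notes, satisfied: notes.length >= 4 };
};

// Loop( Sequence( plan, Parallel(web, docs), record ) )
const researchFlow = loop(sequence(plan, searchAll, record), (s) => s.satisfied);
const finalState = researchFlow({ round: 0, hits: [], notes: [], satisfied: false });
```

Each combinator returns the same `Stage` type it consumes, which is what lets any flow nest inside any other.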
📖 Compare: hand-rolled vs declarative · migration from LangChain / CrewAI / LangGraph
Because we own the loop (Beat 2), every decision and execution is captured during traversal — not bolted on. The default capture is the causal trace: every stage, read, write, and decision evidence, as a JSON-portable, scrubbable, queryable, exportable artifact. Beyond the default, wire custom recorders for cost, latency, or quality scoring — any observation hook fires on the same stream.
The same trace serves three downstream consumers — no extra instrumentation:
- Audit / compliance. Six months later, "why was loan #42 rejected?" answers from the chain (`creditScore=580 < 620 ∧ dti=0.6 > 0.43 → riskTier=high → REJECTED`). No LLM call. GDPR Art. 22, ECOA, and EU AI Act adverse-action notices write themselves from the captured decision evidence.
- Cheap-model triage. A Sonnet trace becomes good input for Haiku to answer follow-ups. ~200 tokens at any model ($0.25/1M) vs ~2,500 tokens at a reasoning model ($15/1M). Memoization for agent thinking, with no agent rerun.
- Training data: the substrate is already there. Every successful chain is a labeled trajectory. SFT pairs (`{prompt, completion}`) fall out of the snapshot's history field; the export wrapper is roadmap work tracked in GitHub issues. DPO and process-RL need additional collection layers (preference feedback, per-step reward annotation) that don't ship today.
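The cheap-model triage figures above work out as follows. The token counts and per-million prices are the ones quoted in the list; `costUsd` is a hypothetical helper written for this example.

```typescript
// Cost of a call given a token count and a price per one million tokens.
function costUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1e6) * usdPerMillionTokens;
}

const traceFollowUp = costUsd(200, 0.25);         // about $0.00005 per follow-up
const fullRerun = costUsd(2500, 15);              // about $0.0375 per follow-up
const savingsFactor = fullRerun / traceFollowUp;  // roughly 750x
```

At those quoted numbers, answering a follow-up from the trace costs roughly 1/750 of re-running the reasoning model.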
Two built-in lenses view the same trace:
| Lens | View | When to use |
|---|---|---|
| Lens | Agent-centric — User/Agent[3 slots]/Tool flowchart with iteration scrubber and round commentary | Live debugging, "what did Neo see at step 5?" |
| Explainable Trace | Structural — subflow tree, full flowchart, memory inspector, per-stage execution timeline | Architecture review, root-cause analysis |
📖 Powered by footprintjs `causalChain()`: backward thin-slicing on the commit log. Causal memory deep dive · Explainability & compliance
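For intuition, backward slicing over a commit log can be sketched in a few lines. This is not the footprintjs implementation of `causalChain()`; `CommitEntry`, `backwardSlice`, and the stage names are hypothetical, and the values reuse the loan example from the audit bullet.

```typescript
// Hypothetical commit record: what a stage read and what it wrote.
interface CommitEntry {
  stage: string;
  reads: string[];
  writes: { [key: string]: string | number };
}

// Walk the log backward, keeping only commits that produced a value
// the target (transitively) depends on.
function backwardSlice(log: CommitEntry[], targetKey: string): CommitEntry[] {
  const wanted: string[] = [targetKey];
  const slice: CommitEntry[] = [];
  for (const commit of log.slice().reverse()) {
    const produced = Object.keys(commit.writes).filter((k) => wanted.indexOf(k) !== -1);
    if (produced.length === 0) continue; // irrelevant to the target
    slice.unshift(commit);
    // These keys are now explained; their inputs become the new targets.
    for (const k of produced) wanted.splice(wanted.indexOf(k), 1);
    for (const r of commit.reads) {
      if (wanted.indexOf(r) === -1) wanted.push(r);
    }
  }
  return slice;
}

const loanLog: CommitEntry[] = [
  { stage: 'loadApplicant', reads: [], writes: { creditScore: 580, dti: 0.6 } },
  { stage: 'greet', reads: [], writes: { greeting: 'hello' } },
  { stage: 'assessRisk', reads: ['creditScore', 'dti'], writes: { riskTier: 'high' } },
  { stage: 'decide', reads: ['riskTier'], writes: { decision: 'REJECTED' } },
];

const whyRejected = backwardSlice(loanLog, 'decision').map((c) => c.stage);
```

The `greet` commit drops out of the slice because nothing on the path to `decision` read what it wrote, which is the "thin" part of thin-slicing.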
One recording. Two lenses. Three consumers. Zero extra instrumentation.
```bash
npm install agentfootprint footprintjs
```

```typescript
import { Agent, defineTool, mock } from 'agentfootprint';

// A tool the agent can call; the schema tells the LLM what input it takes.
const weather = defineTool({
  name: 'weather',
  description: 'Get current weather for a city.',
  inputSchema: {
    type: 'object',
    properties: { city: { type: 'string' } },
    required: ['city'],
  },
  execute: async ({ city }: { city: string }) => `${city}: 72°F, sunny`,
});

// The mock provider returns a fixed reply, so this runs with zero API cost.
const agent = Agent.create({
  provider: mock({ reply: 'I checked: it is 72°F and sunny.' }),
  model: 'mock',
})
  .system('You answer weather questions using the weather tool.')
  .tool(weather)
  .build();

const result = await agent.run({ message: 'Weather in Paris?' });
console.log(result); // → "I checked: it is 72°F and sunny."
```

Swap `mock(...)` for `anthropic(...)` / `openai(...)` / `bedrock(...)` / `ollama(...)` for production. Nothing else changes.
Build the entire app against in-memory mocks with zero API cost, then swap real infrastructure one boundary at a time.
| Boundary | Dev | Prod |
|---|---|---|
| LLM provider | `mock(...)` | `anthropic()` · `openai()` · `bedrock()` · `ollama()` |
| Memory store | `InMemoryStore` | `RedisStore` · `AgentCoreStore` |
| MCP | `mockMcpClient(...)` | `mcpClient({ transport })` |
| Cache strategy | `NoOpCacheStrategy` | auto-selected per provider |
The flowchart, recorders, and tests don't change between dev and prod.
Core
- 2 primitives: `LLMCall`, `Agent` (the ReAct loop)
- 4 control flows: `Sequence`, `Parallel`, `Conditional`, `Loop`
- 1 Injection primitive: `defineSkill` / `defineSteering` / `defineInstruction` / `defineFact`
- 1 reliability gate: `.reliability({ preCheck, postDecide, providers, circuitBreaker, fallback })`
- 1 tool dispatch primitive: `ToolProvider` (sync OR async), with `staticTools` · `gatedTools` · `skillScopedTools` · custom `discoveryProvider` over hubs / MCP / per-tenant catalogs
LLM providers (7)
| Factory | Use for |
|---|---|
| `anthropic` | Claude (Sonnet, Opus, Haiku) via `@anthropic-ai/sdk` |
| `openai` | GPT-4o, GPT-4-turbo via `openai` SDK |
| `bedrock` | Claude / Titan / Mistral via AWS Bedrock runtime |
| `ollama` | Local models (OpenAI-compatible endpoint) |
| `browserAnthropic` | Browser-side Claude calls (no proxy server) |
| `browserOpenai` | Browser-side OpenAI calls (no proxy server) |
| `mock` | Deterministic dev/test (zero API cost) |
Memory + adapters
- Memory factory: 4 types (`episodic` / `semantic` / `narrative` / `causal`) × 7 strategies (`window` / `budget` / `summarize` / `topK` / `extract` / `decay` / `hybrid`)
- Memory stores: `InMemoryStore`, `RedisStore` (peer-dep `ioredis`), `AgentCoreStore` (peer-dep AWS SDK)
- RAG · MCP adapters: `mockMcpClient(...)` / `mcpClient({ transport })`
Operability
- Provider-agnostic prompt caching: declarative per-injection, per-iteration marker recomputation
- Pause / resume: JSON-serializable checkpoints; resume hours later on a different server
- Resilience primitives: `withRetry`, `withFallback`, `withCircuitBreaker`, `.outputFallback`, `agent.resumeOnError`
- 60+ typed observability events: `agent` · `composition` · `context` · `stream` · `tools` · `skill` · `memory` · `cache` · `cost` · `permission` · `eval` · `embedding` · `pause` · `error` · `fallback` · `resilience` · `reliability` · `risk`
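The pause / resume idea can be illustrated with a checkpoint that survives a JSON round trip. This is a minimal sketch, not the library's `pauseHere()` / `resume()` implementation; `FlowState`, `Checkpoint`, `flowSteps`, and `runFrom` are hypothetical names, and the approval step is an invented example.

```typescript
// Hypothetical checkpoint: where to continue, plus plain-data state.
interface FlowState { log: string[]; approved: boolean }
interface Checkpoint { nextStep: number; state: FlowState }
type FlowStep = (s: FlowState) => 'continue' | 'pause';

const flowSteps: FlowStep[] = [
  (s) => { s.log.push('draft'); return 'continue'; },
  (s) => {
    if (!s.approved) { s.log.push('awaiting approval'); return 'pause'; }
    s.log.push('approved');
    return 'continue';
  },
  (s) => { s.log.push('send'); return 'continue'; },
];

// Run from a checkpoint until the flow finishes or a step asks to pause.
function runFrom(cp: Checkpoint): { done: boolean; checkpoint: Checkpoint } {
  const state = cp.state;
  let index = cp.nextStep;
  for (const step of flowSteps.slice(cp.nextStep)) {
    if (step(state) === 'pause') {
      return { done: false, checkpoint: { nextStep: index, state } };
    }
    index++;
  }
  return { done: true, checkpoint: { nextStep: flowSteps.length, state } };
}

// First run pauses at the approval step; the checkpoint is serialized.
const firstRun = runFrom({ nextStep: 0, state: { log: [], approved: false } });
const wire = JSON.stringify(firstRun.checkpoint);

// Later, possibly on another server: parse, apply the human decision, resume.
const restored = JSON.parse(wire) as Checkpoint;
restored.state.approved = true;
const secondRun = runFrom(restored);
```

Keeping the checkpoint as plain data (a step index plus serializable state) is what allows the resume to happen in a different process from the pause.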
Tooling
- Lens · Explainable Trace: two visual replays of the causal trace (separate `agentfootprint-lens` package)
- AI-coding-tool support: Claude Code · Cursor · Windsurf · Cline · Kiro · Copilot
| If you are... | Go here |
|---|---|
| New to agents | 5-minute quick start |
| Coming from LangChain / CrewAI / LangGraph | Migration guide |
| Architecting an enterprise rollout | Production guide |
| Doing due diligence | Architecture overview |
| Researcher / academic background | Citations & prior art |
| Curious about design | Inspiration docs |
Or jump into the examples gallery — every example is also an end-to-end CI test.
footprintjs — the flowchart pattern for backend code. agentfootprint's decision-evidence capture, narrative recording, and time-travel checkpointing are footprintjs primitives at the runtime layer.
You don't need to learn footprintjs to use agentfootprint — but if you want to build your own primitives at this depth, start there.