A small, modular agent framework for building LLM-powered applications in Python.
Inspired by smolagents and Pi — borrowing the minimal-abstraction philosophy from the former and the conversational agent loop from the latter.
Beta — TinyAgent is usable but not production-ready. APIs may change between minor versions.
TinyAgent provides a lightweight foundation for creating conversational AI agents with tool use capabilities. It features:
- Streaming-first architecture: All LLM interactions support streaming responses
- Tool execution: Define and execute tools with structured outputs
- Event-driven: Subscribe to agent events for real-time UI updates
- Provider agnostic: Works with any OpenAI-compatible `/chat/completions` endpoint (OpenRouter, OpenAI, Chutes, local servers)
- Prompt caching: Reduce token costs and latency with Anthropic-style cache breakpoints
- Provider path: Default built-in Rust provider (`alchemy_provider`) with optional proxy integration
- Type-safe: Full type hints throughout
```python
import asyncio

from tinyagent import Agent, AgentOptions
from tinyagent.alchemy_provider import OpenAICompatModel, stream_alchemy_openai_completions

# Create an agent
agent = Agent(
    AgentOptions(
        stream_fn=stream_alchemy_openai_completions,
        session_id="my-session",
    )
)

# Configure
agent.set_system_prompt("You are a helpful assistant.")
agent.set_model(
    OpenAICompatModel(
        provider="openrouter",
        id="anthropic/claude-3.5-sonnet",
        base_url="https://openrouter.ai/api/v1/chat/completions",
    )
)

# Optional: any OpenAI-compatible /chat/completions endpoint
# agent.set_model(OpenAICompatModel(provider="openai", id="gpt-4o-mini", base_url="https://api.openai.com/v1/chat/completions"))

# Simple prompt
async def main():
    response = await agent.prompt_text("What is the capital of France?")
    print(response)

asyncio.run(main())
```

Install from PyPI:

```bash
pip install tiny-agent-os
```

The `Agent` class is the main entry point. It manages:
- Conversation state (messages, tools, system prompt)
- Streaming responses
- Tool execution
- Event subscription
Messages are Pydantic models (use attribute access):

- `UserMessage`: Input from the user
- `AssistantMessage`: Response from the LLM
- `ToolResultMessage`: Result from tool execution
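To make the attribute-access point concrete, here is a minimal sketch using dataclass stand-ins; the real TinyAgent classes are Pydantic models, and the exact field names (`role`, `content`) are assumptions based on the message dicts shown later in this document:

```python
from dataclasses import dataclass

# Stand-ins mirroring TinyAgent's message models (field names are
# assumptions for illustration; the real classes are Pydantic models).
@dataclass
class TextContent:
    text: str
    type: str = "text"

@dataclass
class UserMessage:
    content: list
    role: str = "user"

msg = UserMessage(content=[TextContent(text="Hello")])
# Attribute access, not dict subscripting:
print(msg.role)             # -> user
print(msg.content[0].text)  # -> Hello
```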
Tools are functions the LLM can call:

```python
from tinyagent import AgentTool, AgentToolResult, TextContent

async def calculate_sum(tool_call_id: str, args: dict, signal, on_update) -> AgentToolResult:
    result = args["a"] + args["b"]
    return AgentToolResult(
        content=[TextContent(text=str(result))]
    )

tool = AgentTool(
    name="sum",
    description="Add two numbers",
    parameters={
        "type": "object",
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["a", "b"]
    },
    execute=calculate_sum
)
```
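Before wiring a tool into the agent, it can help to exercise its `execute` function directly. This sketch substitutes minimal dataclass stand-ins for `AgentToolResult` and `TextContent` so it runs without TinyAgent installed; the real classes are Pydantic models:

```python
import asyncio
from dataclasses import dataclass

# Minimal stand-ins for the TinyAgent result types (illustration only).
@dataclass
class TextContent:
    text: str

@dataclass
class AgentToolResult:
    content: list

async def calculate_sum(tool_call_id, args, signal, on_update):
    return AgentToolResult(content=[TextContent(text=str(args["a"] + args["b"]))])

# Call the tool directly, outside the agent loop; signal/on_update unused here.
result = asyncio.run(calculate_sum("call-1", {"a": 2, "b": 3}, None, None))
print(result.content[0].text)  # -> 5
```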
```python
agent.set_tools([tool])
```

The agent emits events during execution:

- `AgentStartEvent` / `AgentEndEvent`: Agent run lifecycle
- `TurnStartEvent` / `TurnEndEvent`: Single turn lifecycle
- `MessageStartEvent` / `MessageUpdateEvent` / `MessageEndEvent`: Message streaming
- `ToolExecutionStartEvent` / `ToolExecutionUpdateEvent` / `ToolExecutionEndEvent`: Tool execution
Subscribe to events:
```python
def on_event(event):
    print(f"Event: {event.type}")

unsubscribe = agent.subscribe(on_event)
```

TinyAgent supports Anthropic-style prompt caching to reduce costs on multi-turn conversations. Enable it when creating the agent:

```python
agent = Agent(
    AgentOptions(
        stream_fn=stream_alchemy_openai_completions,
        session_id="my-session",
        enable_prompt_caching=True,
    )
)
```

Cache breakpoints are automatically placed on user message content blocks so the prompt prefix stays cached across turns. See Prompt Caching for details.
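For intuition, this is what an Anthropic-style cache breakpoint looks like on a content block at the wire level (illustrative only; TinyAgent places these markers automatically when `enable_prompt_caching=True`):

```python
# Anthropic-style cache breakpoint on a content block (illustrative wire
# format). The prompt prefix up to and including the marked block is
# eligible for caching on subsequent turns.
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Here is a long document...",
            "cache_control": {"type": "ephemeral"},  # the breakpoint
        }
    ],
}
```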
TinyAgent ships with an optional Rust-based LLM provider implemented in `src/lib.rs`. It wraps the `alchemy-llm` Rust crate and exposes it to Python via PyO3 as `tinyagent._alchemy`, giving you native-speed OpenAI-compatible streaming without leaving the Python process.

The pure-Python proxy path (`proxy.py`) remains available, while the Rust path provides:

- Lower per-token overhead -- SSE parsing, JSON deserialization, and event dispatch all happen in compiled Rust with a multi-threaded Tokio runtime.
- Unified provider abstraction -- `alchemy-llm` normalizes differences across providers (OpenRouter, Anthropic, custom endpoints) behind a single streaming interface.
- Full event fidelity -- text deltas, thinking deltas, tool call deltas, and terminal events are surfaced as TinyAgent event/message models.
```
Python (async)               Rust (Tokio)
─────────────────            ─────────────────────────
stream_alchemy_*()    ──>    alchemy_llm::stream()
                                │
AlchemyStreamResponse           ├─ SSE parse + deserialize
  .__anext__()        <──       ├─ event_to_py_value()
  (asyncio.to_thread)           └─ mpsc channel -> Python
```
- Python calls `openai_completions_stream(model, context, options)`, which is a `#[pyfunction]`.
- The Rust side builds an `alchemy-llm` request, opens an SSE stream on a shared Tokio runtime, and sends events through an `mpsc` channel.
- Python reads events by calling the blocking `next_event()` method via `asyncio.to_thread`, making it async-compatible without busy-waiting.
- A terminal `done` or `error` event signals the end of the stream. The final `AssistantMessage` dict is available via `result()`.
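The bridge in the steps above (a blocking `next_event()` wrapped with `asyncio.to_thread`) can be sketched generically, with a plain `queue.Queue` standing in for the Rust `mpsc` channel and hypothetical event dicts:

```python
import asyncio
import queue
import threading

def producer(q: queue.Queue):
    # Stand-in for the Rust side pushing events through an mpsc channel.
    for event in ({"type": "text_delta", "text": "Hi"}, {"type": "done"}):
        q.put(event)
    q.put(None)  # end-of-stream sentinel, like next_event() returning None

async def consume(q: queue.Queue):
    events = []
    while True:
        # Blocking read moved off the event loop, mirroring
        # next_event() called via asyncio.to_thread.
        event = await asyncio.to_thread(q.get)
        if event is None:
            break
        events.append(event)
    return events

q = queue.Queue()
threading.Thread(target=producer, args=(q,), daemon=True).start()
events = asyncio.run(consume(q))
print(events[-1])  # -> {'type': 'done'}
```

The event loop stays responsive because the blocking `get` runs on a worker thread, which is also why the real binding can keep the GIL released during the Rust work.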
Requires a Rust toolchain (1.70+) and maturin.

```bash
pip install maturin
maturin develop            # debug build, installs into current venv
maturin develop --release  # optimized build
```

Two functions are exposed from the `tinyagent._alchemy` module:
| Function | Description |
|---|---|
| `collect_openai_completions(model, context, options?)` | Blocking. Consumes the entire stream and returns `{"events": [...], "final_message": {...}}`. Useful for one-shot calls. |
| `openai_completions_stream(model, context, options?)` | Returns an `OpenAICompletionsStream` handle for incremental consumption. |
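Purely for illustration, draining a stream of event dicts into the `{"events": [...], "final_message": {...}}` shape that `collect_openai_completions` returns might look like this (the event shapes here are hypothetical, not the binding's actual schema):

```python
def collect(events):
    """Drain an event iterable into the collect-style return shape (sketch)."""
    out = {"events": [], "final_message": None}
    for event in events:
        out["events"].append(event)
        if event.get("type") == "done":  # hypothetical terminal event carrying the message
            out["final_message"] = event.get("message")
    return out

collected = collect([
    {"type": "text_delta", "text": "Hi"},
    {"type": "done", "message": {"role": "assistant"}},
])
```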
The `OpenAICompletionsStream` handle has two methods:

| Method | Description |
|---|---|
| `next_event()` | Blocking. Returns the next event dict, or `None` when the stream ends. |
| `result()` | Blocking. Returns the final assistant message dict. |
All three arguments are plain Python dicts:

```python
model = {
    "id": "anthropic/claude-3.5-sonnet",
    "base_url": "https://openrouter.ai/api/v1/chat/completions",
    "provider": "openrouter",        # required for env-key fallback/inference
    "api": "openai-completions",     # optional; inferred from provider when omitted/blank
    "headers": {"X-Custom": "val"},  # optional
    "reasoning": False,              # optional
    "context_window": 128000,        # optional
    "max_tokens": 4096,              # optional
}

context = {
    "system_prompt": "You are helpful.",
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Hello"}]}
    ],
    "tools": [  # optional
        {"name": "sum", "description": "Add numbers", "parameters": {...}}
    ],
}

options = {
    "api_key": "sk-...",   # optional
    "temperature": 0.7,    # optional
    "max_tokens": 1024,    # optional
}
```

Routing contract (`provider`, `api`, `base_url`):
- `provider`: backend identity used for API-key fallback and provider defaults
- `api`: alchemy unified API selector (`openai-completions` or `minimax-completions`)
- `base_url`: concrete HTTP endpoint

If `api` is omitted/blank, the Python side infers:

- `provider in {"minimax", "minimax-cn"}` => `minimax-completions`
- otherwise => `openai-completions`

Legacy API aliases are normalized for backward compatibility:

- `api="openrouter"` / `api="openai"` => `openai-completions`
- `api="minimax"` => `minimax-completions`
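The inference and alias rules above can be condensed into a few lines; this is a sketch of the logic, not the actual `alchemy_provider` implementation:

```python
def resolve_api(provider, api=None):
    """Resolve the unified API selector from (provider, api), per the rules above."""
    aliases = {
        "openrouter": "openai-completions",
        "openai": "openai-completions",
        "minimax": "minimax-completions",
    }
    if api:  # explicit value: normalize legacy aliases, pass through otherwise
        return aliases.get(api, api)
    # omitted/blank: infer from provider
    if provider in {"minimax", "minimax-cn"}:
        return "minimax-completions"
    return "openai-completions"

print(resolve_api("minimax-cn", None))      # -> minimax-completions
print(resolve_api("openrouter", "openai"))  # -> openai-completions
```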
You don't need to call the Rust binding directly. Use the `alchemy_provider` module:

```python
from tinyagent import Agent, AgentOptions
from tinyagent.alchemy_provider import OpenAICompatModel, stream_alchemy_openai_completions

agent = Agent(
    AgentOptions(
        stream_fn=stream_alchemy_openai_completions,
        session_id="my-session",
    )
)

agent.set_model(
    OpenAICompatModel(
        provider="openrouter",
        id="anthropic/claude-3.5-sonnet",
        base_url="https://openrouter.ai/api/v1/chat/completions",
    )
)
```

MiniMax global:
```python
agent.set_model(
    OpenAICompatModel(
        provider="minimax",
        id="MiniMax-M2.5",
        base_url="https://api.minimax.io/v1/chat/completions",
        # api is optional here; inferred as "minimax-completions"
    )
)
```

MiniMax CN:
```python
agent.set_model(
    OpenAICompatModel(
        provider="minimax-cn",
        id="MiniMax-M2.5",
        base_url="https://api.minimax.chat/v1/chat/completions",
        # api is optional here; inferred as "minimax-completions"
    )
)
```

Cross-provider tool-call smoke examples:

- One-agent workflow: `examples/example_tool_calls_three_providers.py`
- Raw Rust binding workflow (multi-turn tools): `scripts/smoke_rust_tool_calls_three_providers.py`
  - Command: `uv run python scripts/smoke_rust_tool_calls_three_providers.py`
- The Rust binding currently dispatches only `openai-completions` and `minimax-completions`.
- Image blocks are not yet supported (text and thinking blocks work).
- `next_event()` is blocking and runs in a thread via `asyncio.to_thread` -- this adds slight overhead compared to a native async generator, but keeps the GIL released during the Rust work.
- Architecture: System design and component interactions
- API Reference: Detailed module documentation
- Prompt Caching: Cache breakpoints, cost savings, and provider requirements
- OpenAI-Compatible Endpoints: Using `OpenAICompatModel.base_url` with OpenRouter/OpenAI/Chutes-compatible backends
- Usage Semantics: Canonical `usage` schema across provider flows
- Changelog: Release history
```
tinyagent/
├── agent.py                  # Agent class
├── agent_loop.py             # Core agent execution loop
├── agent_tool_execution.py   # Tool execution helpers
├── agent_types.py            # Type definitions
├── caching.py                # Prompt caching utilities
├── alchemy_provider.py       # Rust-based provider (PyO3)
├── proxy.py                  # Proxy server integration
└── proxy_event_handlers.py   # Proxy event parsing
```