A small, modular agent framework for building LLM-powered applications in Python.
Inspired by smolagents and Pi — borrowing the minimal-abstraction philosophy from the former and the conversational agent loop from the latter.
Beta — TinyAgent is usable but not production-ready. APIs may change between minor versions.
TinyAgent provides a lightweight foundation for creating conversational AI agents with tool use capabilities. It features:
- Streaming-first architecture: All LLM interactions support streaming responses
- Tool execution: Define and execute tools with structured outputs
- Event-driven: Subscribe to agent events for real-time UI updates
- Provider agnostic: Works with any OpenAI-compatible `/chat/completions` endpoint (OpenRouter, OpenAI, Chutes, local servers)
- Prompt caching: Reduce token costs and latency with Anthropic-style cache breakpoints
- Provider path: Default built-in Rust provider (`alchemy_provider`) with optional proxy integration
- Type-safe: Full type hints throughout
```python
import asyncio

from tinyagent import Agent, AgentOptions
from tinyagent.alchemy_provider import OpenAICompatModel, stream_alchemy_openai_completions

# Create an agent
agent = Agent(
    AgentOptions(
        stream_fn=stream_alchemy_openai_completions,
        session_id="my-session",
    )
)

# Configure
agent.set_system_prompt("You are a helpful assistant.")
agent.set_model(
    OpenAICompatModel(
        provider="openrouter",
        id="anthropic/claude-3.5-sonnet",
        base_url="https://openrouter.ai/api/v1/chat/completions",
    )
)

# Optional: any OpenAI-compatible /chat/completions endpoint
# agent.set_model(OpenAICompatModel(provider="openai", id="gpt-4o-mini", base_url="https://api.openai.com/v1/chat/completions"))

# Simple prompt
async def main():
    response = await agent.prompt_text("What is the capital of France?")
    print(response)

asyncio.run(main())
```

Install from PyPI:

```bash
pip install tiny-agent-os
```

The `Agent` class is the main entry point. It manages:
- Conversation state (messages, tools, system prompt)
- Streaming responses
- Tool execution
- Event subscription
Messages are Pydantic models (use attribute access):

- `UserMessage`: Input from the user
- `AssistantMessage`: Response from the LLM
- `ToolResultMessage`: Result from tool execution
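To make the attribute-access point concrete, here is a minimal sketch using dataclass stand-ins; the real TinyAgent classes are Pydantic models, and the exact field names (`role`, `content`) are assumptions based on the message dicts shown later in this document:

```python
from dataclasses import dataclass

# Stand-ins mirroring TinyAgent's message models (field names are
# assumptions for illustration; the real classes are Pydantic models).
@dataclass
class TextContent:
    text: str
    type: str = "text"

@dataclass
class UserMessage:
    content: list
    role: str = "user"

msg = UserMessage(content=[TextContent(text="Hello")])
# Attribute access, not dict subscripting:
print(msg.role)             # -> user
print(msg.content[0].text)  # -> Hello
```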
Tools are functions the LLM can call:

```python
from tinyagent import AgentTool, AgentToolResult, TextContent

async def calculate_sum(tool_call_id: str, args: dict, signal, on_update) -> AgentToolResult:
    result = args["a"] + args["b"]
    return AgentToolResult(
        content=[TextContent(text=str(result))]
    )

tool = AgentTool(
    name="sum",
    description="Add two numbers",
    parameters={
        "type": "object",
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["a", "b"]
    },
    execute=calculate_sum
)
```
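Before wiring a tool into the agent, it can help to exercise its `execute` function directly. This sketch substitutes minimal dataclass stand-ins for `AgentToolResult` and `TextContent` so it runs without TinyAgent installed; the real classes are Pydantic models:

```python
import asyncio
from dataclasses import dataclass

# Minimal stand-ins for the TinyAgent result types (illustration only).
@dataclass
class TextContent:
    text: str

@dataclass
class AgentToolResult:
    content: list

async def calculate_sum(tool_call_id, args, signal, on_update):
    return AgentToolResult(content=[TextContent(text=str(args["a"] + args["b"]))])

# Call the tool directly, outside the agent loop; signal/on_update unused here.
result = asyncio.run(calculate_sum("call-1", {"a": 2, "b": 3}, None, None))
print(result.content[0].text)  # -> 5
```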
```python
agent.set_tools([tool])
```

The agent emits events during execution:

- `AgentStartEvent` / `AgentEndEvent`: Agent run lifecycle
- `TurnStartEvent` / `TurnEndEvent`: Single turn lifecycle
- `MessageStartEvent` / `MessageUpdateEvent` / `MessageEndEvent`: Message streaming
- `ToolExecutionStartEvent` / `ToolExecutionUpdateEvent` / `ToolExecutionEndEvent`: Tool execution
Subscribe to events:
```python
def on_event(event):
    print(f"Event: {event.type}")

unsubscribe = agent.subscribe(on_event)
```

TinyAgent supports Anthropic-style prompt caching to reduce costs on multi-turn conversations. Enable it when creating the agent:

```python
agent = Agent(
    AgentOptions(
        stream_fn=stream_alchemy_openai_completions,
        session_id="my-session",
        enable_prompt_caching=True,
    )
)
```

Cache breakpoints are automatically placed on user message content blocks so the prompt prefix stays cached across turns. See Prompt Caching for details.
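For intuition, this is what an Anthropic-style cache breakpoint looks like on a content block at the wire level (illustrative only; TinyAgent places these markers automatically when `enable_prompt_caching=True`):

```python
# Anthropic-style cache breakpoint on a content block (illustrative wire
# format). The prompt prefix up to and including the marked block is
# eligible for caching on subsequent turns.
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Here is a long document...",
            "cache_control": {"type": "ephemeral"},  # the breakpoint
        }
    ],
}
```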
TinyAgent ships with an optional Rust-based LLM provider implemented in `src/lib.rs`. It wraps the `alchemy-llm` Rust crate and exposes it to Python via PyO3 as `tinyagent._alchemy`, giving you native-speed OpenAI-compatible streaming without leaving the Python process.

The pure-Python proxy path (`proxy.py`) remains available, while the Rust path provides:

- Lower per-token overhead -- SSE parsing, JSON deserialization, and event dispatch all happen in compiled Rust with a multi-threaded Tokio runtime.
- Unified provider abstraction -- `alchemy-llm` normalizes differences across providers (OpenRouter, Anthropic, custom endpoints) behind a single streaming interface.
- Full event fidelity -- text deltas, thinking deltas, tool call deltas, and terminal events are surfaced as TinyAgent event/message models.
```
Python (async)               Rust (Tokio)
─────────────────            ─────────────────────────
stream_alchemy_*()    ──>    alchemy_llm::stream()
                                │
AlchemyStreamResponse           ├─ SSE parse + deserialize
  .__anext__()        <──       ├─ event_to_py_value()
  (asyncio.to_thread)           └─ mpsc channel -> Python
```
- Python calls `openai_completions_stream(model, context, options)`, which is a `#[pyfunction]`.
- The Rust side builds an `alchemy-llm` request, opens an SSE stream on a shared Tokio runtime, and sends events through an `mpsc` channel.
- Python reads events by calling the blocking `next_event()` method via `asyncio.to_thread`, making it async-compatible without busy-waiting.
- A terminal `done` or `error` event signals the end of the stream. The final `AssistantMessage` dict is available via `result()`.
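The bridge in the steps above (a blocking `next_event()` wrapped with `asyncio.to_thread`) can be sketched generically, with a plain `queue.Queue` standing in for the Rust `mpsc` channel and hypothetical event dicts:

```python
import asyncio
import queue
import threading

def producer(q: queue.Queue):
    # Stand-in for the Rust side pushing events through an mpsc channel.
    for event in ({"type": "text_delta", "text": "Hi"}, {"type": "done"}):
        q.put(event)
    q.put(None)  # end-of-stream sentinel, like next_event() returning None

async def consume(q: queue.Queue):
    events = []
    while True:
        # Blocking read moved off the event loop, mirroring
        # next_event() called via asyncio.to_thread.
        event = await asyncio.to_thread(q.get)
        if event is None:
            break
        events.append(event)
    return events

q = queue.Queue()
threading.Thread(target=producer, args=(q,), daemon=True).start()
events = asyncio.run(consume(q))
print(events[-1])  # -> {'type': 'done'}
```

The event loop stays responsive because the blocking `get` runs on a worker thread, which is also why the real binding can keep the GIL released during the Rust work.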
Requires a Rust toolchain (1.70+) and maturin.

```bash
pip install maturin
maturin develop            # debug build, installs into current venv
maturin develop --release  # optimized build
```

Two functions are exposed from the `tinyagent._alchemy` module:
| Function | Description |
|---|---|
| `collect_openai_completions(model, context, options?)` | Blocking. Consumes the entire stream and returns `{"events": [...], "final_message": {...}}`. Useful for one-shot calls. |
| `openai_completions_stream(model, context, options?)` | Returns an `OpenAICompletionsStream` handle for incremental consumption. |
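Purely for illustration, draining a stream of event dicts into the `{"events": [...], "final_message": {...}}` shape that `collect_openai_completions` returns might look like this (the event shapes here are hypothetical, not the binding's actual schema):

```python
def collect(events):
    """Drain an event iterable into the collect-style return shape (sketch)."""
    out = {"events": [], "final_message": None}
    for event in events:
        out["events"].append(event)
        if event.get("type") == "done":  # hypothetical terminal event carrying the message
            out["final_message"] = event.get("message")
    return out

collected = collect([
    {"type": "text_delta", "text": "Hi"},
    {"type": "done", "message": {"role": "assistant"}},
])
```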
The `OpenAICompletionsStream` handle has two methods:

| Method | Description |
|---|---|
| `next_event()` | Blocking. Returns the next event dict, or `None` when the stream ends. |
| `result()` | Blocking. Returns the final assistant message dict. |
All three arguments are plain Python dicts:

```python
model = {
    "id": "anthropic/claude-3.5-sonnet",
    "base_url": "https://openrouter.ai/api/v1/chat/completions",
    "provider": "openrouter",        # required for env-key fallback/inference
    "api": "openai-completions",     # optional; inferred from provider when omitted/blank
    "headers": {"X-Custom": "val"},  # optional
    "reasoning": False,              # optional
    "context_window": 128000,        # optional
    "max_tokens": 4096,              # optional
}

context = {
    "system_prompt": "You are helpful.",
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Hello"}]}
    ],
    "tools": [  # optional
        {"name": "sum", "description": "Add numbers", "parameters": {...}}
    ],
}

options = {
    "api_key": "sk-...",   # optional
    "temperature": 0.7,    # optional
    "max_tokens": 1024,    # optional
}
```

Routing contract (`provider`, `api`, `base_url`):
- `provider`: backend identity used for API-key fallback and provider defaults
- `api`: alchemy unified API selector (`openai-completions` or `minimax-completions`)
- `base_url`: concrete HTTP endpoint

If `api` is omitted/blank, the Python side infers:

- `provider in {"minimax", "minimax-cn"}` => `minimax-completions`
- otherwise => `openai-completions`

Legacy API aliases are normalized for backward compatibility:

- `api="openrouter"` / `api="openai"` => `openai-completions`
- `api="minimax"` => `minimax-completions`
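The inference and alias rules above can be condensed into a few lines; this is a sketch of the logic, not the actual `alchemy_provider` implementation:

```python
def resolve_api(provider, api=None):
    """Resolve the unified API selector from (provider, api), per the rules above."""
    aliases = {
        "openrouter": "openai-completions",
        "openai": "openai-completions",
        "minimax": "minimax-completions",
    }
    if api:  # explicit value: normalize legacy aliases, pass through otherwise
        return aliases.get(api, api)
    # omitted/blank: infer from provider
    if provider in {"minimax", "minimax-cn"}:
        return "minimax-completions"
    return "openai-completions"

print(resolve_api("minimax-cn", None))      # -> minimax-completions
print(resolve_api("openrouter", "openai"))  # -> openai-completions
```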
You don't need to call the Rust binding directly. Use the `alchemy_provider` module:

```python
from tinyagent import Agent, AgentOptions
from tinyagent.alchemy_provider import OpenAICompatModel, stream_alchemy_openai_completions

agent = Agent(
    AgentOptions(
        stream_fn=stream_alchemy_openai_completions,
        session_id="my-session",
    )
)

agent.set_model(
    OpenAICompatModel(
        provider="openrouter",
        id="anthropic/claude-3.5-sonnet",
        base_url="https://openrouter.ai/api/v1/chat/completions",
    )
)
```

MiniMax global:
```python
agent.set_model(
    OpenAICompatModel(
        provider="minimax",
        id="MiniMax-M2.5",
        base_url="https://api.minimax.io/v1/chat/completions",
        # api is optional here; inferred as "minimax-completions"
    )
)
```

MiniMax CN:
```python
agent.set_model(
    OpenAICompatModel(
        provider="minimax-cn",
        id="MiniMax-M2.5",
        base_url="https://api.minimax.chat/v1/chat/completions",
        # api is optional here; inferred as "minimax-completions"
    )
)
```

Cross-provider tool-call smoke examples:

- One-agent workflow: `examples/example_tool_calls_three_providers.py`
- Raw Rust binding workflow (multi-turn tools): `scripts/smoke_rust_tool_calls_three_providers.py`
  - Command: `uv run python scripts/smoke_rust_tool_calls_three_providers.py`
- The Rust binding currently dispatches only `openai-completions` and `minimax-completions`.
- Image blocks are not yet supported (text and thinking blocks work).
- `next_event()` is blocking and runs in a thread via `asyncio.to_thread` -- this adds slight overhead compared to a native async generator, but keeps the GIL released during the Rust work.
- Architecture: System design and component interactions
- API Reference: Detailed module documentation
- Prompt Caching: Cache breakpoints, cost savings, and provider requirements
- OpenAI-Compatible Endpoints: Using `OpenAICompatModel.base_url` with OpenRouter/OpenAI/Chutes-compatible backends
- Usage Semantics: Canonical `usage` schema across provider flows
- Changelog: Release history
```
tinyagent/
├── agent.py                  # Agent class
├── agent_loop.py             # Core agent execution loop
├── agent_tool_execution.py   # Tool execution helpers
├── agent_types.py            # Type definitions
├── caching.py                # Prompt caching utilities
├── alchemy_provider.py       # Rust-based provider (PyO3)
├── proxy.py                  # Proxy server integration
└── proxy_event_handlers.py   # Proxy event parsing
```