A semantic checkpointing layer for LLM agents that enables structured, versioned state management across sessions.
Medallion provides a framework-agnostic way to create, persist, and retrieve semantic checkpoints of LLM agent reasoning state. It treats reasoning state as a first-class artifact, allowing agents to resume work from previous sessions with full context.
- Structured Checkpoints: Versioned JSON-based medallion schema with meta, scope, summary, decisions, and open questions
- Framework Agnostic: Works with LangChain, LangGraph, custom agents, and any Python LLM framework
- SQLite Backend: Local file-based storage with efficient scope-based queries
- Scope-Based Retrieval: Query medallions by graph nodes and tags with subset matching
- Update Support: Update existing medallions with new evidence without duplication
git clone https://github.com/avimallick/medallion.git
cd medallion
pip install -e .
- Python 3.11 or higher
- pydantic>=2.0
- aiosqlite>=0.19.0
from medallion import (
SQLiteMedallionStore,
StubMedallionLLM,
MedallionScope,
Evidence,
checkpoint_session,
load_medallions_for_scope,
)
# Initialize store and LLM
store = SQLiteMedallionStore("medallions.db")
llm = StubMedallionLLM()
# Define scope for this checkpoint
scope = MedallionScope(
graph_nodes=["repo:muse", "module:cli"],
tags=["project_state", "refactor"],
)
# Create checkpoint at session end
evidence = Evidence(
session_summary="Implemented CLI module with command routing",
transcripts=["User requested help command", "System showed help menu"],
artefacts={"commands": ["help", "start", "stop"]},
)
medallion = checkpoint_session(store, llm, scope, evidence)
print(f"Created medallion: {medallion.meta.medallion_id}")
# Load checkpoints for scope
medallions = load_medallions_for_scope(store, scope, limit=10)
print(f"Found {len(medallions)} medallions for scope")
# Load most recent checkpoints for a scope
scope = MedallionScope(
graph_nodes=["repo:muse"],
tags=["project_state"],
)
medallions = load_medallions_for_scope(store, scope, limit=5)
for medallion in medallions:
    print(f"{medallion.meta.medallion_id}: {medallion.summary.high_level}")
    if medallion.decisions:
        print(f" Decisions: {len(medallion.decisions)}")
    if medallion.open_questions:
        print(f" Open questions: {len(medallion.open_questions)}")
# Subsequent checkpoint with same scope updates existing medallion
new_evidence = Evidence(
session_summary="Added error handling to CLI module",
transcripts=["User triggered error", "System handled gracefully"],
artefacts={"error_rate": 0.01},
)
# This will update the existing medallion, not create a new one
updated_medallion = checkpoint_session(store, llm, scope, new_evidence)
print(f"Updated medallion: {updated_medallion.meta.medallion_id}")
print(f"Created: {updated_medallion.meta.created_at}")
print(f"Updated: {updated_medallion.meta.updated_at}")
A medallion is a structured checkpoint containing:
- Meta: ID, timestamps, schema version, model info
- Scope: Graph nodes and tags for querying
- Summary: High-level summary and subsystem breakdown
- Decisions: Key decisions made during reasoning
- Open Questions: Unresolved questions or blockers
- Affordances: Recommended entry points, things to avoid, invariants
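As a rough illustration of how these sections fit together, the sketch below builds a medallion-shaped dictionary and round-trips it through JSON. Only `medallion_id`, `created_at`, `updated_at`, `high_level`, `statement`, and `question` appear elsewhere in this README. Every other key (`schema_version`, `model`, `subsystems`, `entry_points`, `avoid`, `invariants`) and all values are assumptions made for illustration, not the library's actual schema.

```python
import json
from datetime import datetime, timezone

now = datetime.now(timezone.utc).isoformat()

# Hypothetical medallion payload. Keys marked "assumed" are illustrative only.
medallion = {
    "meta": {
        "medallion_id": "med-0001",  # placeholder ID
        "created_at": now,
        "updated_at": now,
        "schema_version": "1.0",  # assumed key
        "model": "stub",  # assumed key
    },
    "scope": {
        "graph_nodes": ["repo:muse", "module:cli"],
        "tags": ["project_state", "refactor"],
    },
    "summary": {
        "high_level": "Implemented CLI module with command routing",
        "subsystems": {"cli": "help, start, and stop commands"},  # assumed key
    },
    "decisions": [{"statement": "Route commands through a single dispatcher"}],
    "open_questions": [{"question": "Should errors be logged or raised?"}],
    "affordances": {  # all three keys below are assumed
        "entry_points": ["module:cli"],
        "avoid": ["editing generated help text by hand"],
        "invariants": ["every command has a help entry"],
    },
}

# Serialize and restore to confirm the shape is plain JSON.
payload = json.dumps(medallion, indent=2)
restored = json.loads(payload)
```

Keeping every section as plain JSON types is what makes a checkpoint easy to version and store in a single database column.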
Medallions are queried by scope using:
- Graph Nodes: Subset matching (requested nodes must be subset of stored nodes)
- Tags: Intersection matching (any tag overlap returns match)
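The two rules above can be sketched with plain Python sets. This is not the store's actual query code: the helper names are hypothetical, and combining the two rules with a logical AND is an assumption, since the README does not say how they interact.

```python
def nodes_match(requested: list[str], stored: list[str]) -> bool:
    """Subset matching: every requested node must be present on the stored medallion."""
    return set(requested) <= set(stored)


def tags_match(requested: list[str], stored: list[str]) -> bool:
    """Intersection matching: at least one requested tag must overlap."""
    return bool(set(requested) & set(stored))


def scope_matches(
    requested_nodes: list[str],
    requested_tags: list[str],
    stored_nodes: list[str],
    stored_tags: list[str],
) -> bool:
    # Assumption: a medallion must satisfy both rules to be returned.
    return nodes_match(requested_nodes, stored_nodes) and tags_match(
        requested_tags, stored_tags
    )


stored_nodes = ["repo:muse", "module:cli", "module:store"]
stored_tags = ["project_state", "refactor"]

hit = scope_matches(["repo:muse", "module:cli"], ["refactor"], stored_nodes, stored_tags)
miss_node = scope_matches(["repo:other"], ["refactor"], stored_nodes, stored_tags)
miss_tag = scope_matches(["repo:muse"], ["release"], stored_nodes, stored_tags)
```

In this sketch an empty requested node list matches any medallion and an empty requested tag list matches none. How the real store treats empty lists is not stated here.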
Example:
# Stored medallion has: graph_nodes=["repo:muse", "module:cli", "module:store"]
# Query with: graph_nodes=["repo:muse", "module:cli"]
# ✅ Matches (requested nodes are subset of stored nodes)
# Stored medallion has: tags=["project_state", "refactor"]
# Query with: tags=["refactor"]
# ✅ Matches (tag overlap)
Evidence is the input data for checkpoint creation:
- session_summary: High-level summary of the session
- transcripts: List of conversation or action transcripts
- artefacts: Dictionary of relevant artefacts (file paths, IDs, etc.)
from langchain.agents import AgentExecutor
from medallion import (
    SQLiteMedallionStore,
    StubMedallionLLM,
    MedallionScope,
    Evidence,
    checkpoint_session,
    load_medallions_for_scope,
)
store = SQLiteMedallionStore("langchain_sessions.db")
llm = StubMedallionLLM()
# At session start: Load previous checkpoints
scope = MedallionScope(graph_nodes=["project:my-agent"], tags=["session"])
medallions = load_medallions_for_scope(store, scope)
# Inject context into agent
context = "\n".join([m.summary.high_level for m in medallions])
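# Run agent (illustrative: the code below assumes `result` is a dict with
# "summary", "transcripts", and "artefacts" keys; adapt it to what your agent returns)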
agent = AgentExecutor(...)
result = agent.run(context + "\nUser query: ...")
# At session end: Create checkpoint
evidence = Evidence(
session_summary=result["summary"],
transcripts=result["transcripts"],
artefacts=result["artefacts"],
)
checkpoint_session(store, llm, scope, evidence)
from langgraph.graph import StateGraph
from medallion import (
    SQLiteMedallionStore,
    StubMedallionLLM,
    MedallionScope,
    Evidence,
    checkpoint_session,
    load_medallions_for_scope,
)
store = SQLiteMedallionStore("langgraph_sessions.db")
llm = StubMedallionLLM()
def on_session_start(state):
    scope = MedallionScope(
        graph_nodes=[state["project_id"]],
        tags=["project_state"],
    )
    medallions = load_medallions_for_scope(store, scope)
    state["medallions"] = medallions
    return state
def on_session_end(state):
    scope = MedallionScope(
        graph_nodes=[state["project_id"]],
        tags=["project_state"],
    )
    evidence = Evidence(
        session_summary=state["summary"],
        transcripts=state.get("transcripts", []),
        artefacts=state.get("artefacts", {}),
    )
    checkpoint_session(store, llm, scope, evidence)
    return state
# Build graph
graph = StateGraph(...)
graph.add_node("load_checkpoint", on_session_start)
graph.add_node("process", your_processing_node)
graph.add_node("save_checkpoint", on_session_end)
from medallion import (
    SQLiteMedallionStore,
    MedallionLLM,
    Medallion,
    load_medallions_for_scope,
    checkpoint_session,
    MedallionScope,
    Evidence,
)
class MyAgent:
    def __init__(self, store: SQLiteMedallionStore, llm: MedallionLLM):
        self.store = store
        self.llm = llm
    async def start_session(self, project_id: str):
        """Load medallions at session start."""
        scope = MedallionScope(
            graph_nodes=[f"repo:{project_id}"],
            tags=["project_state"],
        )
        medallions = load_medallions_for_scope(self.store, scope)
        # Inject medallion context into agent
        context = self._format_medallions(medallions)
        return context
    async def end_session(
        self,
        project_id: str,
        summary: str,
        transcripts: list[str],
        artefacts: dict,
    ):
        """Save checkpoint at session end."""
        scope = MedallionScope(
            graph_nodes=[f"repo:{project_id}"],
            tags=["project_state"],
        )
        evidence = Evidence(
            session_summary=summary,
            transcripts=transcripts,
            artefacts=artefacts,
        )
        return checkpoint_session(self.store, self.llm, scope, evidence)
    def _format_medallions(self, medallions: list[Medallion]) -> str:
        """Format medallions for agent context."""
        if not medallions:
            return "No previous checkpoints found."
        lines = ["Previous checkpoints:"]
        for m in medallions:
            lines.append(f"- {m.meta.medallion_id}: {m.summary.high_level}")
            if m.decisions:
                lines.append(f" Decisions: {', '.join(d.statement for d in m.decisions)}")
            if m.open_questions:
                lines.append(f" Open: {', '.join(q.question for q in m.open_questions)}")
        return "\n".join(lines)
- Medallion: Main checkpoint data structure
- MedallionScope: Query scope with graph nodes and tags
- Evidence: Input data for checkpoint creation
- MedallionStore: Abstract interface for storage
- SQLiteMedallionStore: SQLite implementation
- MedallionLLM: Abstract interface for LLM operations
- StubMedallionLLM: Stub implementation for testing
- checkpoint_session(store, llm, scope, evidence): Create or update checkpoint
- load_medallions_for_scope(store, scope, limit=10): Load checkpoints for scope
- MedallionError: Base exception
- SchemaValidationError: Schema validation failures
- StoreError: Storage operation failures
- LLMError: LLM operation failures
For async frameworks, use the async versions directly:
from medallion.session import _checkpoint_session_async, _load_medallions_for_scope_async
# Async checkpoint creation
medallion = await _checkpoint_session_async(store, llm, scope, evidence)
# Async loading
medallions = await _load_medallions_for_scope_async(store, scope, limit=10)
Implement the MedallionLLM protocol to integrate with your LLM:
from medallion.llm import MedallionLLM
from medallion.types import Medallion, MedallionScope, Evidence
class MyMedallionLLM:
    async def generate(self, scope: MedallionScope, evidence: Evidence) -> Medallion:
        # Call your LLM API
        # Parse response into Medallion
        # Return medallion
        pass
    async def update(self, existing: Medallion, new_evidence: Evidence) -> Medallion:
        # Call your LLM API to merge evidence
        # Update existing medallion
        # Return updated medallion
        pass
The SQLite store can also be used as an async context manager:
async with SQLiteMedallionStore("medallions.db") as store:
    llm = StubMedallionLLM()
    scope = MedallionScope(graph_nodes=["repo:muse"], tags=["test"])
    evidence = Evidence(session_summary="Test session")
    medallion = await _checkpoint_session_async(store, llm, scope, evidence)
    # Database connection automatically closed on exit
Run the tests:
pytest
With coverage:
pytest --cov=medallion --cov-report=html
Type checking:
mypy medallion
Linting and formatting:
ruff check medallion
black medallion
Medallion is organized into modular, composable components:
medallion/
├── types.py # Core data models (Medallion, MedallionScope, Evidence)
├── store.py # Storage interface (MedallionStore Protocol)
├── sqlite_store.py # SQLite storage implementation
├── llm.py # LLM interface and stub implementation
└── session.py # High-level session helpers (checkpoint, load)
┌─────────────────────────────────────────────────────────────┐
│                        Session Layer                        │
│       (checkpoint_session, load_medallions_for_scope)       │
│                     ┌──────────────────┐                    │
│                     │    session.py    │                    │
│                     └────────┬─────────┘                    │
└──────────────────────────────┼──────────────────────────────┘
                               │
           ┌───────────────────┴───────────────────┐
           │                                       │
   ┌───────▼────────┐                     ┌────────▼──────┐
   │  Store Layer   │                     │   LLM Layer   │
   │ (SQLiteStore)  │                     │ (MedallionLLM)│
   │                │                     │               │
   │  ┌──────────┐  │                     │  ┌──────────┐ │
   │  │store.py  │  │                     │  │ llm.py   │ │
   │  └────┬─────┘  │                     │  └────┬─────┘ │
   │       │        │                     │       │       │
   │  ┌────▼─────┐  │                     │       │       │
   │  │sqlite_   │  │                     │       │       │
   │  │store.py  │  │                     │       │       │
   │  └────┬─────┘  │                     │       │       │
   └───────┼────────┘                     └───────┼───────┘
           │                                      │
           │                    ┌─────────────────┘
           │                    │
   ┌───────▼────────────────────▼─────────┐
   │              Data Layer              │
   │     (Medallion, Evidence, etc.)      │
   │                                      │
   │   ┌──────────────────────────────┐   │
   │   │           types.py           │   │
   │   │  - Medallion (main schema)   │   │
   │   │  - MedallionScope            │   │
   │   │  - Evidence                  │   │
   │   │  - Exceptions                │   │
   │   └──────────────────────────────┘   │
   └──────────────────────────────────────┘
- Checkpoint Creation: Evidence → LLM.generate() → Medallion → Store.create()
- Checkpoint Update: Evidence → Store.get_latest_for_scope() → LLM.update() → Store.update()
- Checkpoint Retrieval: MedallionScope → Store.get_latest_for_scope() → List[Medallion]
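The create and update flows can be sketched as a single create-or-update helper. This is a minimal illustration, not the code in session.py: `FakeMedallion`, `InMemoryStore`, `EchoLLM`, and `checkpoint` are hypothetical stand-ins, the method signatures are assumed, and the store here looks up one medallion by exact key rather than applying scope matching.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeMedallion:
    """Hypothetical stand-in for Medallion: a scope key plus accumulated summaries."""
    scope_key: tuple
    summaries: list[str] = field(default_factory=list)


class InMemoryStore:
    """Hypothetical store exposing the three methods named in the flows above."""

    def __init__(self) -> None:
        self._items: dict[tuple, FakeMedallion] = {}

    async def get_latest_for_scope(self, scope_key: tuple):
        # Assumption: returns one medallion or None (exact-key lookup).
        return self._items.get(scope_key)

    async def create(self, medallion: FakeMedallion) -> None:
        self._items[medallion.scope_key] = medallion

    async def update(self, medallion: FakeMedallion) -> None:
        self._items[medallion.scope_key] = medallion


class EchoLLM:
    """Hypothetical LLM that records evidence instead of calling a model."""

    async def generate(self, scope_key: tuple, evidence: str) -> FakeMedallion:
        return FakeMedallion(scope_key, [evidence])

    async def update(self, existing: FakeMedallion, new_evidence: str) -> FakeMedallion:
        return FakeMedallion(existing.scope_key, existing.summaries + [new_evidence])


async def checkpoint(store, llm, scope_key: tuple, evidence: str) -> FakeMedallion:
    existing = await store.get_latest_for_scope(scope_key)
    if existing is None:
        # Creation: Evidence -> LLM.generate() -> Medallion -> Store.create()
        medallion = await llm.generate(scope_key, evidence)
        await store.create(medallion)
    else:
        # Update: Evidence -> Store.get_latest_for_scope() -> LLM.update() -> Store.update()
        medallion = await llm.update(existing, evidence)
        await store.update(medallion)
    return medallion


async def demo():
    store, llm = InMemoryStore(), EchoLLM()
    key = ("repo:muse", "project_state")
    first = await checkpoint(store, llm, key, "Implemented CLI module")
    second = await checkpoint(store, llm, key, "Added error handling")
    return first, second, len(store._items)


first, second, count = asyncio.run(demo())
```

The second call takes the update branch, so the store still holds a single medallion whose contents now reflect both sessions.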
- Dependency Injection: All dependencies (store, LLM) are injected, not hardcoded
- Protocol-Based: Interfaces defined as typing.Protocol for flexibility
- Async-First: All I/O operations are async, with sync wrappers for convenience
- Framework-Agnostic: No framework-specific dependencies in core library
- Type-Safe: Strict typing with mypy, Pydantic for runtime validation
MIT License - see LICENSE file for details.
Contributions welcome! Please see CONTRIBUTING.md for guidelines.