Event-sourced observability for AI agents.
Logs tell you that an agent ran. Ledge tells you why it said what it said. It captures the full cognitive lifecycle of an AI agent (the assembled context window, the inference request, the model's reasoning, the tool calls, and the final response) as immutable, timestamped events. Once those events are captured, you can reconstruct any past context window byte-for-byte, diff two inferences to see what changed between them, and walk a complete audit trail for any session.
This matters in regulated work. Finance, legal, healthcare, and insurance teams can't deploy an agent they can't explain six months later when a regulator asks. "Why did the model recommend this on April 14th, given what it knew at 14:32:07?" is unanswerable from request/response logs alone. Ledge is built around that question.
The system is event-sourced on purpose: Kafka is the source of truth, and every storage engine downstream (ClickHouse, Postgres, Redis) is a projection that can be rebuilt from the log. Nothing in the cognitive trace is ever mutated in place.
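To make the "projection that can be rebuilt from the log" idea concrete, here is a minimal in-memory sketch. It is not Ledge's implementation: `TraceEvent`, `SessionSummary`, and `rebuildProjection` are hypothetical names, and a plain list stands in for the Kafka log. It only illustrates that derived state is a pure function of the ordered events, so replaying the same log always yields the same projection.

```kotlin
// Hypothetical event shape; the real Ledge schema is not shown in this README.
data class TraceEvent(val sessionId: String, val type: String, val timestampMs: Long)

// Hypothetical projection: a per-session summary derived purely from events.
data class SessionSummary(val eventCount: Int, val lastType: String, val lastSeenMs: Long)

// Rebuild the projection from scratch by folding over the log in order.
// The log is only read; the derived map is the sole output.
fun rebuildProjection(log: List<TraceEvent>): Map<String, SessionSummary> =
    log.fold(emptyMap<String, SessionSummary>()) { acc, e ->
        val prev = acc[e.sessionId]
        acc + (e.sessionId to SessionSummary(
            eventCount = (prev?.eventCount ?: 0) + 1,
            lastType = e.type,
            lastSeenMs = e.timestampMs,
        ))
    }

fun main() {
    val log = listOf(
        TraceEvent("s1", "USER_INPUT", 1_000),
        TraceEvent("s1", "CONTEXT_ASSEMBLED", 1_010),
        TraceEvent("s2", "USER_INPUT", 1_020),
        TraceEvent("s1", "AGENT_OUTPUT", 1_500),
    )
    val first = rebuildProjection(log)
    val second = rebuildProjection(log) // replaying the same log again
    println(first == second)            // true
    println(first["s1"])
}
```

Because the fold never mutates its input, dropping a projection and replaying the log is always safe in this model; that is the property the downstream stores rely on.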
The SDK batches events client-side and sends them to the ingest API, which validates and publishes to Kafka. Two consumers fan out: one writes the immutable audit record to ClickHouse, the other materialises session state into Postgres and warms Redis.
```mermaid
flowchart LR
    App[AI Agent App] --> SDK[ledge-sdk<br/>batch + SHA-256 context hash]
    SDK -->|POST /api/v1/events/batch| API[Ingest API<br/>Spring WebFlux]
    API --> K[(Kafka<br/>immutable event log)]
    K --> CH_C[Audit consumer]
    K --> ST_C[State consumer]
    CH_C --> CH[(ClickHouse<br/>columnar audit store)]
    ST_C --> PG[(Postgres<br/>session + tenant state)]
    ST_C --> R[(Redis<br/>hot session cache)]
```
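The client-side batching step in this pipeline can be pictured as a size-or-age flush rule. The sketch below is an assumption about how such a buffer might look, not the SDK's actual code: `BatchBuffer`, `add`, and `tick` are hypothetical names, time is passed in explicitly to keep the example deterministic, and the defaults simply mirror the 50-event / 100 ms figures quoted for the SDK elsewhere in this README. It is not thread-safe.

```kotlin
// Hypothetical buffer: flushes when it holds maxSize items or when the
// oldest pending item has waited maxAgeMs, whichever comes first.
class BatchBuffer<T>(
    private val maxSize: Int = 50,
    private val maxAgeMs: Long = 100,
    private val flush: (List<T>) -> Unit,
) {
    private val pending = mutableListOf<T>()
    private var oldestMs = 0L

    // Add an item; flush immediately once the size threshold is reached.
    fun add(item: T, nowMs: Long) {
        if (pending.isEmpty()) oldestMs = nowMs
        pending.add(item)
        if (pending.size >= maxSize) drain()
    }

    // Meant to be called periodically (for example by a timer):
    // flushes if the oldest pending item is old enough.
    fun tick(nowMs: Long) {
        if (pending.isNotEmpty() && nowMs - oldestMs >= maxAgeMs) drain()
    }

    private fun drain() {
        val batch = pending.toList()
        pending.clear()
        flush(batch)
    }
}

fun main() {
    val sent = mutableListOf<List<String>>()
    val buffer = BatchBuffer<String>(maxSize = 3, maxAgeMs = 100) { batch -> sent.add(batch) }
    buffer.add("e1", nowMs = 0)
    buffer.add("e2", nowMs = 10)
    buffer.tick(nowMs = 50)   // oldest item is 50 ms old: nothing sent
    buffer.tick(nowMs = 100)  // age threshold reached: [e1, e2] flushed
    buffer.add("e3", nowMs = 110)
    buffer.add("e4", nowMs = 111)
    buffer.add("e5", nowMs = 112) // size threshold reached: [e3, e4, e5] flushed
    println(sent)             // [[e1, e2], [e3, e4, e5]]
}
```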
Queries hit Redis first for active sessions. On miss, Postgres serves session metadata and ClickHouse handles the heavy time-range scan needed for point-in-time reconstruction or context diffing.
```mermaid
flowchart LR
    Client[Client / Auditor] -->|GET /api/v1/memory/...| Q[Query API]
    Q --> R{Active session<br/>in Redis?}
    R -->|hit| Resp[Response]
    R -->|miss| PG[(Postgres)]
    PG --> CH[(ClickHouse<br/>event range scan)]
    CH --> Resp
```
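The read path above amounts to a cache-first lookup with a two-step fallback. As a rough sketch only: `SessionView`, `ReadPath`, and the three constructor parameters are hypothetical stand-ins for Redis, Postgres, and ClickHouse, and none of this reflects the actual query API. The sketch deliberately does not write back to the cache on a miss, since the README only describes the state consumer as warming Redis.

```kotlin
// Hypothetical result type; "source" records which tier answered.
data class SessionView(val sessionId: String, val source: String, val events: List<String>)

class ReadPath(
    private val cache: Map<String, SessionView>,       // stands in for Redis
    private val sessionExists: (String) -> Boolean,    // stands in for a Postgres metadata lookup
    private val scanEvents: (String) -> List<String>,  // stands in for a ClickHouse range scan
) {
    fun load(sessionId: String): SessionView? {
        // 1. Active sessions are served straight from the cache.
        cache[sessionId]?.let { return it }
        // 2. On a miss, confirm the session exists before paying for a scan.
        if (!sessionExists(sessionId)) return null
        // 3. Only then run the heavier event scan.
        return SessionView(sessionId, "store", scanEvents(sessionId))
    }
}

fun main() {
    val hot = SessionView("active-1", "cache", listOf("USER_INPUT"))
    val known = setOf("active-1", "old-7")
    val path = ReadPath(
        cache = mapOf("active-1" to hot),
        sessionExists = { it in known },
        scanEvents = { listOf("USER_INPUT", "AGENT_OUTPUT") },
    )
    println(path.load("active-1")?.source) // cache
    println(path.load("old-7")?.source)    // store
    println(path.load("missing"))          // null
}
```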
A single docker compose up brings up the application and all of its dependencies, plus the observability stack. Prometheus scrapes the Spring Boot Actuator endpoint; Grafana ships pre-provisioned with an operations dashboard.
```mermaid
flowchart TB
    subgraph Application
        L[ledge-server<br/>:8080]
    end
    subgraph Data
        K[(Kafka<br/>:9092)]
        PG[(Postgres 16<br/>:5432)]
        CH[(ClickHouse 24.3<br/>:8123)]
        RD[(Redis 7<br/>:6379)]
    end
    subgraph Observability
        P[Prometheus<br/>:9090]
        G[Grafana<br/>:3000]
    end
    L <--> K
    L --> PG
    L --> CH
    L --> RD
    P -->|scrape /actuator/prometheus| L
    G --> P
```
| Path | What's there |
|---|---|
| ledge-server/ | Spring Boot service: ingest API, query API, Kafka consumers, persistence |
| ledge-sdk/ | JVM client library |
| observability/ | Prometheus config and Grafana dashboards (auto-provisioned) |
| infra/ | Postgres and ClickHouse init SQL, schema migrations |
```shell
cp .env.example .env   # set POSTGRES_PASSWORD and GRAFANA_PASSWORD
docker compose up
curl http://localhost:8080/actuator/health
```

That brings up the API on :8080, Grafana on :3000, and the rest of the stack on the ports shown in the topology diagram above.
To use the SDK from another local project, publish it to your Maven Local first:
```shell
./gradlew :ledge-sdk:publishToMavenLocal
```

```kotlin
val ledge = LedgeClient(LedgeConfig(baseUrl = "http://localhost:8080", apiKey = "your-key"))
val session = ledge.createSession(agentId = "your-agent-uuid")

session.userInput("What is the refund policy?")
session.contextAssembled(listOf(
    ContentBlock("system", "You are helpful"),
    ContentBlock("user", "What is the refund policy?"),
))
val infId = session.inferenceRequested("gpt-4o", "openai")
session.inferenceCompleted("Our policy...", TokenUsage(50, 87, 137), infId)
session.agentOutput("Our policy...", infId)

ledge.completeSession(session.sessionId)
ledge.close()
```

Events are batched (50 events or 100 ms, whichever comes first) and retried with exponential backoff. The SDK computes the SHA-256 of each assembled context so identical contexts deduplicate and any drift is detectable. Full API reference lives in ledge-sdk/README.md.
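To illustrate why a context hash makes deduplication and drift detection cheap, here is a small sketch using the JDK's `MessageDigest`. The `Block` type, the `contextHash` function, and the length-prefixed serialization are all assumptions made for this example; the SDK's real canonical form is not described in this README and may differ.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for a context block (role + content).
data class Block(val role: String, val content: String)

// Hash an ordered list of blocks with SHA-256 and return lowercase hex.
// Each field is length-prefixed (an assumption of this sketch) so that
// ("ab", "c") and ("a", "bc") cannot collide by simple concatenation.
fun contextHash(blocks: List<Block>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    for (block in blocks) {
        for (field in listOf(block.role, block.content)) {
            val bytes = field.toByteArray(Charsets.UTF_8)
            digest.update("${bytes.size}:".toByteArray(Charsets.UTF_8))
            digest.update(bytes)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

fun main() {
    val original = listOf(
        Block("system", "You are helpful"),
        Block("user", "What is the refund policy?"),
    )
    val sameAgain = original.toList()
    val drifted = original + Block("user", "And for digital goods?")

    println(contextHash(original) == contextHash(sameAgain)) // true
    println(contextHash(original) == contextHash(drifted))   // false
}
```

Comparing two 64-character digests is enough to tell whether two inferences saw the same context; a full block-by-block diff is only needed when the hashes differ.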
```shell
./gradlew test                            # unit tests across all modules
./gradlew :ledge-server:integrationTest   # spins up Testcontainers (needs Docker)
./gradlew :ledge-sdk:test                 # SDK only
```

Integration tests run against real Kafka, Postgres, ClickHouse, and Redis containers, with no mocks at the persistence boundary.
Grafana on :3000 ships with an operations dashboard covering ingest throughput, query latency by type, ClickHouse and Redis timings, write failures, and active session count. Prometheus scrapes /actuator/prometheus every 15 seconds with 15-day retention.
Single-author project. The Observation Layer (event capture, point-in-time reconstruction, context diffing, audit queries) is implemented end-to-end. The Knowledge Layer (semantic indexing and retrieval over captured traces) is in progress.