ledge

Event-sourced observability for AI agents.

What it is

Logs tell you that an agent ran. Ledge tells you why it said what it said. It captures the full cognitive lifecycle of an AI agent (the assembled context window, the inference request, the model's reasoning, the tool calls, and the final response) as immutable, timestamped events. Once captured, you can reconstruct any past context window byte-for-byte, diff two inferences to see what changed between them, and walk a complete audit trail for any session.
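
To make "diff two inferences" concrete, here is a minimal sketch of comparing two captured context windows. It is not Ledge's actual diff algorithm: the `Block` type is a hypothetical stand-in for the SDK's `ContentBlock`, and `diffContexts` is an illustrative name. The comparison is set-based, so it ignores reordering and repeated blocks.

```kotlin
// Hypothetical stand-in for the SDK's ContentBlock; field names are assumptions.
data class Block(val role: String, val text: String)

// Blocks present only in the later window, and blocks present only in the earlier one.
data class ContextDiff(val added: List<Block>, val removed: List<Block>)

// Set-based comparison: reports which blocks appeared or disappeared,
// preserving the order in which they occur in their own window.
fun diffContexts(before: List<Block>, after: List<Block>): ContextDiff {
    val beforeSet = before.toSet()
    val afterSet = after.toSet()
    return ContextDiff(
        added = after.filter { it !in beforeSet },
        removed = before.filter { it !in afterSet },
    )
}

fun main() {
    val first = listOf(
        Block("system", "You are helpful"),
        Block("user", "What is the refund policy?"),
    )
    // Same conversation, but a retrieved document was added before inference.
    val second = listOf(
        Block("system", "You are helpful"),
        Block("tool", "policy.md: refunds within 30 days"),
        Block("user", "What is the refund policy?"),
    )
    val diff = diffContexts(first, second)
    println("added:   ${diff.added}")   // the single tool block
    println("removed: ${diff.removed}") // empty list
}
```

A production diff would likely also need to report position changes and edits within a block, which this sketch deliberately leaves out.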

This matters in regulated work. Finance, legal, healthcare, and insurance teams can't deploy an agent they can't explain six months later when a regulator asks. "Why did the model recommend this on April 14th, given what it knew at 14:32:07?" is unanswerable from request/response logs alone. Ledge is built around that question.

The system is event-sourced on purpose: Kafka is the source of truth, and every storage engine downstream (ClickHouse, Postgres, Redis) is a projection that can be rebuilt from the log. Nothing in the cognitive trace is ever mutated in place.
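
The idea that every downstream store is a rebuildable projection can be sketched as a fold over the event log. The event types, the `SessionState` shape, and the `project` function below are all hypothetical and only illustrate the pattern; the real schema lives in the server module. Passing a cutoff timestamp gives a point-in-time view from the same log.

```kotlin
import java.time.Instant

// Hypothetical event shapes, for illustration only.
sealed interface Event {
    val sessionId: String
    val at: Instant
}
data class SessionStarted(override val sessionId: String, override val at: Instant) : Event
data class ContextAssembled(
    override val sessionId: String,
    override val at: Instant,
    val contextHash: String,
) : Event
data class SessionCompleted(override val sessionId: String, override val at: Instant) : Event

// A projection: derived state that can always be recomputed from the log.
data class SessionState(
    val active: Boolean = false,
    val lastContextHash: String? = null,
    val eventCount: Int = 0,
)

// Pure transition function: events are never mutated, only folded into new state.
fun evolve(state: SessionState, event: Event): SessionState = when (event) {
    is SessionStarted -> state.copy(active = true, eventCount = state.eventCount + 1)
    is ContextAssembled -> state.copy(lastContextHash = event.contextHash, eventCount = state.eventCount + 1)
    is SessionCompleted -> state.copy(active = false, eventCount = state.eventCount + 1)
}

// Rebuild the projection for one session, optionally as of a past instant.
fun project(log: List<Event>, sessionId: String, asOf: Instant = Instant.MAX): SessionState =
    log.asSequence()
        .filter { it.sessionId == sessionId && !it.at.isAfter(asOf) }
        .fold(SessionState(), ::evolve)

fun main() {
    val t0 = Instant.parse("2024-04-14T14:30:00Z") // illustrative date
    val log = listOf(
        SessionStarted("s1", t0),
        ContextAssembled("s1", t0.plusSeconds(60), "hash-a"),
        ContextAssembled("s1", t0.plusSeconds(180), "hash-b"),
        SessionCompleted("s1", t0.plusSeconds(240)),
    )
    // Full rebuild: SessionState(active=false, lastContextHash=hash-b, eventCount=4)
    println(project(log, "s1"))
    // State as of 14:32:07: SessionState(active=true, lastContextHash=hash-a, eventCount=2)
    println(project(log, "s1", asOf = t0.plusSeconds(127)))
}
```

Because `evolve` is pure and the log is append-only, dropping a projection and replaying the log yields the same state again.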

Architecture

Write path

The SDK batches events client-side and sends them to the ingest API, which validates and publishes to Kafka. Two consumers fan out: one writes the immutable audit record to ClickHouse, the other materialises session state into Postgres and warms Redis.

flowchart LR
  App[AI Agent App] --> SDK[ledge-sdk<br/>batch + SHA-256 context hash]
  SDK -->|POST /api/v1/events/batch| API[Ingest API<br/>Spring WebFlux]
  API --> K[(Kafka<br/>immutable event log)]
  K --> CH_C[Audit consumer]
  K --> ST_C[State consumer]
  CH_C --> CH[(ClickHouse<br/>columnar audit store)]
  ST_C --> PG[(Postgres<br/>session + tenant state)]
  ST_C --> R[(Redis<br/>hot session cache)]
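
Client-side batching on the write path can be sketched as a buffer that flushes on whichever threshold is reached first, size or age. This is not the SDK's implementation: the `Batcher` class and its method names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and the class is not thread-safe. The defaults mirror the 50-event and 100 ms thresholds this README gives for the SDK.

```kotlin
// Hypothetical batcher; a sketch of "flush at N events or after T ms".
class Batcher<T>(
    private val maxSize: Int = 50,
    private val maxAgeMillis: Long = 100,
    private val send: (List<T>) -> Unit,
) {
    private val buffer = ArrayList<T>()
    private var oldestAt = 0L // arrival time of the oldest buffered event

    // Buffer an event; flush immediately if the size threshold is reached.
    fun add(event: T, nowMillis: Long) {
        if (buffer.isEmpty()) oldestAt = nowMillis
        buffer.add(event)
        if (buffer.size >= maxSize) flush()
    }

    // Intended to be called periodically (for example by a timer);
    // flushes when the oldest buffered event has waited long enough.
    fun tick(nowMillis: Long) {
        if (buffer.isNotEmpty() && nowMillis - oldestAt >= maxAgeMillis) flush()
    }

    // Send whatever is buffered as one batch; a no-op when the buffer is empty.
    fun flush() {
        if (buffer.isEmpty()) return
        val batch = buffer.toList()
        buffer.clear()
        send(batch)
    }
}

fun main() {
    val sent = mutableListOf<List<String>>()
    val batcher = Batcher<String>(maxSize = 3, maxAgeMillis = 100, send = { sent.add(it) })

    batcher.add("e1", nowMillis = 0)
    batcher.add("e2", nowMillis = 10)
    batcher.tick(nowMillis = 50)   // oldest event is 50 ms old: nothing sent
    batcher.tick(nowMillis = 100)  // age threshold reached: sends [e1, e2]

    batcher.add("e3", nowMillis = 110)
    batcher.add("e4", nowMillis = 111)
    batcher.add("e5", nowMillis = 112) // size threshold reached: sends [e3, e4, e5]

    println(sent) // [[e1, e2], [e3, e4, e5]]
}
```

A real client would also need locking or a single-threaded dispatcher, a final flush on shutdown, and the retry-with-backoff step around `send`.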

Read path

Queries hit Redis first for active sessions. On a miss, Postgres serves session metadata and ClickHouse handles the heavy time-range scans needed for point-in-time reconstruction and context diffing.

flowchart LR
  Client[Client / Auditor] -->|GET /api/v1/memory/...| Q[Query API]
  Q --> R{Active session<br/>in Redis?}
  R -->|hit| Resp[Response]
  R -->|miss| PG[(Postgres)]
  PG --> CH[(ClickHouse<br/>event range scan)]
  CH --> Resp
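
The tiered lookup in the diagram can be sketched as follows. The three store interfaces, `SessionView`, and `QueryService` are hypothetical names backed by in-memory maps here; they stand in for Redis, Postgres, and ClickHouse and do not reflect the server's actual classes.

```kotlin
// Hypothetical result type; `source` records which tier answered.
data class SessionView(val sessionId: String, val source: String, val events: List<String>)

// Hypothetical store abstractions standing in for Redis, Postgres, and ClickHouse.
fun interface HotCache { fun get(sessionId: String): SessionView? }
fun interface MetadataStore { fun exists(sessionId: String): Boolean }
fun interface EventStore { fun scan(sessionId: String): List<String> }

class QueryService(
    private val cache: HotCache,
    private val meta: MetadataStore,
    private val events: EventStore,
) {
    fun load(sessionId: String): SessionView? {
        // 1. Active sessions are answered straight from the hot cache.
        cache.get(sessionId)?.let { return it }
        // 2. On a miss, the metadata store decides whether the session exists at all.
        if (!meta.exists(sessionId)) return null
        // 3. Only then pay for the event range scan.
        return SessionView(sessionId, "clickhouse", events.scan(sessionId))
    }
}

fun main() {
    val hot = mapOf("live" to SessionView("live", "redis", listOf("user_input")))
    val archived = mapOf("old" to listOf("user_input", "agent_output"))
    val service = QueryService(
        cache = HotCache { hot[it] },
        meta = MetadataStore { it in archived },
        events = EventStore { archived.getValue(it) },
    )
    println(service.load("live")?.source) // redis
    println(service.load("old")?.source)  // clickhouse
    println(service.load("missing"))      // null
}
```

Checking metadata before scanning keeps lookups for unknown sessions cheap, since the expensive scan only runs for sessions known to exist.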

Deployment topology

A single `docker compose up` brings up the application and all of its dependencies, plus the observability stack. Prometheus scrapes the Spring Boot Actuator endpoint; Grafana ships pre-provisioned with an operations dashboard.

flowchart TB
  subgraph Application
    L[ledge-server<br/>:8080]
  end
  subgraph Data
    K[(Kafka<br/>:9092)]
    PG[(Postgres 16<br/>:5432)]
    CH[(ClickHouse 24.3<br/>:8123)]
    RD[(Redis 7<br/>:6379)]
  end
  subgraph Observability
    P[Prometheus<br/>:9090]
    G[Grafana<br/>:3000]
  end
  L <--> K
  L --> PG
  L --> CH
  L --> RD
  P -->|scrape /actuator/prometheus| L
  G --> P

Repository layout

| Path | What's there |
| --- | --- |
| `ledge-server/` | Spring Boot service: ingest API, query API, Kafka consumers, persistence |
| `ledge-sdk/` | JVM client library |
| `observability/` | Prometheus config and Grafana dashboards (auto-provisioned) |
| `infra/` | Postgres and ClickHouse init SQL, schema migrations |

Quick start

cp .env.example .env       # set POSTGRES_PASSWORD and GRAFANA_PASSWORD
docker compose up
curl http://localhost:8080/actuator/health

That brings up the API on :8080, Grafana on :3000, and the rest of the stack on the ports shown in the topology diagram above.

To use the SDK from another local project, publish it to your Maven Local first:

./gradlew :ledge-sdk:publishToMavenLocal

Instrumenting an agent

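// Create a client pointed at the ingest API and open a session for one agent.
val ledge = LedgeClient(LedgeConfig(baseUrl = "http://localhost:8080", apiKey = "your-key"))
val session = ledge.createSession(agentId = "your-agent-uuid")

// Record the user's message and the exact context window sent to the model.
session.userInput("What is the refund policy?")
session.contextAssembled(listOf(
    ContentBlock("system", "You are helpful"),
    ContentBlock("user", "What is the refund policy?"),
))

// Record the inference; the returned ID links the request, completion, and output events.
val infId = session.inferenceRequested("gpt-4o", "openai")
session.inferenceCompleted("Our policy...", TokenUsage(50, 87, 137), infId)
session.agentOutput("Our policy...", infId)

// Mark the session complete and shut down the client.
ledge.completeSession(session.sessionId)
ledge.close()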

Events are batched (a batch is sent at 50 events or after 100 ms, whichever comes first) and retried with exponential backoff. The SDK computes the SHA-256 of each assembled context, so identical contexts deduplicate and any drift between them is detectable. The full API reference lives in `ledge-sdk/README.md`.
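
To show why a content hash gives both deduplication and drift detection, here is a minimal sketch using the JDK's `MessageDigest`. The serialization is an assumption: how the SDK actually encodes a context before hashing is not specified here, and `Block` and `contextHash` are hypothetical names. Each field is length-prefixed so that block boundaries affect the hash.

```kotlin
import java.nio.ByteBuffer
import java.security.MessageDigest

// Hypothetical stand-in for the SDK's ContentBlock.
data class Block(val role: String, val text: String)

// SHA-256 over a length-prefixed encoding of every field, rendered as lowercase hex.
// The encoding is an assumption made for this sketch, not the SDK's wire format.
fun contextHash(blocks: List<Block>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    for (block in blocks) {
        for (field in listOf(block.role, block.text)) {
            val bytes = field.toByteArray(Charsets.UTF_8)
            // The 4-byte length prefix keeps ("ab", "c") distinct from ("a", "bc").
            digest.update(ByteBuffer.allocate(4).putInt(bytes.size).array())
            digest.update(bytes)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

fun main() {
    val original = listOf(
        Block("system", "You are helpful"),
        Block("user", "What is the refund policy?"),
    )
    // One extra character in the system prompt.
    val drifted = listOf(
        Block("system", "You are helpful."),
        Block("user", "What is the refund policy?"),
    )
    println(contextHash(original) == contextHash(original.toList())) // true: identical contexts share a hash
    println(contextHash(original) == contextHash(drifted))           // false: any change alters the hash
    println(contextHash(original).length)                            // 64 hex characters
}
```

Equal hashes let a store keep one copy of a repeated context, while a changed hash between two inferences flags that the context differed, even by a single character.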

Tests

./gradlew test                               # unit tests across all modules
./gradlew :ledge-server:integrationTest      # spins up Testcontainers (needs Docker)
./gradlew :ledge-sdk:test                    # SDK only

Integration tests run against real Kafka, Postgres, ClickHouse, and Redis containers — no mocks at the persistence boundary.

Observability

Grafana on :3000 ships with an operations dashboard covering ingest throughput, query latency by type, ClickHouse and Redis timings, write failures, and active session count. Prometheus scrapes /actuator/prometheus every 15 seconds with 15-day retention.

Status

Single-author project. The Observation Layer (event capture, point-in-time reconstruction, context diffing, audit queries) is implemented end-to-end. The Knowledge Layer (semantic indexing and retrieval over captured traces) is in progress.
