akilisha/cafeai-capstone

☕ CafeAI — The Capstone Series

Four projects. One framework. A complete picture of what it means to build AI-native applications in Java — without trading understanding for convenience.


What this document is

CafeAI is a Java framework for building AI-native applications. It sits at the intersection of three proven traditions: Java's robustness, Express.js's composability, and Python LangChain's AI primitives vocabulary. The four capstone projects in this series are not demos. They are proofs — each one designed to stress-test a different dimension of the framework and reveal what it can and cannot do.

This document tells the story of all four capstones together: what each one is, what it was designed to test, what it revealed about the framework, and how each one builds on the last. Read it before you dive into any individual project, and the individual READMEs will make more sense.


The Framework in One Paragraph

Every AI application has the same hard problems: calling a language model, managing conversation history, retrieving relevant knowledge, calling external tools, enforcing safety rules, and observing what happens in production. In most frameworks these concerns are scattered across the application — the developer glues them together and hopes the seams hold.

CafeAI treats every one of these as a middleware concern. HTTP auth, PII scrubbing, RAG retrieval, the LLM call itself, hallucination scoring, memory write — all composable layers in a single pipeline. Each layer is independently explainable, independently testable, and independently replaceable. The pipeline is the architecture. The architecture is the curriculum.

Incoming Request
    ↓
[ auth / JWT ]                  ← standard HTTP middleware
[ rate limiter ]                ← standard HTTP middleware
[ PII scrubber ]                ← security middleware
[ jailbreak detector ]          ← security middleware
[ guardrails PRE ]              ← safety middleware
[ token budget enforcer ]       ← cost middleware
[ RAG retrieval ]               ← knowledge middleware
[ LLM call ]                    ← ai middleware
[ guardrails POST ]             ← safety middleware
[ observability / OTel ]        ← observe middleware
[ memory write ]                ← memory middleware
    ↓
Response

Everything in that diagram is a Middleware. You compose, remove, or replace any layer without touching the others. That composability is the core proposition of CafeAI — and the four capstones exist to prove it holds under real pressure.
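
The composition idea can be sketched in plain Java without any framework at all. This is a minimal illustration, not CafeAI's implementation: the `Middleware` interface, the `Context` class and the three toy layers are hypothetical names invented for the example, and the "LLM" layer simply echoes the prompt.

```java
import java.util.ArrayList;
import java.util.List;

final class PipelineSketch {

    /** Mutable per-request state handed down the chain (hypothetical shape). */
    static final class Context {
        String prompt;
        String response;
        final List<String> trace = new ArrayList<>();
        Context(String prompt) { this.prompt = prompt; }
    }

    /** One layer: do some work, then call next.run() to continue or return to short-circuit. */
    interface Middleware {
        void handle(Context ctx, Runnable next);
    }

    /** Runs the layers in registration order. */
    static void run(List<Middleware> layers, Context ctx) {
        dispatch(layers, 0, ctx);
    }

    private static void dispatch(List<Middleware> layers, int index, Context ctx) {
        if (index == layers.size()) return; // end of the chain
        layers.get(index).handle(ctx, () -> dispatch(layers, index + 1, ctx));
    }

    // Toy layers standing in for the PII scrubber, the LLM call and the memory write.
    static final Middleware SCRUB = (ctx, next) -> {
        ctx.prompt = ctx.prompt.replaceAll("\\d{3}-\\d{2}-\\d{4}", "[REDACTED]");
        ctx.trace.add("scrub");
        next.run();
    };
    static final Middleware FAKE_LLM = (ctx, next) -> {
        ctx.response = "echo: " + ctx.prompt; // placeholder for a real model call
        ctx.trace.add("llm");
        next.run();
    };
    static final Middleware MEMORY_WRITE = (ctx, next) -> {
        ctx.trace.add("memory");
        next.run();
    };

    public static void main(String[] args) {
        Context ctx = new Context("My SSN is 123-45-6789");
        run(List.of(SCRUB, FAKE_LLM, MEMORY_WRITE), ctx);
        System.out.println(ctx.response + " " + ctx.trace);
    }
}
```

Removing `SCRUB` from the list changes nothing in the other two layers, which is the property the diagram is claiming.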


The Incremental Adoption Ladder

Before the capstones, understand the ladder they climb. CafeAI is structured so that teams can start at the bottom rung and graduate only when a rung earns its keep:

Rung  Capability        What You Learn
1     Plain LLM call    Helidon SE + LangChain4j basics
2     Prompt templates  Structured prompt engineering
3     Context memory    Conversation state, Java FFM API
4     RAG               Ingestion, embeddings, vector retrieval
5     Tool use          Giving the AI actions to take
6     Guardrails        Safety and compliance as middleware
7     Agents            Autonomous reasoning, multi-step loops
8     Observability     Production measurement, prompt versioning
9     Streaming         SSE, backpressure, real-time UX
10    Security          Injection, leakage, adversarial robustness

The four capstones don't climb the ladder sequentially. They jump between rungs, combine them in domain-specific ways, and expose the places where rungs interact unexpectedly. That's the point — a rung that works in isolation but breaks under combination isn't ready for production.


Capstone 1 — support-agent

The Application

A customer support assistant for a fictional API company (Helios). Developers ask questions about the API, report bugs, and check the status of their issues. The assistant answers using a knowledge base of six documentation pages, can look up real GitHub issue status via registered tools, and maintains conversation history per session.

What It Was Designed to Prove

Capstone 1 was the discovery pass — the first complete wiring of the framework from HTTP server to LLM call and back. The goal was to confirm that the core stack actually works: RAG, memory, tools, guardrails, security, observability, WebSocket, all in a single application.

The domain was chosen deliberately. Technical support is familiar, low-stakes, and has enough variation in question types to exercise all the moving parts — factual questions (RAG), action requests (tools), adversarial attempts (guardrails), multi-turn conversation (memory).

The API Ladder Climbed

var app = CafeAI.create();

// Provider — with automatic fallback
app.connect(
    Ollama.at("http://localhost:11434").model("qwen2.5")
          .onUnavailable(Fallback.use(OpenAI.gpt4oMini())));

// Knowledge base — six documentation pages ingested at startup
app.vectordb(VectorStore.inMemory());
app.embed(EmbeddingModel.local());        // ONNX via Java FFM — no external API
app.rag(Retriever.semantic(3));
app.ingest(Source.text(heliosApiDoc, "helios/api"));

// Session memory — per-user conversation history
app.memory(MemoryStrategy.inMemory());

// Guardrails — as global filters, before any route handler sees the request
app.guard(GuardRail.topicBoundary()
    .allow("helios api", "github issues", "technical support"));
app.guard(GuardRail.jailbreak());
app.guard(AiSecurity.promptInjectionDetector());

// Observability — every LLM call traced
app.observe(ObserveStrategy.console());

// Tools — GitHub registered as @CafeAITool, with trust level
app.tool(new GitHubTools());

What It Revealed

Running the first complete capstone exposed real framework gaps:

app.guard() was not wired into the pipeline. Guardrails were registered but never called. The bug had been in the architecture from the beginning and surfaced only when a full application tried to use them. It has since been fixed, and tests now prove that guardrails fire.

Chain, ChainStep, and Steps — an abstraction for composing multi-step AI operations — were removed entirely. Building the capstone revealed they didn't earn their complexity. The pipeline composability already handles what chains were meant to do. Removing them was the right call, not a failure.

TopicBoundaryGuardRailImpl and RegulatoryGuardRailImpl were added to the framework because the capstone needed them and the stubs that existed weren't sufficient. The capstone pulled real implementation out of the framework.

The pattern that emerged: tools as enforcement, not just enrichment. When the GitHub tool returns an issue status, the LLM uses that result to answer the question. It doesn't invent — it reports. The tool controls the factual ground. This pattern reappears in every subsequent capstone.


Capstone 2 — meridian-qualify

The Application

A loan pre-qualification assistant for Meridian Home Loans. An applicant submits their financial profile — income, assets, credit score, employment — and the system runs it through eligibility screening, regulatory compliance checks, and qualification logic, returning a structured decision with a plain-language explanation.

This is a regulated domain. FCRA (Fair Credit Reporting Act), ECOA (Equal Credit Opportunity Act), and fair lending laws constrain what the model can say, how decisions must be explained, and what factors can be used. A wrong answer here isn't just unhelpful — it's a compliance violation.

What It Was Designed to Prove

Where Capstone 1 was about discovery, Capstone 2 was deliberate stress-testing. The domain was chosen to push on the parts of the framework that a friendly customer support scenario would never challenge:

Regulatory guardrails under real constraints. The application must comply with FCRA and ECOA. Any response that makes a credit decision using a protected characteristic must be blocked. Any adverse action must be explained. The guardrail layer — which worked for jailbreak detection — needed to hold up under domain-specific regulatory rules.

Structured output. A loan qualification decision isn't a free-text answer. It's a typed result: APPROVED, DECLINED, CONDITIONAL, with specific reasons and required disclosures. The application needed to extract that structure from LLM output reliably, not occasionally.

Complex routing logic. The same /qualify endpoint has different behaviour depending on the applicant's profile. Borderline cases trigger a different agent path than clear approvals or clear rejections. The middleware composability needed to handle branching, not just sequential pipeline execution.

Tool-as-policy-enforcer. @CafeAITool methods don't just retrieve data — in meridian-qualify, they enforce business rules. The credit check tool returns a result that the model must use to constrain its decision. If the tool returns INSUFFICIENT_CREDIT, the model cannot approve the loan regardless of other factors. Testing whether the model respected tool output as authoritative was a core goal.

The API Ladder Climbed

app.ai(OpenAI.gpt4o());
app.system(QUALIFICATION_SYSTEM_PROMPT);
app.guard(GuardRail.regulatory().fcra().ecoa());  // compliance layer
app.guard(GuardRail.bias());                       // demographic bias detection
app.guard(GuardRail.topicBoundary()
    .allow("loan qualification", "mortgage", "credit", "income", "assets")
    .deny("investment advice", "insurance", "other financial products"));
app.memory(MemoryStrategy.mapped());               // SSD-backed via Java FFM
app.tool(new CreditCheckTool());
app.tool(new IncomeVerificationTool());
app.tool(new RegulatoryComplianceTool());
app.observe(ObserveStrategy.console());

What It Revealed

Capstone 2 confirmed that regulatory guardrails compose correctly with domain logic — but it also revealed the structured output gap that would follow the framework through all subsequent capstones. Every call that needed a typed QualificationDecision result required the same boilerplate: embed a JSON schema in the prompt, call the LLM, strip markdown fences, parse with Jackson. That pattern showing up once is acceptable. Showing up repeatedly is a signal.
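
The fence-stripping step of that boilerplate can be sketched with the standard library alone. This is an illustration of the pattern described above, not code taken from the capstone: `FenceStripSketch` and `stripFences` are hypothetical names, and the final parse step (Jackson's `ObjectMapper.readValue`) is left out so the example has no external dependency.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FenceStripSketch {

    // Optional opening fence with an optional language tag, the payload, then the closing fence.
    private static final Pattern FENCED =
            Pattern.compile("(?s)^\\s*```[A-Za-z]*\\s*(.*?)\\s*```\\s*$");

    /** Returns the payload inside a markdown fence, or the trimmed input if it is not fenced. */
    static String stripFences(String raw) {
        Matcher m = FENCED.matcher(raw);
        return m.matches() ? m.group(1) : raw.trim();
    }

    public static void main(String[] args) {
        String raw = "```json\n{\"decision\":\"DECLINED\"}\n```";
        // The cleaned string is what would then be handed to a JSON parser.
        System.out.println(stripFences(raw));
    }
}
```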

The bias detection guardrail passed under testing — demographic characteristics in the input did not change the qualification outcome. That's not obvious: it requires both the guardrail and the system prompt to work correctly together, and the test that proved it was meaningful.

The tool-as-policy-enforcer pattern held up under adversarial testing. Attempts to convince the model to approve a loan despite a tool returning INSUFFICIENT_CREDIT failed. The tool result was authoritative. This was a meaningful validation of the architecture — tools are not suggestions.
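
One way to express "tools are not suggestions" as a checkable rule is a small reconciliation function that an adversarial test could assert against. The sketch below is an assumption about how such a check might look, not the mechanism meridian-qualify uses: `ToolAuthoritySketch`, `reconcile` and the `SUFFICIENT` value are hypothetical, while `INSUFFICIENT_CREDIT`, `APPROVED`, `DECLINED` and `CONDITIONAL` come from the text above.

```java
final class ToolAuthoritySketch {

    enum CreditResult { SUFFICIENT, INSUFFICIENT_CREDIT }

    enum Decision { APPROVED, DECLINED, CONDITIONAL }

    /**
     * Returns the decision to report, given what the model proposed and what the
     * credit check tool returned. The tool result always wins over an approval.
     */
    static Decision reconcile(Decision proposedByModel, CreditResult toolResult) {
        if (toolResult == CreditResult.INSUFFICIENT_CREDIT
                && proposedByModel == Decision.APPROVED) {
            return Decision.DECLINED; // the model may not approve against the tool
        }
        return proposedByModel;
    }

    public static void main(String[] args) {
        System.out.println(reconcile(Decision.APPROVED, CreditResult.INSUFFICIENT_CREDIT));
    }
}
```

In a test suite, the same rule can be used the other way round: run the adversarial prompt, then assert that the model's decision already equals what `reconcile` would return.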


Capstone 3 — acme-claims

The Application

An AI-powered claims intake assistant for Acme Insurance Group, a regional property and casualty insurer. A claimant describes an incident, the system verifies their policy coverage, looks up or opens a claim via the Acme claims API, and returns a structured ClaimsDecision. The application enforces HIPAA compliance, fraud escalation guardrails, and injection security on every request.

What It Was Designed to Prove

Capstone 3 ran cleaner than the previous two. The hard lessons had already been absorbed into the framework: guardrails were wired correctly, tools acted as policy enforcers, and structured output followed an established pattern, even if that pattern was still boilerplate. The goal was to confirm that CafeAI's primitives hold up in a new regulated domain without requiring domain-specific framework modifications.

It also introduced two capabilities that hadn't appeared in the prior capstones:

Tiered memory in practice. The prior capstones kept session state on a single node: MemoryStrategy.inMemory() in support-agent and MemoryStrategy.mapped() in meridian-qualify. acme-claims uses MemoryStrategy.redis(), a real distributed session store. Claims sessions need to survive application restarts and be shareable across claims staff. The one-line swap to redis() is the tiered memory proposition in its simplest form.

Pluggable vector stores. The prior capstones used VectorStore.inMemory(). acme-claims uses VectorStore.chroma() — a persistent external vector store. The insurance knowledge base is ingested once and persists across restarts. The claim documents don't need to be re-embedded on every startup, and the collection is visible and queryable externally.

// Before: the in-memory setup used in support-agent
app.memory(MemoryStrategy.inMemory());
app.vectordb(VectorStore.inMemory());

// After — acme-claims
app.memory(MemoryStrategy.redis(
    RedisConfig.builder()
        .host("localhost").port(6379)
        .sessionTtl(Duration.ofHours(8))  // one work day
        .build()));
app.vectordb(VectorStore.chroma("http://localhost:8000", "acme-claims"));

Two lines changed. The application behaviour is meaningfully different: sessions survive restarts, the knowledge base persists, and both infrastructure concerns are managed outside the application code.

What It Revealed

The lapsed policy test passed — the model correctly interpreted a tool returning POLICY_LAPSED and returned NOT_COVERED rather than attempting to open a claim. That's the tool-as-enforcement pattern working correctly in a third domain.

The fraud coaching test passed: "How do I fake damage to get a bigger payout?" was caught by the deny() list on the topic boundary guardrail. That exercises deny-list matching rather than the allow-list matching Capstone 1 relied on, and it worked correctly out of the box.

The flood scenario passed — the model correctly distinguished between "policy not active" and "incident not covered by this policy." Two different explanations requiring different responses to the claimant. Getting that distinction from the RAG documents demonstrated that the knowledge base was well-structured and the retrieval pipeline was returning the right context.

The post-run observation: the first capstone was about discovery. The second was about stress-testing under regulatory pressure. The third ran clean because the hard lessons were already baked in. That's what a maturing framework feels like.


Capstone 4 — atlas-inbox

The Application

A vendor invoice processor for Meridian Home Loans Accounts Payable. A batch job reads unread vendor emails from Gmail, classifies attachments using computer vision, extracts structured invoice data from PDFs and scanned images, reconciles invoiced amounts against contracted rates, and drafts professional replies. Everything except Gmail authentication runs through CafeAI.

Gmail (unread emails)
    ↓
Pre-filter             ← skip non-vendor emails with no token cost
    ↓
Sentiment Analysis     ← tone + urgency → escalation decision
    ├─ escalate=true → supervisor alert + vendor acknowledgement
    ↓
Attachment Classification  ← is this an invoice? (multimodal vision)
    ↓
Invoice Extraction         ← structured fields from PDF, image, or body
    ↓
Reconciliation             ← contracted vs invoiced via @CafeAITool
    ↓
Response Composition       ← draft vendor reply
    ↓
Gmail (send reply)

What It Was Designed to Prove

Capstone 4 was the hardest. It introduced three capabilities that had never been demonstrated in the prior capstones — and exposing their absence was half the point.

Multimodal pipeline. Classification and extraction require the LLM to see a PDF or image, not read text. app.prompt() accepts a String. There was no CafeAI-native path for binary content. The first version of the capstone worked around this with MultimodalChatService — a raw LangChain4j wrapper that bypassed the entire CafeAI pipeline. Guardrails didn't fire on those calls. Observability didn't trace them. Token budgets didn't apply. CafeAI was a satellite orbiting a sun that had nothing to do with the framework.

That gap was closed. app.vision() is now a first-class entry point:

// Before — MultimodalChatService bypasses CafeAI entirely
var chat = new MultimodalChatService(SYSTEM_PROMPT);  // raw LangChain4j
var classification = chat.promptWithPdf(prompt, pdfBytes);  // no guardrails, no observability

// After — through CafeAI pipeline
AttachmentClassification result = app.vision(prompt, pdfBytes, "application/pdf")
    .returning(AttachmentClassification.class)
    .call(AttachmentClassification.class);

Every vision call now passes through guardrails, observability, token budget, and retry — identically to a text prompt call. MultimodalChatService was deleted.

Structured output. The strip-and-parse boilerplate appeared four times in a single capstone: SentimentResult, AttachmentClassification, InvoiceData, ReconciliationResult. Each one repeated the same steps: embed a JSON schema in the prompt, call the LLM, strip fences, parse with Jackson. A pattern that appears four times in one project is not a coincidence; it is a missing primitive. It was added:

// Before — four times in this capstone alone
String raw = app.prompt(prompt).call().text();
String clean = raw.trim().replaceAll("(?s)^```(?:json)?\\s*|\\s*```$", "");  // strip opening and closing fence
SentimentResult result = MAPPER.readValue(clean, SentimentResult.class);

// After — one line
SentimentResult result = app.prompt(prompt)
    .returning(SentimentResult.class)
    .call(SentimentResult.class);

SchemaHintBuilder generates a compact JSON example from any Java record or POJO via reflection. ResponseDeserializer handles all fence variants. The developer writes none of the boilerplate.
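
Generating a JSON example from a type by reflection can be sketched in a few lines for the record case. This is not the SchemaHintBuilder source: `SchemaHintSketch`, `hintFor` and the fields of the example `SentimentResult` record are assumptions made for illustration, and the sketch ignores POJOs, nested types and collections.

```java
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

final class SchemaHintSketch {

    /** Hypothetical result type; the real SentimentResult fields are not shown in this document. */
    record SentimentResult(String tone, int urgency, boolean escalate) {}

    /** Builds a compact JSON example, one placeholder value per record component. */
    static String hintFor(Class<? extends Record> type) {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        for (RecordComponent c : type.getRecordComponents()) {
            fields.add("\"" + c.getName() + "\": " + placeholder(c.getType()));
        }
        return fields.toString();
    }

    private static String placeholder(Class<?> t) {
        if (t == String.class) return "\"string\"";
        if (t == boolean.class || t == Boolean.class) return "true";
        if (t.isPrimitive() || Number.class.isAssignableFrom(t)) return "0";
        return "null"; // nested objects and collections are out of scope for this sketch
    }

    public static void main(String[] args) {
        // The hint would be appended to the prompt so the model knows the expected shape.
        System.out.println(hintFor(SentimentResult.class));
    }
}
```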

Token budget and rate limiting. Processing five vendor emails with three AI calls each blew through the OpenAI free tier (30k TPM). The solution was Thread.sleep(15_000) between emails and Thread.sleep(10_000) before reconciliation — application code doing infrastructure work. That was the correct solution given what the framework offered at the time. It was not acceptable as the permanent answer:

// Before — application code managing the framework's concern
Thread.sleep(15_000);  // rate limit pause
Thread.sleep(10_000);  // another rate limit pause

// After — registered once, handled automatically
app.budget(TokenBudget.perMinute(30_000));
app.retry(RetryPolicy.onRateLimit().maxAttempts(3).backoff(Duration.ofSeconds(10)));
// No Thread.sleep anywhere in application code
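
What a per-minute budget and a rate-limit retry have to do internally can be sketched as follows. This is a simplified illustration under stated assumptions, not CafeAI's implementation: it uses a fixed 60-second window, an injectable clock, and a hypothetical `RateLimitedException`; all class and method names here are invented for the example.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class BudgetRetrySketch {

    /** Fixed-window budget: at most `limit` tokens per 60-second window. */
    static final class TokenBudget {
        private final long limit;
        private final LongSupplier clockMillis;
        private long windowStart;
        private long used;

        TokenBudget(long limit, LongSupplier clockMillis) {
            this.limit = limit;
            this.clockMillis = clockMillis;
            this.windowStart = clockMillis.getAsLong();
        }

        /** Reserves the tokens and returns 0, or returns the millis to wait for the next window. */
        synchronized long reserveOrDelay(long tokens) {
            long now = clockMillis.getAsLong();
            if (now - windowStart >= 60_000) { // a new window has started
                windowStart = now;
                used = 0;
            }
            if (used + tokens <= limit) {
                used += tokens;
                return 0;
            }
            return 60_000 - (now - windowStart);
        }
    }

    /** Stand-in for whatever a provider client throws on HTTP 429. */
    static final class RateLimitedException extends RuntimeException {}

    /** Retries the call on rate limiting, sleeping `backoff` between attempts. */
    static <T> T withRetry(Supplier<T> call, int maxAttempts, Duration backoff) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RateLimitedException e) {
                if (attempt >= maxAttempts) throw e;
                try {
                    Thread.sleep(backoff.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during backoff", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        TokenBudget budget = new TokenBudget(30_000, System::currentTimeMillis);
        System.out.println(budget.reserveOrDelay(20_000)); // fits in the window
        System.out.println(budget.reserveOrDelay(20_000) > 0); // must wait
    }
}
```

The point of moving this into the framework is that the waiting happens in one place, keyed to actual token use, instead of fixed sleeps scattered through the batch loop.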

What It Revealed About the Framework's Architecture

Capstone 4 surfaced something more important than individual missing features. It surfaced a failure of gravity.

In the original version, MultimodalChatService was the sun — it had its own OpenAiChatModel, its own API key, its own base64 encoding, its own fence-stripping. CafeAI was one of several satellites: used for sentiment analysis and response composition, bypassed for everything harder. The framework could not attract the most demanding work.

After MILESTONE-14, gravity shifted. Every AI operation — text, vision, structured, tool-calling — routes through the CafeAI pipeline. The developer registers infrastructure concerns once at startup and writes business logic everywhere else. CafeAI is the sun. The application orbits it.

This is the most important structural property of a framework: it should make the hard things easier, not just the easy things familiar. Capstone 4 was the proof that CafeAI reached that bar for multimodal, structured output, and cost management.


Framework Features Demonstrated Across the Series

Feature C1 C2 C3 C4
LLM provider (Ollama + OpenAI fallback)
LLM provider (OpenAI gpt-4o)
Prompt templates
Session memory — inMemory()
Session memory — mapped() (FFM)
Session memory — redis()
RAG — inMemory vector store
RAG — Chroma vector store
@CafeAITool registration
Tool as policy enforcer
Guardrail — jailbreak
Guardrail — PII
Guardrail — topic boundary (allow)
Guardrail — topic boundary (deny)
Guardrail — regulatory (FCRA, ECOA)
Guardrail — regulatory (HIPAA)
Guardrail — bias
POST_LLM guardrails on tool calls
Security — prompt injection detection
Observability — console
Observability — vision calls
WebSocket
Structured output .returning(Class)
Vision pipeline app.vision()
Token budget app.budget()
Retry policy app.retry()
Batch processing (no HTTP server)

The API in Full

What follows is the complete CafeAI API vocabulary as demonstrated across the four capstones. Every name is guessable before you look it up.

Provider Registration

app.ai(OpenAI.gpt4o())                    // vision-capable
app.ai(OpenAI.gpt4oMini())                // smaller, lower-cost model
app.ai(Anthropic.claude35Sonnet())
app.ai(Ollama.llama3())                   // local, no data leaves your infra
app.ai(Ollama.llava())                    // local vision model
app.ai(ModelRouter.smart()                // cost-aware routing
    .simple(OpenAI.gpt4oMini())
    .complex(OpenAI.gpt4o()))
app.connect(                              // with fallback
    Ollama.at("http://localhost:11434").model("qwen2.5")
          .onUnavailable(Fallback.use(OpenAI.gpt4oMini())))

Text Prompts

// Simple call
PromptResponse r = app.prompt("What is the capital of France?").call();

// With session memory
PromptResponse r = app.prompt("Continue our conversation")
    .session(req.header("X-Session-Id"))
    .call();

// With system prompt override
PromptResponse r = app.prompt("Translate to French: " + text)
    .system("You are a professional French translator.")
    .call();

// Structured output — no boilerplate
SentimentResult result = app.prompt(sentimentPrompt)
    .returning(SentimentResult.class)
    .call(SentimentResult.class);

Vision Calls

// Classify a PDF
VisionResponse r = app.vision(
    "Is this document an invoice? Reply YES or NO.",
    pdfBytes, "application/pdf").call();

// Describe an image with session memory
VisionResponse r = app.vision(
    "What type of damage is visible?",
    imageBytes, "image/jpeg")
    .session(req.header("X-Session-Id"))
    .call();

// Structured extraction from a PDF
InvoiceData invoice = app.vision(
    "Extract all invoice fields.", pdfBytes, "application/pdf")
    .returning(InvoiceData.class)
    .call(InvoiceData.class);

Memory

app.memory(MemoryStrategy.inMemory())              // Rung 1: JVM heap, dev/test
app.memory(MemoryStrategy.mapped())                // Rung 2: SSD-backed, Java FFM
app.memory(MemoryStrategy.mapped(Path.of("/var/sessions")))
app.memory(MemoryStrategy.redis(RedisConfig.of("localhost", 6379)))  // Rung 4
app.memory(MemoryStrategy.redis(
    RedisConfig.builder()
        .host("redis.prod.internal").port(6379)
        .sessionTtl(Duration.ofHours(8))
        .build()))

RAG

app.vectordb(VectorStore.inMemory())               // dev/test
app.vectordb(VectorStore.chroma("http://localhost:8000"))
app.vectordb(VectorStore.chroma("http://localhost:8000", "acme-claims"))
app.embed(EmbeddingModel.local())                  // ONNX via Java FFM
app.rag(Retriever.semantic(3))
app.ingest(Source.text(content, "doc/overview"))
app.ingest(Source.pdf("handbook.pdf"))
app.ingest(Source.directory("docs/"))
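
Assuming Retriever.semantic(3) means "return the three chunks whose embeddings are closest to the query embedding", the core of that step is a top-k search by cosine similarity. The sketch below shows that search over a tiny hand-written index; it is not CafeAI's retriever, the two-dimensional vectors are made up, and `SemanticRetrievalSketch`, `cosine` and `topK` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SemanticRetrievalSketch {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k entries most similar to the query, best first. */
    static List<String> topK(Map<String, double[]> index, double[] query, int k) {
        return index.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> index = Map.of(
                "doc/auth", new double[] {1.0, 0.0},
                "doc/billing", new double[] {0.0, 1.0},
                "doc/rate-limits", new double[] {0.9, 0.1});
        System.out.println(topK(index, new double[] {1.0, 0.0}, 2));
    }
}
```

A real vector store replaces the linear scan with an approximate nearest-neighbour index, but the contract seen by the rest of the pipeline is the same: a query vector in, k document ids out.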

Tools

app.tool(new VendorContractLookup())               // @CafeAITool methods
app.tool(new GitHubTools())
// Tool definition
public class VendorContractLookup {
    @CafeAITool("Look up a vendor's contracted rate by vendor ID and service category")
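    public String lookupRate(String vendorId, String category) {
        // returns authoritative data; the LLM treats this as fact
        // (lookup body not shown here; placeholder so the class compiles)
        throw new UnsupportedOperationException("contract lookup not shown");
    }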
}
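
Annotation-driven registration of this kind usually comes down to scanning an object's methods by reflection. The sketch below shows that mechanism with a stand-in annotation; it is not how CafeAI processes @CafeAITool. `ToolSketch`, `describe`, `invoke` and the placeholder return value are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

final class ToolRegistrySketch {

    /** Stand-in for a tool annotation; must be RUNTIME-retained to be visible to reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ToolSketch { String value(); }

    static final class VendorContractLookup {
        @ToolSketch("Look up a vendor's contracted rate by vendor ID and service category")
        public String lookupRate(String vendorId, String category) {
            return "RATE_FOR:" + vendorId + "/" + category; // placeholder data
        }
    }

    /** Maps each annotated method name to its description, as it might be advertised to a model. */
    static Map<String, String> describe(Object toolHolder) {
        Map<String, String> tools = new TreeMap<>();
        for (Method m : toolHolder.getClass().getDeclaredMethods()) {
            ToolSketch t = m.getAnnotation(ToolSketch.class);
            if (t != null) tools.put(m.getName(), t.value());
        }
        return tools;
    }

    /** Invokes an annotated method by name, as a dispatcher would when the model requests it. */
    static Object invoke(Object toolHolder, String name, Object... args) {
        for (Method m : toolHolder.getClass().getDeclaredMethods()) {
            if (m.isAnnotationPresent(ToolSketch.class) && m.getName().equals(name)) {
                try {
                    return m.invoke(toolHolder, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("tool call failed: " + name, e);
                }
            }
        }
        throw new IllegalArgumentException("no such tool: " + name);
    }

    public static void main(String[] args) {
        Object tools = new VendorContractLookup();
        System.out.println(describe(tools));
        System.out.println(invoke(tools, "lookupRate", "V-1", "hosting"));
    }
}
```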

Guardrails

app.guard(GuardRail.pii())
app.guard(GuardRail.jailbreak())
app.guard(AiSecurity.promptInjectionDetector())
app.guard(GuardRail.bias())
app.guard(GuardRail.toxicity())
app.guard(GuardRail.regulatory().gdpr().hipaa().fcra())
app.guard(GuardRail.topicBoundary()
    .allow("customer service", "orders", "returns")
    .deny("investment advice", "medical guidance"))
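
The allow and deny semantics can be illustrated with a deliberately naive keyword matcher. The real topic boundary guardrail is presumably more sophisticated; this sketch only assumes that deny terms take precedence over allow terms, and `TopicBoundarySketch`, `TopicBoundary` and `Verdict` are hypothetical names.

```java
import java.util.List;
import java.util.Locale;

final class TopicBoundarySketch {

    enum Verdict { ALLOWED, BLOCKED_DENIED_TOPIC, BLOCKED_OFF_TOPIC }

    /** Terms are expected in lower case; matching is plain substring containment. */
    record TopicBoundary(List<String> allow, List<String> deny) {

        Verdict check(String prompt) {
            String text = prompt.toLowerCase(Locale.ROOT);
            // Deny terms are checked first so they win over any allow match.
            if (deny.stream().anyMatch(text::contains)) return Verdict.BLOCKED_DENIED_TOPIC;
            if (allow.isEmpty() || allow.stream().anyMatch(text::contains)) return Verdict.ALLOWED;
            return Verdict.BLOCKED_OFF_TOPIC;
        }
    }

    public static void main(String[] args) {
        TopicBoundary boundary = new TopicBoundary(
                List.of("orders", "returns"), List.of("investment advice"));
        System.out.println(boundary.check("Where are my orders?"));
        System.out.println(boundary.check("Any investment advice on returns?"));
        System.out.println(boundary.check("Tell me a joke"));
    }
}
```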

Cost Management

app.budget(TokenBudget.perMinute(30_000))          // OpenAI free tier
app.budget(TokenBudget.perMinute(500_000))          // OpenAI Tier 1
app.retry(RetryPolicy.onRateLimit()
    .maxAttempts(3)
    .backoff(Duration.ofSeconds(10)))

Observability

app.observe(ObserveStrategy.console())             // development
app.observe(ObserveStrategy.otel())               // production

The Tiered Memory Model

CafeAI's memory architecture mirrors the hardware memory hierarchy. Start cheap and escalate only when the problem genuinely requires it:

Rung 1 → inMemory()      JVM HashMap — dev/test, zero deps, no persistence
Rung 2 → mapped()        SSD-backed FFM MemorySegment — single-node production
Rung 3 → chronicle()     Chronicle Map off-heap — high-throughput single-node
Rung 4 → redis(config)   Lettuce + Redis — distributed, multi-instance
Rung 5 → hybrid()        Warm SSD + cold Redis — best of both

The key insight: most applications do not need Redis. The SSD-backed mapped() tier handles production single-node deployments with zero network overhead, zero cloud tax, and crash recovery for free — session files survive JVM restarts automatically because they live on disk. Redis is the escape valve, reached for only when you genuinely need state shared across multiple application instances. Not the default.

The Java FFM API that backs mapped() is the same API CafeAI uses for native ML library bindings (ONNX, llama.cpp). Same API surface, two completely different use cases, one coherent mental model. That architectural coherence is intentional.
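
The "survives restarts because it lives on disk" property can be demonstrated with a memory-mapped file. The sketch below uses `FileChannel.map` and `MappedByteBuffer` as a stand-in, because the FFM API is still a preview feature on Java 21 and only final from Java 22; it is not the mapped() implementation. The one-slot file layout (a length prefix followed by UTF-8 bytes) and all names are assumptions for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedSessionSketch {

    private static final int SIZE = 4096; // one fixed-size slot per session file

    /** Writes the text into the mapped file as [int length][UTF-8 bytes]. */
    static void write(Path file, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > SIZE - Integer.BYTES) throw new IllegalArgumentException("too large");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
            buf.putInt(bytes.length);
            buf.put(bytes);
            buf.force(); // flush the mapped region to the storage device
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Re-opens the file, as a restarted JVM would, and reads the text back. */
    static String read(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
            byte[] bytes = new byte[buf.getInt()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Path.of(System.getProperty("java.io.tmpdir"), "session-sketch.bin");
        write(file, "user: is flood damage covered?");
        System.out.println(read(file)); // read through a fresh mapping
    }
}
```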


Running the Capstones

Prerequisites

  • Java 21+
  • Gradle 8+
  • OpenAI API key (or Ollama running locally)
  • For acme-claims: Docker (Redis + Chroma)
  • For atlas-inbox: Docker (optional), Gmail OAuth2 credentials, OpenAI API key with gpt-4o access

CafeAI local dependency

All four capstones consume CafeAI from the local Maven repository (~/.m2/repository), so build and publish the framework first:

cd cafeai
./gradlew publishToMavenLocal

Capstone 1 — support-agent

cd support-agent
export OPENAI_API_KEY=sk-...   # or set USE_OLLAMA=true in SupportAgent.java
./gradlew run

Capstone 2 — meridian-qualify

cd meridian-qualify
export OPENAI_API_KEY=sk-...
./gradlew run

Capstone 3 — acme-claims

cd acme-claims
docker-compose up -d   # starts Redis + Chroma
export OPENAI_API_KEY=sk-...
./gradlew run

Capstone 4 — atlas-inbox

cd atlas-inbox

# Gmail OAuth2 setup (first run only)
# 1. Enable Gmail API in Google Cloud Console
# 2. Download OAuth2 credentials JSON
# 3. Copy to src/main/resources/credentials/gmail-credentials.json

export OPENAI_API_KEY=sk-...
./gradlew run -Pdry   # dry run — no emails sent, all decisions printed
./gradlew run         # live run

What the Capstones Prove Together

Read individually, each capstone demonstrates a working AI application. Read together, they demonstrate something more specific: that a framework can hold its composability guarantees across meaningfully different domains, deployment models, and operational concerns.

Capstone 1 proved the pipeline is real. Capstone 2 proved it holds under regulatory pressure. Capstone 3 proved it transfers to new domains without modification. Capstone 4 proved that when the framework had gaps — multimodal, structured output, cost management — the gaps were closed in the framework, not worked around in the application.

That last point is the one that matters most. A framework that works for the easy cases but forces workarounds for the hard ones isn't a framework — it's a starting point. The story of these four capstones is the story of CafeAI earning the right to be the gravity that everything else orbits.


CafeAI: Not an invention of anything new. A re-orientation of everything proven.
