91 changes: 91 additions & 0 deletions docs/features/john-gpt/ChatPersistence.md
@@ -0,0 +1,91 @@
# JohnGPT Chat Persistence Architecture

## Overview

The chat persistence in JohnGPT follows a **Client-First, Optimistic** approach. Unlike traditional chat apps where the backend saves the message *before* responding, JohnGPT prioritizes responsiveness by saving locally first and syncing to the server in the background.

## The Flow

1. **User Sends Message**:
- The client (`useBranchingChat`) sends the message to the inference endpoint (`/api/chat`).
- The UI updates immediately with the user's message.

2. **Streaming Response (Read-Only)**:
- The `/api/chat` endpoint **DOES NOT** save the message to the database.
- It is a purely functional "inference engine" that processes the input, routes it to the correct AI model, and streams the text response back.
- This keeps database writes off the streaming path, so a slow or locked write cannot delay the stream.

3. **Client-Side State Update**:
- As the AI response streams in, the `useBranchingChat` hook updates its internal `messages` state.
- A specialized `useEffect` hook monitors these changes.

4. **Optimistic Local Save (IndexedDB)**:
- When the message stream completes (status changes from `streaming` to `ready`), the `dbSyncManager` is triggered.
- It **immediately** saves the full conversation tree to the browser's **IndexedDB**.
- This ensures that if the user refreshes the page instantly, the chat is not lost, even if the server sync hasn't happened yet.

5. **Background Server Sync**:
- The `dbSyncManager` debounces the server sync (waits for 5 seconds of inactivity or completion).
- It sends a `PATCH` request to `/api/conversations/[id]` with the full conversation data.
- If the user is offline, the sync is queued and retried automatically when the connection is restored.
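
Steps 4 and 5 can be sketched as a small "save locally now, sync later" helper. This is a minimal illustration under stated assumptions, not the actual `DBSyncManager`: `createDebouncedSaver`, the `Conversation` shape, and the injectable `Schedule` function are hypothetical names introduced so the example is self-contained.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the real DBSyncManager API.
type Conversation = { id: string; messages: string[] };
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Default scheduler backed by setTimeout; returns a function that cancels the timer.
const timeoutSchedule: Schedule = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

function createDebouncedSaver(
  saveLocal: (c: Conversation) => void,  // stands in for the IndexedDB write
  syncRemote: (c: Conversation) => void, // stands in for the PATCH request
  delayMs: number = 5000,
  schedule: Schedule = timeoutSchedule,
) {
  let cancelPending: Cancel | undefined;

  return function saveConversation(c: Conversation): void {
    saveLocal(c);       // 1. local write happens immediately
    cancelPending?.();  // 2. any pending server sync is dropped
    cancelPending = schedule(() => {
      cancelPending = undefined;
      syncRemote(c);    // 3. only the latest state reaches the server
    }, delayMs);
  };
}
```

Injecting the scheduler keeps the debounce logic testable without waiting five real seconds. Offline queueing is left out of this sketch.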

## Component Responsibility

### 1. `useBranchingChat.ts` (The Orchestrator)
- Manages the active chat state.
- Detects partial streams vs. completed messages.
- Calls `dbSyncManager.saveConversation()` only when safe (not during active streaming).
- Triggers AI Title Generation after 3 exchanges.
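
The "only when safe" rule amounts to watching for one status transition. A minimal sketch, assuming the hook exposes a status value that includes the `streaming` and `ready` states named above; the `ChatStatus` union and the `shouldPersist` helper are hypothetical:

```typescript
// Hypothetical helper: persist only on the streaming -> ready transition.
type ChatStatus = 'submitted' | 'streaming' | 'ready' | 'error';

function shouldPersist(previous: ChatStatus, current: ChatStatus): boolean {
  return previous === 'streaming' && current === 'ready';
}
```

Comparing the previous and current status, rather than checking `ready` alone, avoids re-saving on every render while the chat is idle.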

### 2. `DBSyncManager.ts` (The Variable Layer)
- **IndexedDB**: The primary "cache" that the user sees.
- **Debounce Logic**: Prevents spamming the API with a save request for every single token generated.
- **Sync Logic**:
- `isAuthenticated && !isWidget` -> Syncs to Postgres (`/api/conversations`).
- `Offline` -> Queues for later.
- `Guest` -> Stored locally only (no server sync).
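
The three sync rules can be read as one decision function. The sketch below is an assumption-laden illustration: `resolveSyncTarget`, `SyncContext`, and the `isOnline` flag are hypothetical names, and the precedence (guest and widget checks before the offline check) is a guess at how the rules combine.

```typescript
// Hypothetical sketch of the sync decision; not the real DBSyncManager code.
type SyncTarget = 'server' | 'queued' | 'local-only';

interface SyncContext {
  isAuthenticated: boolean;
  isWidget: boolean;
  isOnline: boolean;
}

function resolveSyncTarget(ctx: SyncContext): SyncTarget {
  // Guests and widget sessions never sync to the server (assumed precedence).
  if (!ctx.isAuthenticated || ctx.isWidget) return 'local-only';
  // Authenticated but offline: hold the sync until the connection returns.
  if (!ctx.isOnline) return 'queued';
  return 'server';
}
```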

### 3. `/api/chat/route.ts` (The Brain)
- **Stateless**.
- Performs Tier checks and Rate Limits.
- Streams the text.
- **Does NOT write to `Conversation` table.**

### 4. `/api/conversations/[id]/route.ts` (The Storage)
- Receives the `PATCH` request from the client.
- Updates the `messages` JSON column in Postgres.
- Handles `404` by creating a new record if it doesn't exist (Lazy Creation).
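
Lazy Creation is effectively an upsert: update the record if it exists, otherwise create it. A minimal sketch using an in-memory `Map` in place of Postgres; `patchConversation` and `StoredConversation` are hypothetical names, and the real route's request and response shapes are not shown in this document.

```typescript
// Hypothetical sketch: an in-memory stand-in for the Postgres-backed PATCH handler.
type StoredConversation = { id: string; messages: unknown[] };

function patchConversation(
  store: Map<string, StoredConversation>,
  id: string,
  messages: unknown[],
): { created: boolean } {
  const existing = store.get(id);
  if (existing) {
    existing.messages = messages; // normal path: overwrite the messages column
    return { created: false };
  }
  // "404" path: the client saved locally first, so the row may not exist yet.
  store.set(id, { id, messages });
  return { created: true };
}
```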

## Diagram

```mermaid
sequenceDiagram
participant User
participant ClientHook as useBranchingChat
participant DBManager as DBSyncManager
participant IDB as IndexedDB
participant ChatAPI as /api/chat
participant StorageAPI as /api/conversations

User->>ClientHook: Sends Message
ClientHook->>ChatAPI: POST (Stream Request)
ChatAPI-->>ClientHook: Streaming Response...

note over ClientHook,ChatAPI: No DB writes yet

ClientHook->>ClientHook: Stream Complete
ClientHook->>DBManager: saveConversation()
DBManager->>IDB: Write (Immediate)

rect rgb(240, 240, 240)
note right of DBManager: Debounce 5s
DBManager->>StorageAPI: PATCH (Sync to Postgres)
StorageAPI-->>DBManager: 200 OK
end
```

## Why this architecture?

1. **Speed**: Typing and seeing the AI response feels instant because we don't wait for a DB `INSERT`.
2. **Reliability**: Chat works offline. You can read past chats and even "send" messages (which queue up) without internet.
3. **Cost**: We don't hit the database with a write for every partial token. We only write the final state of each exchange.
270 changes: 270 additions & 0 deletions docs/features/john-gpt/ToolCallSystem.md
@@ -0,0 +1,270 @@
# Tool Call System: Achieving Near-Perfect Tool Accuracy

## Overview

The JohnGPT tool call system achieves near-perfect accuracy through a **three-pillar architecture**:
1. **Structured System Prompts** with explicit tool guidelines
2. **Strongly-Typed Tool Definitions** using Zod schemas
3. **Vector-Powered Execution** for semantic understanding

This document explains the architectural decisions that make tool calls reliable and consistent.

---

## Architecture Diagram

```mermaid
flowchart TD
subgraph Client["Client (Browser)"]
A[User Input] --> B[useChat Hook]
B -->|UIMessage[]| C[POST /api/chat]
end

subgraph Server["API Route"]
C --> D[PromptManager.getSystemPrompt]
D --> E[streamText with Tools]
E -->|Tool Invocation| F{Which Tool?}
F -->|searchKnowledge| G[RAG Utils]
F -->|goTo| H[findDestination]
G -->|Vector Search| I[(PostgreSQL + pgvector)]
H -->|Vector Search| I
end

subgraph Response["Response Flow"]
E -->|Stream| J[toUIMessageStreamResponse]
J -->|Action Payload| K[Client Handles Action]
end
```

---

## Pillar 1: Structured System Prompts

The key to reliable tool invocation is **teaching the model WHEN to use tools vs when to rely on its own knowledge.**

### The Problem

Without guidance, LLMs tend to call tools even for questions they could answer directly, leading to:
- Slow responses (unnecessary DB queries)
- Incorrect answers (searching for general knowledge)
- Poor UX (latency spikes)

### The Solution: Contextual Tool Guidelines

In [prompt-manager.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/lib/ai/prompt-manager.ts):

```typescript
private static getToolingRules(context: ChatContext): string {
return `
<tool_guidelines>
1. SEARCH_KNOWLEDGE:
- **TRIGGER (STRICT):** Use ONLY for questions about **J StaR proprietary info**
(Pricing, specific services, John's personal bio, Portfolio items).
- **FORBIDDEN:** DO NOT search the database for:
* General opinions (e.g., "Is DaVinci good?")
* General definitions (e.g., "What is Next.js?")
* Jokes, Small Talk, or General Advice.

2. GOTO_TOOL (Unified Navigation):
- **TRIGGER:** User wants to change their view or "see" something.
- **USAGE:**
* User: "Go to services" -> goTo({ destination: "services" })
* User: "Show me the pricing" -> goTo({ destination: "pricing" })
- **RULE:** If the user asks "Where is X?", DO NOT explain where it is.
Just take them there using this tool.
</tool_guidelines>`;
}
```

### Key Techniques

| Technique | Description | Example |
|-----------|-------------|---------|
| **Explicit Triggers** | Define exactly when to use each tool | `TRIGGER (STRICT): Use ONLY for...` |
| **Forbidden Cases** | List what NOT to do | `DO NOT search the database for...` |
| **Usage Examples** | Show input → output mappings | `"Go to services" → goTo({...})` |
| **The "Google Test"** | If Google could answer it, don't search | General coding questions |

---

## Pillar 2: Strongly-Typed Tool Definitions

Using Vercel AI SDK's `tool()` helper with Zod schemas ensures:
1. **Type safety** at compile time
2. **Clear descriptions** for the LLM
3. **Structured outputs** for the client

### Tool Definition Pattern

From [route.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/app/api/chat/route.ts):

```typescript
import { tool } from 'ai';
import { z } from 'zod';

// Excerpt: the `tools` option passed to streamText() in the route handler.
tools: {
  searchKnowledge: tool({
    // Multi-line descriptions need a template literal, not a quoted string.
    description: `Search the knowledge base for ANY information related to J StaR,
      including services, portfolio, team members, testimonials, pricing,
      or specific details found on the website.`,
    inputSchema: z.object({
      query: z.string().describe('What to search for in the knowledge base'),
    }),
    execute: async ({ query }) => {
      const results = await searchKnowledgeBase(query, 5);
      return formatSearchResults(results);
    },
  }),

  goTo: tool({
    description: `Smart navigation tool. Handles BOTH page navigation AND section scrolling.
      Use when user says "go to X", "show me X", "take me to X".
      EXAMPLES:
      - "show me services" → goTo({destination: "services"})
      - "take me to pricing" → goTo({destination: "pricing"})
      NEGATIVE: Do NOT use for general questions, greetings, or casual chat.`,
    inputSchema: z.object({
      destination: z.string().describe('Where the user wants to go'),
    }),
    execute: async ({ destination }) => { /* ... */ },
  }),
}
```

### Why This Works

1. **Rich Descriptions** - The `description` field is consumed by the model to decide WHEN to invoke the tool. Include both positive and negative examples.

2. **Zod Schemas** - Validate inputs automatically. If the model produces malformed input, it fails fast.

3. **Simple Parameters** - Single, clear parameters reduce model confusion. `destination: string` is easier to predict than complex nested objects.

---

## Pillar 3: Vector-Powered Execution

Both tools leverage **semantic understanding** via vector embeddings stored in PostgreSQL with pgvector.

### searchKnowledge → RAG System

From [rag-utils.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/lib/ai/rag-utils.ts):

```typescript
export async function searchKnowledgeBase(query: string, limit: number = 5) {
  // Generate embedding for the query
  const queryEmbedding = await generateQueryEmbedding(query);

  // Serialize to pgvector's text format, e.g. "[0.1,0.2,...]".
  // Assumes generateQueryEmbedding returns number[].
  const embeddingString = `[${queryEmbedding.join(',')}]`;

  // Cosine similarity search via pgvector.
  // `<=>` is cosine distance, so similarity = 1 - distance.
  const results = await prisma.$queryRaw`
    SELECT
      page_url, page_title, content_chunk,
      1 - (embedding <=> ${embeddingString}::vector) as similarity
    FROM site_embeddings
    WHERE 1 - (embedding <=> ${embeddingString}::vector) > 0.3
    ORDER BY embedding <=> ${embeddingString}::vector
    LIMIT ${limit}
  `;
  return results;
}
```
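
The SQL in `searchKnowledgeBase` relies on pgvector's `<=>` operator, which returns cosine distance, so `1 - distance` is cosine similarity. The same filter, sort, and limit can be reproduced in memory to show what the query computes. This is an illustrative sketch only: `cosineSimilarity`, `rankChunks`, and the `Chunk` shape are hypothetical and not part of `rag-utils.ts`.

```typescript
// Hypothetical in-memory equivalent of the pgvector query, for illustration only.
interface Chunk {
  pageUrl: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // lengths are equal, so the fallback is never used
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankChunks(query: number[], chunks: Chunk[], limit = 5, threshold = 0.3) {
  return chunks
    .map((c) => ({ pageUrl: c.pageUrl, similarity: cosineSimilarity(query, c.embedding) }))
    .filter((r) => r.similarity > threshold)     // WHERE similarity > 0.3
    .sort((x, y) => y.similarity - x.similarity) // ORDER BY distance ascending
    .slice(0, limit);                            // LIMIT
}
```

In production the database does this work, using an index if one is defined. The in-memory version only makes the threshold and ordering semantics concrete.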

### goTo → Smart Destination Finder

From [findDestination.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/lib/ai/findDestination.ts):

The `goTo` tool uses vector search to match user intent to both **pages** and **sections**:

```typescript
export async function findDestination(query: string, currentPath: string) {
  const queryEmbedding = await generateQueryEmbedding(query);
  // pgvector text format; assumes generateQueryEmbedding returns number[]
  const embedding = `[${queryEmbedding.join(',')}]`;

  // Search BOTH pages and sections
  const pageResults = await prisma.$queryRaw`
    SELECT url, title, 1 - (embedding <=> ${embedding}::vector) as similarity
    FROM page_navigation
    WHERE 1 - (embedding <=> ${embedding}::vector) > 0.4
  `;

  const sectionResults = await prisma.$queryRaw`
    SELECT element_id, title, page_url,
      1 - (embedding <=> ${embedding}::vector) as similarity
    FROM page_sections
    WHERE 1 - (embedding <=> ${embedding}::vector) > 0.4
  `;

  // Smart resolution logic...
}
```

### Smart Resolution Logic

The system handles context-aware navigation:

| Scenario | Action |
|----------|--------|
| User on `/` says "show me pricing" | Returns `scrollToSection` if a pricing section exists on the current page |
| User on `/about` says "go to services" | Returns `navigate` to `/services` |
| User says "show me the portfolio section" | Returns `navigateAndScroll` to page + section |
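
One way to express the scenarios in the table is shown below. The actual resolution code is elided in this document, so this is a sketch under assumptions: `resolveDestination`, the match interfaces, and the tie-breaking rule (a section on the current page wins, otherwise the higher similarity wins) are all hypothetical.

```typescript
// Hypothetical sketch of the resolution step; the real logic is not shown in this document.
interface PageMatch { url: string; similarity: number }
interface SectionMatch { elementId: string; pageUrl: string; similarity: number }

type NavAction =
  | { action: 'navigate'; url: string }
  | { action: 'scrollToSection'; sectionId: string }
  | { action: 'navigateAndScroll'; url: string; sectionId: string };

function resolveDestination(
  currentPath: string,
  page?: PageMatch,
  section?: SectionMatch,
): NavAction | null {
  // A matching section on the current page: scroll without leaving the page.
  if (section && section.pageUrl === currentPath) {
    return { action: 'scrollToSection', sectionId: section.elementId };
  }
  // A page match that scores at least as well as any section: plain navigation.
  if (page && (!section || page.similarity >= section.similarity)) {
    return { action: 'navigate', url: page.url };
  }
  // Otherwise the best match is a section on another page.
  if (section) {
    return { action: 'navigateAndScroll', url: section.pageUrl, sectionId: section.elementId };
  }
  return null; // nothing cleared the similarity threshold
}
```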

---

## The "stopWhen" Safety

To prevent infinite tool loops, we use:

```typescript
import { streamText, stepCountIs } from 'ai';

const result = await streamText({
  model: selectedModel,
  messages: modelMessages,
  system: systemPrompt,
  stopWhen: stepCountIs(5), // Allow AI to continue after tool execution for up to 5 steps
  maxRetries: 2,
  tools: { /* ... */ },
});
```

This allows the model to:
1. Call a tool
2. Process the result
3. Respond to the user OR call another tool
4. Repeat up to 5 steps total
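
The effect of the step limit can be modelled with a plain loop. This is a conceptual sketch, not how the AI SDK implements `stopWhen`: `runBounded` and the `Step` type are hypothetical, and the "model" is just a callback that returns either a tool call or a final text answer.

```typescript
// Conceptual model of a step-limited tool loop; not the AI SDK's internals.
type Step =
  | { kind: 'tool'; name: string }
  | { kind: 'text'; content: string };

function runBounded(nextStep: (history: Step[]) => Step, maxSteps: number): Step[] {
  const history: Step[] = [];
  while (history.length < maxSteps) {
    const step = nextStep(history);
    history.push(step);
    if (step.kind === 'text') break; // a text answer ends the loop early
  }
  return history; // a model that never answers is cut off at maxSteps
}
```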

---

## Response Format

Tool results are returned as **structured action payloads** that the client can interpret:

```typescript
// Shape of the structured action returned by the goTo tool
// (the type name is illustrative)
type GoToResult = {
  action: 'navigate' | 'scrollToSection' | 'navigateAndScroll' | 'showLoginComponent';
  url?: string;
  sectionId?: string;
  title?: string;
  message: string; // Human-readable confirmation
};
```

The client-side hook then handles these actions:
- **navigate** → `router.push(url)`
- **scrollToSection** → `document.getElementById(sectionId)?.scrollIntoView()`
- **navigateAndScroll** → Navigate then scroll after page load
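
The three handlers above map naturally onto a switch over a discriminated union. A minimal sketch, assuming the router and scrolling are injected so the logic can run outside a browser; `handleAction`, `ClientAction`, and `ActionEnv` are hypothetical names, and `showLoginComponent` is omitted because its handling is not described here.

```typescript
// Hypothetical client-side dispatcher; the real hook's code is not shown in this document.
type ClientAction =
  | { action: 'navigate'; url: string }
  | { action: 'scrollToSection'; sectionId: string }
  | { action: 'navigateAndScroll'; url: string; sectionId: string };

interface ActionEnv {
  push(url: string): void;               // e.g. router.push
  scrollTo(sectionId: string): void;     // e.g. getElementById(...)?.scrollIntoView()
  afterNavigation(cb: () => void): void; // runs cb once the new page has loaded
}

function handleAction(a: ClientAction, env: ActionEnv): void {
  switch (a.action) {
    case 'navigate':
      env.push(a.url);
      break;
    case 'scrollToSection':
      env.scrollTo(a.sectionId);
      break;
    case 'navigateAndScroll':
      env.push(a.url);
      // Defer the scroll: the target element does not exist until the page renders.
      env.afterNavigation(() => env.scrollTo(a.sectionId));
      break;
  }
}
```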

---

## Summary: The 5 Keys to Near-Perfect Tool Calls

1. **Explicit Triggers** – Tell the model exactly when to use each tool
2. **Forbidden Cases** – Tell it when NOT to use tools
3. **Usage Examples** – Show input → output in the description
4. **Simple Schemas** – One clear parameter per tool
5. **Vector Intelligence** – Use embeddings for semantic matching, not keyword matching

---

## Related Documentation

- [RAG-KnowledgeBase.md](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/docs/features/john-gpt/RAG-KnowledgeBase.md) - How site content is embedded
- [UnifiedNavigation.md](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/docs/features/john-gpt/UnifiedNavigation.md) - Navigation system architecture
- [AdvancedNavigationSystem.md](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/docs/features/AdvancedNavigationSystem.md) - Page/Section embedding system