diff --git a/docs/features/john-gpt/ChatPersistence.md b/docs/features/john-gpt/ChatPersistence.md new file mode 100644 index 0000000..3eadc74 --- /dev/null +++ b/docs/features/john-gpt/ChatPersistence.md @@ -0,0 +1,91 @@ +# JohnGPT Chat Persistence Architecture + +## Overview + +The chat persistence in JohnGPT follows a **Client-First, Optimistic** approach. Unlike traditional chat apps where the backend saves the message *before* responding, JohnGPT prioritizes responsiveness by saving locally first and syncing to the server in the background. + +## The flow + +1. **User Types Message**: + - The client (`useBranchingChat`) sends the message to the inference endpoint (`/api/chat`). + - The UI updates immediately with the user's message. + +2. **Streaming Response (Read-Only)**: + - The `/api/chat` endpoint **DOES NOT** save the message to the database. + - It is a purely functional "inference engine" that processes the input, routes it to the correct AI model, and streams the text response back. + - This ensures maximum speed and prevents database write-locks from blocking the stream. + +3. **Client-Side State Update**: + - As the AI response streams in, the `useBranchingChat` hook updates its internal `messages` state. + - A specialized `useEffect` hook monitors these changes. + +4. **Optimistic Local Save (IndexedDB)**: + - When the message stream completes (status changes from `streaming` to `ready`), the `dbSyncManager` is triggered. + - It **immediately** saves the full conversation tree to the browser's **IndexedDB**. + - This ensures that if the user refreshes the page instantly, the chat is not lost, even if the server sync hasn't happened yet. + +5. **Background Server Sync**: + - The `dbSyncManager` debounces the server sync (waits for 5 seconds of inactivity or completion). + - It sends a `PATCH` request to `/api/conversations/[id]` with the full conversation data. + - If the user is offline, the sync is queued and retried automatically when the connection is restored. + +## Component Responsibility + +### 1. `useBranchingChat.ts` (The Orchestrator) +- Manages the active chat state. +- Detects partial streams vs. completed messages. +- Calls `dbSyncManager.saveConversation()` only when safe (not during active streaming). +- Triggers AI Title Generation after 3 exchanges. + +### 2. `DBSyncManager.ts` (The Variable Layer) +- **IndexedDB**: The primary "cache" that the user sees. +- **Debounce Logic**: Prevents spamming the API with a save request for every single token generated. +- **Sync Logic**: + - `isAuthenticated && !isWidget` -> Syncs to Postgres (`/api/conversations`). + - `Offline` -> Queues for later. + - `Guest` -> Local storage only. + +### 3. `/api/chat/route.ts` (The Brain) +- **Stateless**. +- Performs Tier checks and Rate Limits. +- Streams the text. +- **Does NOT write to `Conversation` table.** + +### 4. `/api/conversations/[id]/route.ts` (The Storage) +- Receives the `PATCH` request from the client. +- Updates the `messages` JSON column in Postgres. +- Handles `404` by creating a new record if it doesn't exist (Lazy Creation). + +## Diagram + +```mermaid +sequenceDiagram + participant User + participant ClientHook as useBranchingChat + participant DBManager as DBSyncManager + participant IDB as IndexedDB + participant ChatAPI as /api/chat + participant StorageAPI as /api/conversations + + User->>ClientHook: Sends Message + ClientHook->>ChatAPI: POST (Stream Request) + ChatAPI-->>ClientHook: Streaming Response... + + note over ClientHook,ChatAPI: No DB writes yet + + ClientHook->>ClientHook: Stream Complete + ClientHook->>DBManager: saveConversation() + DBManager->>IDB: Write (Immediate) + + rect rgb(240, 240, 240) + note right of DBManager: Debounce 5s + DBManager->>StorageAPI: PATCH (Sync to Postgres) + StorageAPI-->>DBManager: 200 OK + end +``` + +## Why this architecture? + +1. **Speed**: Typing and seeing the AI response feels instant because we don't wait for a DB `INSERT`. +2. **Reliability**: Chat works offline. You can read past chats and even "send" messages (which queue up) without internet. +3. **Cost**: We don't hammer the database with valid/invalid writes for every partial token. We only write the "final" state. diff --git a/docs/features/john-gpt/ToolCallSystem.md b/docs/features/john-gpt/ToolCallSystem.md new file mode 100644 index 0000000..4fa0fb4 --- /dev/null +++ b/docs/features/john-gpt/ToolCallSystem.md @@ -0,0 +1,270 @@ +# Tool Call System: Achieving Near-Perfect Tool Accuracy + +## Overview + +The JohnGPT tool call system achieves near-perfect accuracy through a **three-pillar architecture**: +1. **Structured System Prompts** with explicit tool guidelines +2. **Strongly-Typed Tool Definitions** using Zod schemas +3. **Vector-Powered Execution** for semantic understanding + +This document explains the architectural decisions that make tool calls reliable and consistent. + +--- + +## Architecture Diagram + +```mermaid +flowchart TD + subgraph Client["Client (Browser)"] + A[User Input] --> B[useChat Hook] + B -->|UIMessage[]| C[POST /api/chat] + end + + subgraph Server["API Route"] + C --> D[PromptManager.getSystemPrompt] + D --> E[streamText with Tools] + E -->|Tool Invocation| F{Which Tool?} + F -->|searchKnowledge| G[RAG Utils] + F -->|goTo| H[findDestination] + G -->|Vector Search| I[(PostgreSQL + pgvector)] + H -->|Vector Search| I + end + + subgraph Response["Response Flow"] + E -->|Stream| J[toUIMessageStreamResponse] + J -->|Action Payload| K[Client Handles Action] + end +``` + +--- + +## Pillar 1: Structured System Prompts + +The key to reliable tool invocation is **teaching the model WHEN to use tools vs when to rely on its own knowledge.** + +### The Problem + +Without guidance, LLMs will over-call tools for every question, leading to: +- Slow responses (unnecessary DB queries) +- Incorrect answers (searching for general knowledge) +- Poor UX (latency spikes) + +### The Solution: Contextual Tool Guidelines + +In [prompt-manager.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/lib/ai/prompt-manager.ts): + +```typescript +private static getToolingRules(context: ChatContext): string { + return ` + + 1. SEARCH_KNOWLEDGE: + - **TRIGGER (STRICT):** Use ONLY for questions about **J StaR proprietary info** + (Pricing, specific services, John's personal bio, Portfolio items). + - **FORBIDDEN:** DO NOT search the database for: + * General opinions (e.g., "Is DaVinci good?") + * General definitions (e.g., "What is Next.js?") + * Jokes, Small Talk, or General Advice. + + 2. GOTO_TOOL (Unified Navigation): + - **TRIGGER:** User wants to change their view or "see" something. + - **USAGE:** + * User: "Go to services" -> goTo({ destination: "services" }) + * User: "Show me the pricing" -> goTo({ destination: "pricing" }) + - **RULE:** If the user asks "Where is X?", DO NOT explain where it is. + Just take them there using this tool. +`; +} +``` + +### Key Techniques + +| Technique | Description | Example | +|-----------|-------------|---------| +| **Explicit Triggers** | Define exactly when to use each tool | `TRIGGER (STRICT): Use ONLY for...` | +| **Forbidden Cases** | List what NOT to do | `DO NOT search the database for...` | +| **Usage Examples** | Show input → output mappings | `"Go to services" → goTo({...})` | +| **The "Google Test"** | If Google could answer it, don't search | General coding questions | + +--- + +## Pillar 2: Strongly-Typed Tool Definitions + +Using Vercel AI SDK's `tool()` helper with Zod schemas ensures: +1. **Type safety** at compile time +2. **Clear descriptions** for the LLM +3. **Structured outputs** for the client + +### Tool Definition Pattern + +From [route.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/app/api/chat/route.ts): + +```typescript +import { tool } from 'ai'; +import { z } from 'zod'; + +tools: { + searchKnowledge: tool({ + description: 'Search the knowledge base for ANY information related to J StaR, + including services, portfolio, team members, testimonials, pricing, + or specific details found on the website.', + inputSchema: z.object({ + query: z.string().describe('What to search for in the knowledge base'), + }), + execute: async ({ query }) => { + const results = await searchKnowledgeBase(query, 5); + return formatSearchResults(results); + }, + }), + + goTo: tool({ + description: `Smart navigation tool. Handles BOTH page navigation AND section scrolling. +Use when user says "go to X", "show me X", "take me to X". +EXAMPLES: +- "show me services" → goTo({destination: "services"}) +- "take me to pricing" → goTo({destination: "pricing"}) +NEGATIVE: Do NOT use for general questions, greetings, or casual chat.`, + inputSchema: z.object({ + destination: z.string().describe('Where the user wants to go'), + }), + execute: async ({ destination }) => { /* ... */ }, + }), +} +``` + +### Why This Works + +1. **Rich Descriptions** - The `description` field is consumed by the model to decide WHEN to invoke the tool. Include both positive and negative examples. + +2. **Zod Schemas** - Validate inputs automatically. If the model produces malformed input, it fails fast. + +3. **Simple Parameters** - Single, clear parameters reduce model confusion. `destination: string` is easier to predict than complex nested objects. + +--- + +## Pillar 3: Vector-Powered Execution + +Both tools leverage **semantic understanding** via vector embeddings stored in PostgreSQL with pgvector. + +### searchKnowledge → RAG System + +From [rag-utils.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/lib/ai/rag-utils.ts): + +```typescript +export async function searchKnowledgeBase(query: string, limit: number = 5) { + // Generate embedding for the query + const queryEmbedding = await generateQueryEmbedding(query); + + // Cosine similarity search via pgvector + const results = await prisma.$queryRaw` + SELECT + page_url, page_title, content_chunk, + 1 - (embedding <=> ${embeddingString}::vector) as similarity + FROM site_embeddings + WHERE 1 - (embedding <=> ${embeddingString}::vector) > 0.3 + ORDER BY embedding <=> ${embeddingString}::vector + LIMIT ${limit} + `; + return results; +} +``` + +### goTo → Smart Destination Finder + +From [findDestination.ts](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/src/lib/ai/findDestination.ts): + +The `goTo` tool uses vector search to match user intent to both **pages** and **sections**: + +```typescript +export async function findDestination(query: string, currentPath: string) { + const queryEmbedding = await generateQueryEmbedding(query); + + // Search BOTH pages and sections + const pageResults = await prisma.$queryRaw` + SELECT url, title, 1 - (embedding <=> ${embedding}::vector) as similarity + FROM page_navigation + WHERE 1 - (embedding <=> ${embedding}::vector) > 0.4 + `; + + const sectionResults = await prisma.$queryRaw` + SELECT element_id, title, page_url, + 1 - (embedding <=> ${embedding}::vector) as similarity + FROM page_sections + WHERE 1 - (embedding <=> ${embedding}::vector) > 0.4 + `; + + // Smart resolution logic... +} +``` + +### Smart Resolution Logic + +The system handles context-aware navigation: + +| Scenario | Action | +|----------|--------| +| User on `/` says "show me pricing" | Returns `scrollToSection` if a pricing section exists on the current page | +| User on `/about` says "go to services" | Returns `navigate` to `/services` | +| User says "show me the portfolio section" | Returns `navigateAndScroll` to page + section | + +--- + +## The "stopWhen" Safety + +To prevent infinite tool loops, we use: + +```typescript +const result = await streamText({ + model: selectedModel, + messages: modelMessages, + system: systemPrompt, + stopWhen: stepCountIs(5), // Allow AI to continue after tool execution for up to 5 steps + maxRetries: 2, + tools: { /* ... */ }, +}); +``` + +This allows the model to: +1. Call a tool +2. Process the result +3. Respond to the user OR call another tool +4. Repeat up to 5 steps total + +--- + +## Response Format + +Tool results are returned as **structured action payloads** that the client can interpret: + +```typescript +// goTo tool returns structured actions +return { + action: 'navigate' | 'scrollToSection' | 'navigateAndScroll' | 'showLoginComponent', + url?: string, + sectionId?: string, + title?: string, + message: string, // Human-readable confirmation +}; +``` + +The client-side hook then handles these actions: +- **navigate** → `router.push(url)` +- **scrollToSection** → `document.getElementById(sectionId)?.scrollIntoView()` +- **navigateAndScroll** → Navigate then scroll after page load + +--- + +## Summary: The 5 Keys to Perfect Tool Calls + +1. **Explicit Triggers** – Tell the model exactly when to use each tool +2. **Forbidden Cases** – Tell it when NOT to use tools +3. **Usage Examples** – Show input → output in the description +4. **Simple Schemas** – One clear parameter per tool +5. **Vector Intelligence** – Use embeddings for semantic matching, not keyword matching + +--- + +## Related Documentation + +- [RAG-KnowledgeBase.md](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/docs/features/john-gpt/RAG-KnowledgeBase.md) - How site content is embedded +- [UnifiedNavigation.md](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/docs/features/john-gpt/UnifiedNavigation.md) - Navigation system architecture +- [AdvancedNavigationSystem.md](file:///c:/CreativeOS/01_Projects/Code/jstar-platform/docs/features/AdvancedNavigationSystem.md) - Page/Section embedding system