
KeigoShimadaCC/agentic-knowledge-management


KnowledgeOS

A local-first personal AI knowledge operating system.
Dump everything, structure it, search it semantically, traverse it as a graph, and let AI agents operate it, all on your Mac, with no cloud dependency and no data leaving your machine.


Philosophy

Most knowledge tools are either too simple (note apps that can't reason) or too complex (enterprise wikis that require a team to maintain). KnowledgeOS is built on a different premise:

Your knowledge should be a living system, not a filing cabinet.

The design decisions all follow from this:

  • Local-first, always. Every byte lives under ~/KnowledgeOS/. No sync service, no cloud account required. You own your data unconditionally.
  • Agent-ready from the start. Claude, ChatGPT, Codex, and local LLMs can search, read, write, and link knowledge through a built-in MCP server — not as a bolt-on feature, but as a first-class access mode.
  • Structured but flexible. Everything is a typed object (KosObject) with a common base. Pages, sources, assets, claims, projects, and chats are all objects you can search, link, and reason over in the same interface.
  • Soft everything. No hard deletes. No permanent overwrites. Every agent write is logged with a before/after diff. You can always undo, audit, or restore.
  • Durable indexes, not canonical truth. Qdrant (vector) and Kùzu (graph) are re-buildable from Postgres + filesystem. Postgres is the operational source of truth. Indexes are caches.

The full product vision is in project-phases/IDEA-DRAFT.md.


Current Status

Phases 1–4, Phase 5 AI Assistant, Phase 6A/6B Chat Import, Phase 7A MCP Read/Search, Phase 7B MCP Write Tools, and Phase 8A Workspace Lite are complete. Phase 8 (multi-pane workspaces), Phase 9 (career memory), and the multilingual search hardening track are in progress or planned.

See PROGRESS.md for the canonical progress tracker.

Phase Status
Phase 1 — Foundation ✅ Complete
Phase 2 — Sources & Rich Media ✅ Complete
Phase 3 — Search ✅ Complete
Phase 4 — Graph Lite ✅ Complete
Phase 5 — AI Assistant + Inbox/Triage ✅ Complete
Phase 6A — Chat Import Lite ✅ Complete
Phase 6B — Structured Chat Import ✅ Complete
Phase 7A — MCP Read/Search ✅ Complete
Phase 7B — MCP Write Tools ✅ Complete
Phase 8A — Workspace Lite (split pane) ✅ Complete
Hardening — Search Quality / Multilingual 🚧 Partial (ILIKE fallback only)
Phase 8 — Multi-Pane Workspaces ⬜ Planned
Phase 9 — Career & Project Memory ⬜ Planned
PHASE-PHONE-00 — Mobile Architecture & Contract ✅ Complete (docs only)
PHASE-PHONE-01A/B/C — Mobile Auth + Networking + iOS Scaffold ⬜ Planned
PHASE-PHONE-02A/B → 06 — iOS client, capture, AI, edit-lite, offline, device install ⬜ Planned

Test counts (as of 2026-05-15):

  • Frontend unit/component tests: 27 in apps/web/src/**/__tests__/
  • API integration + backend unit tests: 147 across tests/api/ and tests/unit/ (131 api + 16 unit)
  • Worker extractor tests: 19 in tests/worker/
  • MCP package tests: 35 in services/mcp/tests/
  • Playwright E2E tests: 10 in tests/e2e/specs/
  • Total: 238 tests across all local suites

See each phase's plan in project-phases/ for the full subtask spec.


Architecture

Browser (Next.js 14 + React + Tiptap)
  └─► FastAPI  /api/v1
        ├─► Postgres 16          ← operational source of truth
        │     objects, pages, edges, chunks, agent_runs, ingestion_jobs
        ├─► Redis                ← RQ job queue + cache
        ├─► ~/KnowledgeOS/library/  ← binary assets (SHA-256 content-addressed)
        ├─► Qdrant               ← vector index (re-buildable)
        └─► Kùzu                 ← graph index (re-buildable)

Python Workers (RQ)
  └─► ingestion → text extraction → chunking → embedding → graph sync → AI extraction

MCP Server (:8765)
  └─► read / search / write / ingest tools
        ↑
  Claude / ChatGPT / Codex / Cursor / local agents

Canonical source of truth: Postgres (structured data) + local filesystem (binary assets).
Qdrant and Kùzu are indexes. Treat them as caches; they can be rebuilt from Postgres at any time.
All host ports bind to 127.0.0.1 only. LAN access is opt-in.
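Because Qdrant and Kùzu are caches, a rebuild should be a pure function of the canonical rows: read Postgres, skip soft-deleted objects, and regenerate every derived structure from scratch. The sketch below illustrates that property with hypothetical in-memory stand-ins (a toy keyword map instead of Qdrant vectors, an adjacency map instead of Kùzu); it is not the project's actual reindex code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CanonicalObject:
    """Hypothetical stand-in for a row in the Postgres `objects` table."""
    id: str
    title: str
    text: str
    deleted_at: Optional[str] = None  # soft-delete marker; None means live


@dataclass
class CanonicalEdge:
    """Hypothetical stand-in for a row in the Postgres `edges` table."""
    source_id: str
    target_id: str
    kind: str


@dataclass
class DerivedIndexes:
    keyword: dict[str, set[str]] = field(default_factory=dict)  # token -> object ids
    graph: dict[str, set[str]] = field(default_factory=dict)    # object id -> neighbour ids


def rebuild_indexes(objects, edges):
    """Rebuild every derived index from scratch using only canonical rows.

    Nothing is read from a previous index, so a corrupted or deleted
    index can always be regenerated with this one call.
    """
    idx = DerivedIndexes()
    live = {o.id for o in objects if o.deleted_at is None}
    for obj in objects:
        if obj.id not in live:
            continue  # soft-deleted rows never reach the indexes
        for token in f"{obj.title} {obj.text}".lower().split():
            idx.keyword.setdefault(token, set()).add(obj.id)
    for edge in edges:
        # only index edges whose endpoints are both live
        if edge.source_id in live and edge.target_id in live:
            idx.graph.setdefault(edge.source_id, set()).add(edge.target_id)
            idx.graph.setdefault(edge.target_id, set()).add(edge.source_id)
    return idx
```

Running the rebuild twice over the same rows yields identical indexes, which is what makes "drop the index and rebuild" a safe recovery step.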


Stack

Layer Technology
Frontend Next.js 14 + React 18 + Tailwind CSS
Editor Tiptap / ProseMirror (stores content as JSON)
Backend API FastAPI + SQLAlchemy 2.0 async
Background workers Python + Redis + RQ
MCP server Python MCP server over stdio (Phase 7A)
Operational DB Postgres 16
Vector DB Qdrant (local Docker)
Graph DB Kùzu (embedded in worker process)
Queue / cache Redis 7
Asset storage Local filesystem, content-addressed by SHA-256
AI OpenAI API (Phase 5+); Ollama local LLMs (later)
Deployment Docker Compose on Mac
iPhone client (planned) SwiftUI native iOS + XcodeGen (apps/ios/, Phase PHONE-01C onward)

Quick Start

Prerequisites

  • macOS, Docker Desktop, Node.js 20+, pnpm 10+, Python 3.12+, uv

Setup

# 1. Clone and enter the repo
git clone <repo-url> && cd agentic-knowledge-management

# 2. Create local directories + copy .env
bash scripts/setup.sh

# 3. Edit infra/.env as needed — all settings have working defaults
#    (SESSION_SECRET is optional; see docs/SECURITY.md for the session model)

# 4. Start all services
docker compose -f infra/docker-compose.yml up -d

# 5. Open the app
open http://localhost:3000

Register an account on first visit. All data stays local.

Simpler, non-engineer walkthrough: quickstart.md.

Docker troubleshooting

Symptom What to do
bind: address already in use on 8001 Free 127.0.0.1:8001 (often a local uvicorn on that port). Stop it, then docker compose -f infra/docker-compose.yml up -d again.
Mount error / path contains YOUR_USERNAME Set LIBRARY_ROOT in infra/.env to your real library path (under ~/KnowledgeOS/). Run bash scripts/setup.sh (it patches a stale placeholder). If a bad volume already exists: docker compose -f infra/docker-compose.yml down, docker volume rm infra_library-data, then up -d.
Internal Server Error from the web UI / Next Module not found Stale node_modules volume for kos-web: docker compose -f infra/docker-compose.yml rm -sf web, docker volume rm infra_kos-web-node-modules, docker compose -f infra/docker-compose.yml up -d --build web. If you use pnpm dev on the host with the API in Docker, add apps/web/.env.local from apps/web/.env.local.example. Confirm docker ps shows kos-web on port 3000.

Development

Frontend (apps/web)

pnpm dev          # Next.js dev server on :3000 with hot reload
pnpm typecheck    # tsc --noEmit (strict mode)
pnpm lint         # ESLint via next lint
pnpm build        # production build

Backend API (services/api)

cd services/api
uv run uvicorn app.main:app --reload --port 8000   # dev server
uv run ruff check .                                 # lint
uv run ruff format .                                # format
uv run python -c "from app.models import *; print('OK')"  # sanity check

Integration Tests

Tests run against a live knowledgeos_test Postgres database (auto-created and torn down per test). Requires Postgres running (via Docker or local).

cd tests
uv run pytest api/ unit/ -v                    # API integration + backend unit tests
uv run pytest worker/ -v                       # worker extractor tests
uv run pytest api/test_pages.py -v             # single file
uv run pytest api/test_pages.py::test_create_page  # single test

# MCP package tests (separate project)
cd ../services/mcp && uv run pytest tests/ -v   # MCP tool + config tests

End-to-end Tests

Playwright E2E tests live in tests/e2e and run against the local Compose stack. Use a sandbox library root so tests never write to the real ~/KnowledgeOS/library:

LIBRARY_ROOT=$PWD/tests/e2e/.tmp/library docker compose -f infra/docker-compose.yml up -d
pnpm test:e2e
pnpm --dir tests/e2e report

Testing Matrix

Layer Command
Frontend unit/component make test-unit or pnpm test:web
API + unit pytest make test-api
Worker extractors make test-worker
MCP package make test-mcp
Playwright E2E make test-e2e or pnpm test:e2e
Everything make test-all or pnpm test:all

make test-e2e expects the Compose stack to be running with a sandbox library root:

LIBRARY_ROOT=$PWD/tests/e2e/.tmp/library docker compose -f infra/docker-compose.yml up -d --build
make test-e2e

Alembic Migrations

cd services/api
uv run alembic revision --autogenerate -m "description"
uv run alembic upgrade head

Infrastructure

docker compose -f infra/docker-compose.yml up -d      # start (background)
docker compose -f infra/docker-compose.yml up         # start (foreground logs)
docker compose -f infra/docker-compose.yml down       # stop
docker compose -f infra/docker-compose.yml down -v    # stop + wipe volumes (destructive)

By default, Compose bind-mounts the source directories, runs Alembic migrations before starting uvicorn, and can optionally seed demo data. With seeding enabled, sign in as demo@example.com / demo-demo-demo unless you override these credentials in infra/.env.

Local Ports

Docker Compose publishes services on loopback-only host ports:

Service Host In-container
Web 127.0.0.1:3000 3000
API 127.0.0.1:8001 8000
Postgres 127.0.0.1:5433 5432
Redis 127.0.0.1:6379 6379
Qdrant HTTP 127.0.0.1:6333 6333
Qdrant gRPC 127.0.0.1:6334 6334

Use http://127.0.0.1:8001 for host-side clients talking to the dockerized API. Use http://api:8000 only from inside the Docker network. If you run the API natively with uvicorn --port 8000, host-side clients should use http://127.0.0.1:8000.
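The base-URL rule above can be captured in a small helper so client scripts never hard-code the wrong port. This is a hypothetical sketch: the `ApiLocation` names are invented for illustration, and the `/api/v1` prefix is taken from the architecture diagram; individual route paths such as `pages` are only examples.

```python
from enum import Enum


class ApiLocation(Enum):
    """Where the client runs relative to the API (names are hypothetical)."""
    HOST_TO_DOCKER = "host_to_docker"  # host process -> dockerized API
    INSIDE_DOCKER = "inside_docker"    # container on the Compose network -> API
    HOST_TO_NATIVE = "host_to_native"  # host process -> `uvicorn --port 8000`


_BASE_URLS = {
    ApiLocation.HOST_TO_DOCKER: "http://127.0.0.1:8001",  # published loopback port
    ApiLocation.INSIDE_DOCKER: "http://api:8000",         # Compose service name
    ApiLocation.HOST_TO_NATIVE: "http://127.0.0.1:8000",  # native uvicorn
}


def api_url(location: ApiLocation, path: str) -> str:
    """Join the base URL for `location` with a path under /api/v1."""
    return f"{_BASE_URLS[location]}/api/v1/{path.lstrip('/')}"
```

For example, `api_url(ApiLocation.HOST_TO_DOCKER, "/pages")` yields `http://127.0.0.1:8001/api/v1/pages`.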

Backup & Restore

Create a local backup with:

bash scripts/backup.sh

Backups are written to ~/KnowledgeOS/backups/<timestamp>/ and include a custom-format pg_dump, a compressed library tarball, and a best-effort Qdrant snapshot metadata file. Postgres plus ~/KnowledgeOS/library/ are canonical; Qdrant is a rebuildable search index.

To restore manually, stop the app, restore postgres.dump into a clean Postgres database with pg_restore, unpack library.tar.gz back under ~/KnowledgeOS/, then rebuild/reindex derived search data as needed. Do not rely on Qdrant snapshots as the only backup of user data.
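The backup layout described above can be sketched in Python to show what each artifact is. This is not scripts/backup.sh: the timestamp format and function names are assumptions, and the pg_dump call is only built as an argument list (custom format via `-Fc`, restorable with `pg_restore`), not executed. The library tarball stores entries as `library/...` so it unpacks back under `~/KnowledgeOS/`.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def backup_dir(root: Path, now: datetime) -> Path:
    """Return <root>/backups/<timestamp>/ (timestamp format is an assumption)."""
    return root / "backups" / now.strftime("%Y%m%d-%H%M%S")


def pg_dump_command(db: str, user: str, out: Path) -> list[str]:
    """Build a custom-format pg_dump invocation; restore it with pg_restore."""
    return ["pg_dump", "-Fc", "-U", user, "-d", db, "-f", str(out)]


def archive_library(root: Path, dest_dir: Path) -> Path:
    """Write library.tar.gz containing <root>/library, with paths relative to root."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "library.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps entries as "library/..." so they unpack back under root
        tar.add(root / "library", arcname="library")
    return archive
```

Restoring is the mirror image: `pg_restore` the dump into a clean database, extract the tarball under `~/KnowledgeOS/`, then reindex.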


File Layout

.
├── apps/web/              Next.js frontend
│   └── src/
│       ├── app/           App Router: (auth)/ and (app)/ route groups
│       ├── components/    UI components (editor, assets, auth, layout)
│       ├── lib/           API client, SWR hooks
│       └── types/         Shared TypeScript types
│
├── apps/ios/              SwiftUI iPhone client (planned — Phase PHONE-01C scaffolds it)
│   ├── project.yml        XcodeGen source of truth
│   └── KnowledgeOS/       App / Core / Features / Resources
│
├── services/api/          FastAPI backend
│   └── app/
│       ├── api/v1/        Route handlers (auth, objects, pages, assets, health)
│       ├── core/          Deps, security, library, storage utilities
│       ├── db/            Session, base model
│       ├── models/        SQLAlchemy ORM models
│       ├── schemas/       Pydantic request/response schemas
│       └── services/      Business logic layer
│
├── services/worker/       RQ background worker (Phase 2+) — run as `rq worker kos-ingest`
├── services/mcp/          MCP server, stdio transport (Phase 7A)
│
├── packages/
│   ├── shared-types/      Reserved (currently empty placeholder)
│   ├── schemas/           Reserved (currently empty placeholder)
│   └── prompts/           Reserved (currently empty placeholder)
│
├── infra/                 Docker Compose, Dockerfiles, .env.example
├── scripts/               setup.sh, run_tests.sh, backup.sh
├── tests/api/             Integration tests (pytest + httpx ASGI)
├── tests/unit/            Pure unit tests (parsers, URL safety)
├── docs/                  Architecture, data model, API, MCP, ingestion, security
└── project-phases/        Phase-by-phase implementation plans

Data Model

The KosObject pattern

Every entity in the system — pages, assets, notes, bookmarks, collections — is a row in the objects table with a kind discriminator. Specialized tables (pages, assets, etc.) extend it by foreign key.

objects (id, user_id, kind, title, tags[], metadata{}, is_pinned, is_archived, deleted_at)
  ├── pages     (content_json, content_text, word_count, version)
  ├── assets    (sha256, storage_path, content_type, size_bytes, status)
  └── edges     (source_id, target_id, kind: link/child/citation/related/...)

Future object types (sources, claims, projects, chats, concepts, tasks, workspaces) extend the same base. See docs/DATA_MODEL.md for the full schema.
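To make the base-table pattern concrete, here is a simplified sketch using SQLite as a stand-in for Postgres. Column names follow the listing above, but `tags[]`, `metadata{}`, and several other columns are omitted, and the `backlinks` query is an illustration rather than the project's actual SQL. Note how one query over `edges` works for any object kind because every entity shares the `objects` base row.

```python
import sqlite3

# Simplified sketch of the KosObject pattern; SQLite stands in for Postgres.
SCHEMA = """
CREATE TABLE objects (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    kind TEXT NOT NULL,          -- discriminator: 'page', 'asset', ...
    title TEXT NOT NULL,
    deleted_at TEXT              -- NULL means live
);
CREATE TABLE pages (             -- specialized table, extends objects by FK
    object_id TEXT PRIMARY KEY REFERENCES objects(id),
    content_text TEXT,
    version INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE edges (
    source_id TEXT NOT NULL REFERENCES objects(id),
    target_id TEXT NOT NULL REFERENCES objects(id),
    kind TEXT NOT NULL           -- 'link', 'child', 'citation', 'related', ...
);
"""


def backlinks(conn, object_id):
    """(title, edge kind) for live objects of any kind that point at object_id."""
    rows = conn.execute(
        """
        SELECT o.title, e.kind
        FROM edges e JOIN objects o ON o.id = e.source_id
        WHERE e.target_id = ? AND o.deleted_at IS NULL
        ORDER BY o.title
        """,
        (object_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO objects (id, user_id, kind, title) VALUES (?, ?, ?, ?)",
    [("p1", "u1", "page", "Qdrant"),
     ("p2", "u1", "page", "Search design"),
     ("a1", "u1", "asset", "diagram.png")],
)
conn.execute("INSERT INTO pages (object_id, content_text) VALUES ('p1', 'vector index')")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("p2", "p1", "link"), ("a1", "p1", "related")],
)
```

Here `backlinks(conn, "p1")` returns both the page and the asset that point at `p1`.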

Asset storage

Binary files are stored content-addressed under LIBRARY_ROOT:

~/KnowledgeOS/library/assets/<sha256[:2]>/<sha256>/original.<ext>

Write-once semantics: if the path already exists, the upload is a no-op (deduplication by hash).
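A minimal sketch of the content-addressed, write-once store, assuming the path layout shown above. The function names are hypothetical, and the temp-file-then-rename step is an added safety choice so an interrupted write never leaves a partial `original.<ext>`; the real service may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def asset_path(library_root: Path, data: bytes, ext: str) -> Path:
    """assets/<sha256[:2]>/<sha256>/original.<ext> under LIBRARY_ROOT."""
    digest = hashlib.sha256(data).hexdigest()
    return library_root / "assets" / digest[:2] / digest / f"original.{ext}"


def store_asset(library_root: Path, data: bytes, ext: str) -> tuple[Path, bool]:
    """Write-once store. Returns (path, created); created is False on a dedup hit."""
    path = asset_path(library_root, data, ext)
    if path.exists():
        return path, False  # identical bytes already stored: upload is a no-op
    path.parent.mkdir(parents=True, exist_ok=True)
    # write to a temp file in the same directory, then rename into place
    with tempfile.NamedTemporaryFile(dir=path.parent, delete=False) as tmp:
        tmp.write(data)
    Path(tmp.name).replace(path)
    return path, True
```

Uploading the same bytes twice returns the same path, and the second call reports `created=False` without touching the file.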

Soft deletes

All user-owned records use deleted_at (a nullable timestamp), and hard deletes are never performed. Standard queries exclude trashed rows with a deleted_at IS NULL filter.
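The soft-delete lifecycle reduces to three statements: stamp `deleted_at`, clear it to restore, and filter on `deleted_at IS NULL` for normal listings. The sketch below uses SQLite in place of Postgres, and the helper names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, title TEXT, deleted_at TEXT)")
conn.executemany("INSERT INTO objects (id, title) VALUES (?, ?)",
                 [("o1", "Keep me"), ("o2", "Trash me")])


def soft_delete(conn, object_id):
    """Move an object to trash by stamping deleted_at; the row is never removed."""
    conn.execute(
        "UPDATE objects SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), object_id),
    )


def restore(conn, object_id):
    """Undo a soft delete by clearing deleted_at."""
    conn.execute("UPDATE objects SET deleted_at = NULL WHERE id = ?", (object_id,))


def list_live(conn):
    """Standard listing: trashed rows are excluded by deleted_at IS NULL."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM objects WHERE deleted_at IS NULL ORDER BY id")]


def list_trash(conn):
    """Trash view: only rows that carry a deleted_at timestamp."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM objects WHERE deleted_at IS NOT NULL ORDER BY id")]
```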


Product Modules (Full Vision)

The complete product is built across 9 numbered phases plus a mobile track. Phases 1–6B, 7A, 7B, and 8A are done; search hardening is partial, and the remaining phases are planned.

Module Description Phase Status
Wiki pages Rich Tiptap editor, auto-save, word count, typed links 1 ✅ Complete
Asset library Upload, dedup, preview, gallery, full-screen modal 1 ✅ Complete
Rich media sources PDF text+thumbnail, YouTube oEmbed+transcript, web article, CSV preview 2 ✅ Complete
Keyword + semantic search Postgres FTS, Qdrant vector, hybrid reranking, Cmd+K modal 3 ✅ Complete
Graph Lite Typed edges, backlinks, related, ObjectPicker, LinkToModal, GraphPanel 4 ✅ Complete
AI assistant + Inbox/Triage Summarize, extract claims/tasks, suggest links, KB Q&A, triage inbox 5 ✅ Complete
Chat import (raw) ChatGPT/Claude/Markdown/plain text import → searchable chats 6A ✅ Complete
Chat structured import AI summary, decisions, claims/tasks extracted with turn refs 6B ✅ Complete
MCP read/search server stdio MCP with search_objects, hybrid_search, get_*, get_related_objects 7A ✅ Complete
MCP write tools create_page, update_page, create_edge, archive_object, ingest_url, ingest_file 7B ✅ Complete
Workspace Lite (split pane) Open any object in a side pane from search/backlinks/related 8A ✅ Complete
Multi-pane workspaces Persistent layouts, drag-across-pane, workspace-scoped AI 8 ⬜ Planned
Career/project memory Project schema, evidence-linked resume bullets, STAR stories 9 ⬜ Planned
iPhone client — contract Mobile spec docs: MOBILE_APP, MOBILE_API_CONTRACT, MOBILE_NETWORKING PHONE-00 ✅ Complete (docs only)
iPhone client — MVP Bearer auth, SwiftUI scaffold, read/search/capture/AI on iOS Simulator + device PHONE-01A/B/C → 03C ⬜ Planned
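The search module merges a Postgres full-text ranking with a Qdrant vector ranking. The README does not say which fusion method the hybrid reranker uses, so the sketch below shows reciprocal rank fusion (RRF), one common way to combine ranked lists, purely as an illustration; `k=60` is the constant usually quoted for RRF, not a KnowledgeOS setting.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of object ids into one ranking.

    Each list contributes 1 / (k + rank) for every id it contains
    (rank is 1-based); ids are sorted by summed score, ties broken by id.
    """
    scores = {}
    for ranking in rankings:
        for rank, obj_id in enumerate(ranking, start=1):
            scores[obj_id] = scores.get(obj_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda i: (-scores[i], i))


# Example: an FTS ranking and a vector ranking for the same query.
fts_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([fts_hits, vector_hits])
```

Objects found by both retrievers (`a`, `c`) rise above those found by only one, which is the behaviour a hybrid reranker is meant to provide; RRF needs no score normalization between FTS and cosine similarity.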

AI Features

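  • AI assistant (Phase 5, complete): summarize, extract claims/tasks, suggest links, KB Q&A, and inbox triage.
  • Career memory (Phase 9, planned): project records, AI-generated resume bullets, and STAR stories; see docs/AGENT_GUIDE.md.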

Roadmap

Phase Goal Status
1 — Foundation Docker, Postgres, auth, page CRUD, Tiptap editor, asset upload Done
2 — Sources & Rich Media PDF/YouTube/web/CSV ingestion, RQ worker, citation edges Done
3 — Search Chunking, Postgres FTS, Qdrant vectors, hybrid search, Cmd+K UI Done
4 — Graph Lite Typed edge API, backlinks, related objects from Postgres edges Done
5 — AI Assistant + Inbox/Triage AI sidebar, summarize/extract/suggest, KB Q&A, triage inbox Done
6A — Chat Import Lite Raw upload/paste of ChatGPT/Claude/Markdown/text exports Done
6B — Structured Chat Import AI summaries, extracted claims/tasks, turn-grounded graph links Done
7A — MCP Read/Search stdio MCP server, read-only tools, internal-token auth Done
8A — Workspace Lite Frontend split pane (no schema change) Done
Hardening — Search Quality Multilingual ILIKE fallback ✅; snippet sanitization, JP fixtures, debug UI, index-status endpoint pending Partial
7B — MCP Write Tools create_page, update_page, create_edge, archive_object, ingest_url, ingest_file Done
8 — Multi-Pane Workspaces Persistent layout engine, saved workspaces, workspace-scoped AI Planned
9 — Career Memory Project schema UI, resume bullet generator, STAR story generator Planned
PHONE-00 — Mobile Architecture & Contract MOBILE_APP, MOBILE_API_CONTRACT, MOBILE_NETWORKING docs Done
PHONE-01A — Mobile Backend Auth & API Bearer tokens, /auth/mobile-login, /mobile/bootstrap, fix get_current_user stub Planned
PHONE-01B — Mac ↔ iPhone Networking infra/docker-compose.mobile.yml, scripts/mobile_network_check.sh, ATS strategy Planned
PHONE-01C — iOS App Scaffold SwiftUI app + XcodeGen project.yml, Connect screen, health check Planned
PHONE-02A/B — API Client & Simulator QA Typed API client, Keychain auth store, simulator MCP smoke tests Planned
PHONE-03A/B/C — Read & Search / Capture / AI MVP Hybrid search, page/source/chat/project readers, capture, KB Q&A on iPhone Planned
PHONE-04 — Edit-Lite Title/tags/plain-body edits with Tiptap JSON safety Planned
PHONE-05 — Offline Cache & Queue Recent-object cache + outgoing note queue Planned
PHONE-06 — Device Install & Private Release Physical iPhone install via Xcode, TestFlight checklist Planned

Each phase has a detailed spec in project-phases/.


MCP & Agent Access

Phases 7A + 7B are shipped. KnowledgeOS exposes a local stdio MCP server in services/mcp/ that external agents (Claude Desktop, Claude Code, Cursor, Codex) can spawn as a subprocess. The server talks to FastAPI over 127.0.0.1 using a shared MCP_INTERNAL_TOKEN.

Enable it by setting MCP_ENABLED=true and a random MCP_INTERNAL_TOKEN in infra/.env, then point your agent client at uv run --project services/mcp kos-mcp. For the dockerized API, keep MCP_API_BASE_URL=http://127.0.0.1:8001; for a native API run on port 8000, override it to http://127.0.0.1:8000.

To enable write tools, also set MCP_ALLOW_WRITE_TOOLS=true.
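As an illustration of wiring an agent client to the server, the sketch below generates a random internal token and builds an `mcpServers` entry in the shape Claude Desktop's `claude_desktop_config.json` uses (`command`, `args`, `env`). Passing `MCP_INTERNAL_TOKEN`, `MCP_API_BASE_URL`, and `MCP_ALLOW_WRITE_TOOLS` through the client's `env` block is an assumption; check docs/MCP_TOOLS.md for which process actually reads each variable. The server key `knowledgeos` is arbitrary.

```python
import json
import secrets


def new_internal_token() -> str:
    """Random URL-safe token suitable for MCP_INTERNAL_TOKEN (32 random bytes)."""
    return secrets.token_urlsafe(32)


def kos_mcp_client_config(repo_root, token, api_base_url="http://127.0.0.1:8001",
                          allow_writes=False):
    """Build an mcpServers entry that spawns kos-mcp over stdio via uv.

    Assumes kos-mcp reads these variable names from its environment.
    """
    env = {
        "MCP_INTERNAL_TOKEN": token,      # must match the value in infra/.env
        "MCP_API_BASE_URL": api_base_url, # 8001 for Docker, 8000 for native uvicorn
    }
    if allow_writes:
        env["MCP_ALLOW_WRITE_TOOLS"] = "true"
    return {
        "mcpServers": {
            "knowledgeos": {
                "command": "uv",
                # absolute --project path, since the client spawns from elsewhere
                "args": ["run", "--project", f"{repo_root}/services/mcp", "kos-mcp"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(kos_mcp_client_config("/path/to/repo", new_internal_token()), indent=2))
```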

Read/search tools (Phase 7A — live): search_objects, hybrid_search, get_object, get_page, get_source, get_related_objects, answer_from_kb

Write tools (Phase 7B — live, requires MCP_ALLOW_WRITE_TOOLS=true):

  • create_page — create a new page
  • update_page — update title/content/tags (supports expected_version for optimistic locking)
  • create_edge — link two objects with a typed relationship
  • archive_object — soft-archive (reversible via restore endpoint)
  • ingest_url — ingest a URL as a new source (web/youtube)
  • ingest_file — ingest a local file from LIBRARY_ROOT
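The `expected_version` parameter on `update_page` implies optimistic locking: the write succeeds only if the page has not changed since the caller read it, and every successful mutation records a before/after snapshot. The sketch below models that with a hypothetical in-memory `PageStore`; the real service works against Postgres and the `object_revisions` table.

```python
import copy
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when expected_version no longer matches the stored version."""


@dataclass
class PageStore:
    """Hypothetical in-memory stand-in for the pages + object_revisions tables."""
    pages: dict = field(default_factory=dict)      # page_id -> {"title", "tags", "version"}
    revisions: list = field(default_factory=list)  # before/after snapshots

    def update_page(self, page_id, changes, expected_version=None):
        current = self.pages[page_id]
        # reject the write if someone else bumped the version since our read
        if expected_version is not None and expected_version != current["version"]:
            raise VersionConflict(
                f"page {page_id} is at v{current['version']}, "
                f"caller expected v{expected_version}"
            )
        before = copy.deepcopy(current)
        after = {**current, **changes, "version": current["version"] + 1}
        self.pages[page_id] = after
        self.revisions.append(
            {"object_id": page_id, "before": before, "after": copy.deepcopy(after)}
        )
        return after
```

An agent that receives `VersionConflict` should re-read the page and reapply its change rather than overwrite a concurrent edit.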

Safety invariants (always enforced):

  • Every write tool call creates an agent_runs audit row
  • Mutating tools (update_page, archive_object) create object_revisions rows with before/after snapshots
  • Rate-limited per agent identity: 60 writes/minute, 600 writes/hour
  • Soft-delete only — archive_object is reversible, no hard deletes through MCP
  • No arbitrary shell execution through any MCP tool
  • No file access outside ~/KnowledgeOS/
  • API keys and session secrets are never returned in tool outputs
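The per-agent write budget (60 writes/minute, 600 writes/hour) can be enforced with a sliding-window limiter that checks every window before admitting a write. This is a sketch of the technique, not the server's implementation; the class name is hypothetical and the clock is injectable so it can be tested without sleeping.

```python
import time
from collections import defaultdict, deque


class WriteRateLimiter:
    """Sliding-window limiter: each agent must fit every (limit, window) budget."""

    def __init__(self, limits=((60, 60.0), (600, 3600.0)), clock=time.monotonic):
        self.limits = limits  # (max writes, window in seconds) pairs
        self.clock = clock
        self._history = defaultdict(deque)  # agent id -> times of allowed writes

    def allow(self, agent_id):
        now = self.clock()
        history = self._history[agent_id]
        longest = max(window for _, window in self.limits)
        while history and now - history[0] >= longest:
            history.popleft()  # too old to count against any window
        for limit, window in self.limits:
            recent = sum(1 for t in history if now - t < window)
            if recent >= limit:
                return False  # rejected writes are not recorded
        history.append(now)
        return True
```

Because identities are tracked separately, one runaway agent exhausting its budget does not block writes from another.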

See docs/MCP_TOOLS.md for the full tool spec and docs/SECURITY.md for the auth and audit design.
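The "no file access outside ~/KnowledgeOS/" invariant (relevant to `ingest_file`) is usually enforced by resolving the requested path first and only then checking containment, so `..` segments, absolute paths, and symlinks cannot escape. A minimal sketch with hypothetical names, requiring Python 3.9+ for `Path.is_relative_to`:

```python
from pathlib import Path


class PathOutsideLibrary(ValueError):
    """Raised when a requested path resolves outside the allowed root."""


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve `requested` against `root` and refuse anything that escapes it.

    resolve() collapses '..' segments and follows symlinks before the check,
    so 'notes/../../etc/passwd', '/etc/passwd', and an outward-pointing
    symlink are all rejected.
    """
    root = root.expanduser().resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PathOutsideLibrary(f"{requested!r} is outside {root}")
    return candidate
```

Checking the string prefix without resolving would miss both `..` traversal and symlinks, which is why the resolve step comes first.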


iPhone App (planned)

A native SwiftUI iPhone client lives alongside apps/web/ at apps/ios/ (scaffold lands in Phase PHONE-01C). It is a thin client — reads, searches, captures, and asks grounded questions through the same /api/v1 surface the browser uses. No second backend; no direct DB or filesystem access from the phone.

Phase PHONE-00 (docs-only) is complete. The contract is fixed; implementation phases are planned:

  • docs/MOBILE_APP.md — product spec, MVP scope, screen map, endpoints intentionally NOT exposed on mobile
  • docs/MOBILE_API_CONTRACT.md — bearer auth contract (proposed for Phase 01A), verified read-side endpoint table, error envelope, pagination
  • docs/MOBILE_NETWORKING.md — Simulator / LAN / Tailscale profiles, infra/docker-compose.mobile.yml design, ATS strategy

Implementation phases live under project-phases/PHASE-PHONE-*.md (PHONE-01A backend auth → PHONE-06 device install). The big picture is in project-phases/IDEA-iPHONE-APP.md.

How to open it in Xcode (once Phase PHONE-01C lands):

cd apps/ios
xcodegen generate                 # produces KnowledgeOS.xcodeproj
open KnowledgeOS.xcodeproj

TestFlight distribution is Phase PHONE-06.


Environment Variables

bash scripts/setup.sh copies infra/.env.example to infra/.env; if you skip the script, copy the file manually before starting Docker.

Variable Default Required
SESSION_SECRET (not set) No — reserved for future signed URLs
POSTGRES_DB knowledgeos No
POSTGRES_USER kos No
POSTGRES_PASSWORD kospass No
LIBRARY_ROOT ~/KnowledgeOS/library No
OPENAI_API_KEY (not set) Phase 5+

Keyboard Shortcuts

Press ? anywhere in the app (outside a text field) to open the keyboard shortcut overlay.

Key Action
⌘K Open search
⌘\ Collapse / expand sidebar
? Show shortcuts
j / ↓ Move highlight down (lists)
k / ↑ Move highlight up (lists)
Enter Open highlighted item (lists)
Backspace Soft-delete highlighted item (lists)

See docs/SHORTCUTS.md for the full reference.


For AI Agents

If you are a coding agent (Claude, Codex, Cursor) working in this repo:

  • Read CLAUDE.md first — it is the operating contract for agents.
  • Read docs/AGENT_GUIDE.md for data model rules and agent write constraints.
  • Postgres is the source of truth. Do not mutate the database directly; go through the API or typed service functions.
  • Every write must create an agent_runs audit row.
  • Qdrant and Kùzu are indexes. They are re-buildable; never treat them as canonical.
  • Soft-delete only. deleted_at is the pattern. Never call DELETE on user data.

Documentation

File Contents
docs/ARCHITECTURE.md System diagram, service boundaries, sequence flows
docs/DATA_MODEL.md Full Postgres schema, object types, edge types
docs/API.md REST API reference
docs/MCP_TOOLS.md MCP tool spec, resources, prompts
docs/INGESTION.md Ingestion pipeline stages, job types, media handling
docs/SECURITY.md Auth, sessions, audit log, MCP safety model
docs/AGENT_GUIDE.md Rules for AI agents writing to this system
project-phases/IDEA-DRAFT.md Original full product spec
project-phases/PHASE-1-FOUNDATION.md Phase 1 subtask spec
docs/MOBILE_APP.md iPhone app product spec — MVP scope, screen map, endpoint exclusion list
docs/MOBILE_API_CONTRACT.md iPhone bearer-auth contract + verified read-side endpoint table
docs/MOBILE_NETWORKING.md Simulator / LAN / Tailscale profiles + ATS strategy
project-phases/IDEA-iPHONE-APP.md Canonical iPhone-track concept and phase plan

About

KnowledgeOS is a local-first personal AI knowledge management system that runs on your Mac, combining structured notes, semantic search, graph relationships, and MCP-based agent tools while keeping all data on-device.
