
Add Personal Knowledge Base example #16

Merged: odk- merged 4 commits into main from add-personal-knowledge-base on May 28, 2026

Conversation

@odk- commented May 27, 2026

Summary

  • New workspace example: feed it URLs and PDFs, and it indexes them locally with BAAI/bge-large-en-v1.5 embeddings and sqlite-vec ANN search, then answers questions with cited sources
  • Two Python user agents (kb-ingest-agent, kb-query-agent) sharing a SQLite vector DB at ~/.friday/local/workspaces/personal-knowledge-base/kb.db
  • Deps declared in pyproject.toml per agent, so uv run --directory <agent_dir> installs sentence-transformers, sqlite-vec, and pymupdf on first spawn — no manual venv step
  • Only the synthesis call (top-10 retrieved chunks) goes to Anthropic; embedding and retrieval stay on the host
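
The split between local retrieval and remote synthesis can be sketched in plain Python. This is an illustration only, not the agents' code: it swaps sqlite-vec for a brute-force cosine scan, uses tiny hand-written vectors in place of BGE embeddings, and the names `cosine`, `top_k`, and `build_prompt` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=10):
    """Return the k chunks most similar to query_vec, best first.

    Brute-force stand-in (assumption) for the vector search done in the DB.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Number the retrieved chunks so an answer can cite them as [1], [2], ..."""
    sources = "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(hits, start=1))
    return f"Sources:\n{sources}\n\nQuestion: {question}"


# Toy data: three "chunks" with made-up 3-dimensional vectors.
chunks = [
    {"text": "SQLite is an embedded database.", "vec": [1.0, 0.0, 0.1]},
    {"text": "Bees make honey.", "vec": [0.0, 1.0, 0.0]},
    {"text": "sqlite-vec adds vector search.", "vec": [0.9, 0.1, 0.3]},
]
hits = top_k([1.0, 0.0, 0.2], chunks, k=2)
prompt = build_prompt("What is SQLite?", hits)
```

In this sketch, only the string returned by `build_prompt` would need to leave the host; scoring and ranking happen locally.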

Validation

  • scripts/validate_examples.ts against friday-studio@0.1.8: OK — 13 examples validated.
  • yamllint --strict -c .yamllint.yml on the new workspace: clean
  • markdownlint-cli2 on the new README: 0 errors
  • workspace.lock hashes verified end-to-end via readLockfile + hashPrimitive from packages/bundle/src/

Test plan

  • Import the workspace via Discover Spaces on a clean Friday install
  • First-run cost: confirm uv provisions Python 3.12 and the deps, and that the ~1.3 GB BGE model downloads into ~/.cache/huggingface/
  • /ingest-url round-trip with a Wikipedia article
  • /ingest-pdf round-trip with an uploaded PDF (verifies the 4-strategy upload locator)
  • /query-kb returns a synthesized answer with [1]-style citations and sources_consulted
  • Re-ingest the same URL returns already_ingested instead of re-embedding
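
The re-ingest check in the last item can be sketched as a lookup keyed on a hash of the URL. This is a minimal sketch under assumptions: the `sources` table, the `init_sources_table` and `ingest` helpers, and the `ingested` status are hypothetical; only `already_ingested` comes from the PR description.

```python
import hashlib
import sqlite3


def init_sources_table(conn):
    """Create a table that records each ingested source exactly once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sources ("
        "url_hash TEXT PRIMARY KEY, url TEXT NOT NULL, n_chunks INTEGER NOT NULL)"
    )


def ingest(conn, url, chunks):
    """Record a source, or report already_ingested without redoing any work."""
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT n_chunks FROM sources WHERE url_hash = ?", (url_hash,)
    ).fetchone()
    if row is not None:
        return {"status": "already_ingested", "chunks": row[0]}
    # A real agent would embed and store the chunks here; this sketch only counts them.
    conn.execute(
        "INSERT INTO sources (url_hash, url, n_chunks) VALUES (?, ?, ?)",
        (url_hash, url, len(chunks)),
    )
    conn.commit()
    return {"status": "ingested", "chunks": len(chunks)}


conn = sqlite3.connect(":memory:")
init_sources_table(conn)
first = ingest(conn, "https://example.org/article", ["chunk a", "chunk b"])
second = ingest(conn, "https://example.org/article", ["chunk a", "chunk b"])
```

The second call returns before any embedding work would start, which is the behaviour the test plan item checks for.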

Local URL/PDF indexing with BGE-large embeddings and sqlite-vec ANN
search. Two Python user agents (kb-ingest-agent, kb-query-agent); deps
declared in pyproject.toml so uv provisions them on first spawn. Only
the final answer synthesis hits Anthropic; embedding + retrieval stay on
the host.
@odk- requested a review from a team as a code owner May 27, 2026 16:14
Comment thread personal-knowledge-base/agents/kb-ingest-agent/agent.py Fixed
CodeQL flagged the </script> / </style> patterns as missing variants
like </script > with internal whitespace. Allow optional whitespace
around the tag name in the closing match. Bump kb-ingest-agent hash in
the lockfile to match.
Comment thread personal-knowledge-base/agents/kb-ingest-agent/agent.py Fixed
The previous fix only allowed whitespace inside the closing
</script>/</style> tag; CodeQL still flagged variants like
</script\t\n bar> where the closing tag carries (ignored) attributes.
Accept any non-'>' chars after the tag name, matching the pattern used
on the opening tag. Add \b after the opening name so <scriptlike>
isn't matched. Bump kb-ingest-agent hash in the lockfile.
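
A pattern with the properties described in the two fixes above might look like the following. It is a sketch, not the regex from agent.py: the names `SCRIPT_STYLE_RE` and `strip_script_style` are hypothetical, and regex-based HTML stripping remains a heuristic rather than a parser.

```python
import re

# Opening tag: name, word boundary, then any attributes up to '>'.
# Closing tag: same name, then any non-'>' characters (whitespace or stray attributes).
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\b[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_script_style(html):
    """Remove <script> and <style> elements, including their contents."""
    return SCRIPT_STYLE_RE.sub("", html)


plain = strip_script_style("a<script>x</script>b")
spaced = strip_script_style("a<script>x</script >b")
attrs = strip_script_style("a<script src='x'>y</script\t\n bar>b")
lookalike = strip_script_style("<scriptlike>keep</scriptlike>")
```

The `\b` after the tag name is what keeps `<scriptlike>` from matching, and `[^>]*` on the closing side covers both `</script >` and closing tags that carry ignored attributes.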
@LissaGreense

Contributor

Gave this a full spin: imported the workspace and ran the whole loop (ingest URL → query → re-ingest) on a stock macOS (Apple Silicon) setup. Good news first — it works, and the output is genuinely nice. Ingested a Wikipedia article into 39 chunks, query came back with a well-structured, [1][7]-cited answer from 10 retrieved chunks, and re-ingesting the same URL correctly short-circuited to already_ingested without re-embedding. The local-embeddings / cloud-only-for-synthesis design is great.

The real snag was on the SQLite side, in the interpreter build rather than in the agent logic. Every run failed instantly with:

'sqlite3.Connection' object has no attribute 'enable_load_extension'

…leaving a 0-byte kb.db. The agents need conn.enable_load_extension(True) to load sqlite-vec, but the default macOS python3.12 (python.org build, which uv run --python 3.12 selects) is compiled without loadable-extension support. requires-python = ">=3.12" passes happily — it gates the version, not the build flag — so there's no warning, just a cryptic crash. Fix was uv python install 3.12 (the uv-managed cpython does enable extensions) + rebuilding the agent venvs against it. After that everything worked first try.

One other thing worth setting expectations around (not a defect): the first run pulls the ~1.3 GB BGE model, so on a slower/flaky connection that download gets painful — we were on a bad connection and it stalled a few times before completing. (HF_HUB_DISABLE_XET=1 helped a lot when it stalled.)

A few things would save the next person an afternoon:

  • Preflight guard: in init_db() (and the query agent), check hasattr(conn, "enable_load_extension") and raise a clear "needs a Python built with loadable SQLite extension support (for sqlite-vec)" instead of the opaque AttributeError.
  • Ship a setup/requirements skill with the workspace (rather than relying on a README): bundle a skills/ entry that the workspace chat agent can invoke to detect the broken interpreter and walk the user through — or just run — the fix (install a loadable-extension-capable Python, rebuild the agent venvs, warm the ~1.3 GB model). Skills ride along in the bundle, so the space could self-onboard on first failure instead of the user hunting through docs. A README note is fine as a fallback, but a skill turns "here's what's wrong" into "want me to fix it?".

Really like the example overall — just want the on-ramp to match the quality of the thing itself.

LissaGreense previously approved these changes May 28, 2026
Addresses PR review (LissaGreense): on a stock macOS / python.org 3.12
build, sqlite3 is compiled without loadable-extension support, so
conn.enable_load_extension(True) raised an opaque AttributeError and
left a 0-byte kb.db. requires-python only gates the version, not the
build flag, so it slipped through silently.

Both agents now probe an in-memory connection up front and raise a clear,
actionable error ("run `uv python install 3.12`") before touching the DB.
README documents the fix plus the ~1.3 GB model download and the
HF_HUB_DISABLE_XET=1 tip for flaky connections. Lockfile hashes bumped.
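
The up-front probe described in this commit message can be sketched as follows. It is a minimal illustration under assumptions, not the merged code: `ExtensionSupportError`, `require_loadable_extensions`, and the injectable `connect` parameter are hypothetical names, and the fake connection class exists only to demonstrate the failure path on any interpreter.

```python
import sqlite3


class ExtensionSupportError(RuntimeError):
    """Raised when this Python's sqlite3 cannot load extensions."""


def require_loadable_extensions(connect=sqlite3.connect):
    """Probe an in-memory connection before any database file is created."""
    conn = connect(":memory:")
    try:
        if not hasattr(conn, "enable_load_extension"):
            raise ExtensionSupportError(
                "This Python's sqlite3 module was built without loadable-extension "
                "support, which sqlite-vec needs. Run `uv python install 3.12` and "
                "rebuild the agent venvs."
            )
    finally:
        conn.close()


class _NoExtensionConnection:
    """Hypothetical stand-in for a connection from a build without extension support."""

    def close(self):
        pass


# Simulate the broken interpreter by injecting the stand-in connection.
try:
    require_loadable_extensions(connect=lambda _path: _NoExtensionConnection())
    message = None
except ExtensionSupportError as exc:
    message = str(exc)
```

Because the probe uses `:memory:`, a failing interpreter never gets as far as creating the on-disk kb.db, so the 0-byte file from the review would not be left behind.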
@odk- merged commit 43ad5f1 into main May 28, 2026
7 checks passed
@odk- deleted the add-personal-knowledge-base branch May 28, 2026 15:10


3 participants