Add Personal Knowledge Base example #16
Local URL/PDF indexing with BGE-large embeddings and sqlite-vec ANN search. Two Python user agents (kb-ingest-agent, kb-query-agent); deps declared in pyproject.toml so uv provisions them on first spawn. Only the final answer synthesis hits Anthropic; embedding + retrieval stay on the host.
CodeQL flagged the </script> / </style> patterns as missing variants like </script > with internal whitespace. Allow optional whitespace around the tag name in the closing match. Bump kb-ingest-agent hash in the lockfile to match.
The previous fix only allowed whitespace inside the closing </script>/</style> tag; CodeQL still flagged variants like </script\t\n bar> where the closing tag carries (ignored) attributes. Accept any non-'>' chars after the tag name, matching the pattern used on the opening tag. Add \b after the opening name so <scriptlike> isn't matched. Bump kb-ingest-agent hash in the lockfile.
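The tag-matching rules from the two CodeQL fixes above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual kb-ingest-agent code: the names `_SCRIPT_STYLE_RE` and `strip_script_and_style` are hypothetical, and regex stripping is still a heuristic rather than a full HTML parser.

```python
import re

# Hypothetical pattern illustrating the rules described in the commits:
#   - \b after the opening tag name, so <scriptlike> is not treated as <script>
#   - [^>]* after the name on both tags, so attributes and whitespace
#     (including </script\t\n bar>) are accepted
_SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_script_and_style(html: str) -> str:
    """Remove <script>/<style> elements, including their contents."""
    return _SCRIPT_STYLE_RE.sub("", html)
```

For example, `strip_script_and_style("a<script>x</script >b")` drops the whole element, while input containing only `<scriptlike>` is left untouched.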
Gave this a full spin: imported the workspace and ran the whole loop (ingest URL → query → re-ingest) on a stock macOS (Apple Silicon) setup. Good news first: it works, and the output is genuinely nice. Ingested a Wikipedia article into 39 chunks, query came back with a well-structured, …
The real snag was a SQLite bug. Every run failed instantly with:
…leaving a 0-byte …
One other thing worth setting expectations around (not a defect): the first run pulls the ~1.3 GB BGE model, so on a slower/flaky connection that download gets painful. We were on a bad connection and it stalled a few times before completing.
A few things would save the next person an afternoon:
Really like the example overall; just want the on-ramp to match the quality of the thing itself.
Addresses PR review (LissaGreense): on a stock macOS / python.org 3.12
build, sqlite3 is compiled without loadable-extension support, so
conn.enable_load_extension(True) raised an opaque AttributeError and
left a 0-byte kb.db. requires-python only gates the version, not the
build flag, so it slipped through silently.
Both agents now probe an in-memory connection up front and raise a clear,
actionable error ("run `uv python install 3.12`") before touching the DB.
README documents the fix plus the ~1.3 GB model download and the
HF_HUB_DISABLE_XET=1 tip for flaky connections. Lockfile hashes bumped.
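The up-front probe described in this commit could look roughly like the sketch below. It is an illustration under assumptions, not the code the agents ship: the function name `check_sqlite_extension_support` and the `connect` parameter (a hook for testing) are hypothetical, and the error wording only paraphrases the commit message.

```python
import sqlite3


def check_sqlite_extension_support(connect=sqlite3.connect) -> None:
    """Fail fast if this Python's sqlite3 cannot load extensions.

    Uses an in-memory connection so nothing is written to disk before
    the check passes (avoiding a stray 0-byte database file).
    """
    conn = connect(":memory:")
    try:
        # On builds compiled without loadable-extension support the
        # method does not exist, so this raises AttributeError.
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
    except AttributeError as exc:
        raise RuntimeError(
            "This Python's sqlite3 module was built without loadable-extension "
            "support, which sqlite-vec requires. "
            "Run `uv python install 3.12` and try again."
        ) from exc
    finally:
        conn.close()
```

Calling this before opening the real database keeps the failure mode to a single readable message instead of an opaque `AttributeError`.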
Summary
- `BAAI/bge-large-en-v1.5` embeddings + `sqlite-vec` ANN search, answers questions with cited sources
- Two Python user agents (`kb-ingest-agent`, `kb-query-agent`) sharing a SQLite vector DB at `~/.friday/local/workspaces/personal-knowledge-base/kb.db`
- `pyproject.toml` per agent, so `uv run --directory <agent_dir>` installs `sentence-transformers`, `sqlite-vec`, and `pymupdf` on first spawn, with no manual venv step
Validation
- `scripts/validate_examples.ts` against `friday-studio@0.1.8`: `OK — 13 examples validated.`
- `yamllint --strict -c .yamllint.yml` on the new workspace: clean
- `markdownlint-cli2` on the new README: 0 errors
- `workspace.lock` hashes verified end-to-end via `readLockfile` + `hashPrimitive` from `packages/bundle/src/`
Test plan
- `uv` provisions Python 3.12 + deps + downloads the ~1.3 GB BGE model into `~/.cache/huggingface/`
- `/ingest-url` round-trip with a Wikipedia article
- `/ingest-pdf` round-trip with an uploaded PDF (verifies the 4-strategy upload locator)
- `/query-kb` returns a synthesized answer with `[1]`-style citations and `sources_consulted`
- Re-ingest returns `already_ingested` instead of re-embedding
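One way the re-ingest check in the last test-plan item might work is sketched below. Everything here is an assumption for illustration: the PR does not say how duplicates are detected, so the `documents` table, the content-hash comparison, and the `ingest` function are hypothetical, and the chunk-and-embed step is replaced by a placeholder comment.

```python
import hashlib
import sqlite3


def ingest(conn: sqlite3.Connection, source: str, text: str) -> dict:
    """Record a document, skipping work if identical content was seen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "source TEXT PRIMARY KEY, content_hash TEXT NOT NULL)"
    )
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM documents WHERE source = ?", (source,)
    ).fetchone()
    if row is not None and row[0] == digest:
        # Same source, same content: skip the expensive embedding pass.
        return {"status": "already_ingested", "source": source}

    # Placeholder: chunking and embedding would happen here.
    conn.execute(
        "INSERT OR REPLACE INTO documents (source, content_hash) VALUES (?, ?)",
        (source, digest),
    )
    conn.commit()
    return {"status": "ingested", "source": source}
```

Keying on a content hash rather than the URL alone means an updated page would be re-embedded, which is a design choice of this sketch and may differ from what the agent actually does.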