Knowledge base feature #7

Merged: bulletinmybeard merged 8 commits into master from knowledge-base-feature on Jun 21, 2026

@bulletinmybeard bulletinmybeard commented Jun 21, 2026

Summary

Adds a personal Knowledge Database to agentforge-api (/knowledge/*): a store
for user-created entries — snippets, commands, URLs, configs, error solutions,
notes, API examples — in its own Qdrant collection (knowledge_entries),
separate from the RAG index. Same embedding pipeline, dedicated CRUD + search.

What's included

API (/knowledge/*)

  • CRUD: create (single + batch up to 100), get, update, delete, bulk-delete by filter
  • Search: semantic /search and /search/smart with tag/type/project filters, plus tag faceting and stats
  • /filter: list entries by metadata filters (incl. parent_id) without a vector search
  • /entries/{id}/context: most relevant passages from one entry for a query, with adjacent pages for context
  • /entries/{id}/rechunk: rebuild page chunks for entries indexed before chunking existed
  • /extract: server-side text extraction from uploads (PDF via pdfplumber, pdftotext fallback; text/code/config as UTF-8), reusing AgentForge's extraction path instead of frontend JS
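The "adjacent pages for context" behavior of /entries/{id}/context can be pictured with a small sketch. It is not the code from this PR: `PageHit`, `pages_with_context`, `top_k`, and `window` are hypothetical names, and the sketch assumes each page chunk already carries a similarity score from the vector search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageHit:
    page: int     # 1-based page number of the chunk (assumed numbering)
    score: float  # similarity score, higher means more relevant


def pages_with_context(
    hits: list[PageHit], total_pages: int, top_k: int = 2, window: int = 1
) -> list[int]:
    """Return the top_k best-scoring pages plus `window` neighbours on each side."""
    best = sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
    selected: set[int] = set()
    for hit in best:
        first = max(1, hit.page - window)           # clamp at the first page
        last = min(total_pages, hit.page + window)  # clamp at the last page
        selected.update(range(first, last + 1))
    return sorted(selected)


hits = [PageHit(1, 0.12), PageHit(4, 0.91), PageHit(7, 0.33), PageHit(10, 0.88)]
print(pages_with_context(hits, total_pages=10))  # [3, 4, 5, 9, 10]
```

Pages 4 and 10 score highest, so the result contains them plus their neighbours, clamped to the document bounds.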

Behavior

  • Smart re-indexing on update — re-embeds only when title, content, or notes change; metadata-only edits skip the embedding call
  • Parent/child attachments via parent_id, with per-page chunking so a parent and its attached documents are searchable as passages
  • metadata free-form field on all points
  • SAQ batch job for bulk ingestion
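The smart re-indexing rule above amounts to a field comparison before the embedding call. A minimal sketch, assuming entries and update patches are plain dicts; `EMBEDDED_FIELDS` and `needs_reembedding` are illustrative names, not identifiers from knowledge_service.py:

```python
# Fields whose text feeds the embedding, per the PR description.
EMBEDDED_FIELDS = frozenset({"title", "content", "notes"})


def needs_reembedding(existing: dict, patch: dict) -> bool:
    """True only if the patch changes a field that feeds the embedding."""
    return any(
        field in patch and patch[field] != existing.get(field)
        for field in EMBEDDED_FIELDS
    )


entry = {
    "title": "Restart nginx",
    "content": "sudo systemctl restart nginx",
    "notes": "",
    "tags": ["ops"],
}

print(needs_reembedding(entry, {"tags": ["ops", "nginx"]}))               # False
print(needs_reembedding(entry, {"content": "sudo systemctl reload nginx"}))  # True
print(needs_reembedding(entry, {"title": "Restart nginx"}))               # False
```

A patch that resends an unchanged title is treated as metadata-only here, so it would also skip the embedding call; whether the actual service compares values or only checks which fields are present is not stated in this PR.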

Config

  • New knowledge block: collection_name, dedup_threshold, composite_template (env prefix KNOWLEDGE_)

Implementation notes

  • knowledge_service.py orchestrates embed -> dedup -> upsert; knowledge_vector_service.py owns the Qdrant collection (lazy client, payload indexes, page-chunk search).
  • Made embedding_service a lazy proxy so importing the service stack no longer builds the embedding client at import time — keeps test collection working without a config.yaml (gitignored, absent in CI). Construction defers to the first real .embed() call.
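The lazy-proxy idea for embedding_service can be sketched in a few lines. This is a generic version of the pattern, not the implementation merged here; `LazyProxy` and `FakeEmbeddingClient` are hypothetical stand-ins, and the fake client takes the place of the real Ollama-backed one.

```python
from typing import Any, Callable


class LazyProxy:
    """Defers building the wrapped object until an attribute is first used."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._target: Any = None

    def __getattr__(self, name: str) -> Any:
        # __getattr__ only runs for attributes not found on the proxy itself.
        # Refuse private names so a half-initialised proxy cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._target is None:
            self._target = self._factory()  # first real use builds the client
        return getattr(self._target, name)


class FakeEmbeddingClient:
    """Stand-in for the real client; counts how often it is constructed."""

    instances = 0

    def __init__(self) -> None:
        FakeEmbeddingClient.instances += 1

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


# Module-level singleton: importing this module constructs nothing.
embedding_service = LazyProxy(FakeEmbeddingClient)
built_at_import = FakeEmbeddingClient.instances  # 0

vector = embedding_service.embed("hello")
print(built_at_import, vector, FakeEmbeddingClient.instances)  # 0 [5.0] 1
```

Because construction happens inside the first `.embed()` call, test collection can import the module without a config.yaml present.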

Testing

  • 54 new test cases across 5 files (models, service, vector service, routes, batch job); all mock Qdrant/Ollama.
  • Full suite: 122 passed. ruff check + ruff format --check clean.

Docs

  • Updated README, docs/api.md, docs/architecture.md, docs/README.md, config.example.yaml.
  • CHANGELOG under 0.8.0; version bumped 0.7.0 -> 0.8.0.

… group. AgentForge already has proper content extraction tools, so reuse them instead of trying to extract large PDFs and other documents via a frontend JS package
… a vector search, to retrieve the most relevant passages from an entry (prompt query), and to re-chunk kb attachments to the parent and its own entry

- Improve page marker chunking for kb entries
- Improve overall kb chunking
- Bump up the app version (new release)
- Update the CHANGELOG and documentation
- Add the new knowledge base section to the `config.example.yaml`
@bulletinmybeard bulletinmybeard merged commit a9fc662 into master Jun 21, 2026
4 checks passed