Knowledge base feature #7
Merged
… group. AgentForge already has proper content extraction tools implemented, so reuse them instead of trying to extract large PDFs and other documents via a frontend JS package
… a vector search, to retrieve the most relevant passages from an entry (prompt query), to re-chunk KB attachments to parent and its own entry
- Improve page marker chunking for KB entries
- Improve overall KB chunking
- Bump the app version (new release)
- Update the CHANGELOG and documentation
- Add the new knowledge base section to `config.example.yaml`
Summary
Adds a personal Knowledge Database to `agentforge-api` (`/knowledge/*`): a store for user-created entries (snippets, commands, URLs, configs, error solutions, notes, API examples) in its own Qdrant collection (`knowledge_entries`), separate from the RAG index. Same embedding pipeline, dedicated CRUD + search.
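
To make the shape of an entry concrete, here is a minimal sketch of how one might be modelled. The field names `title`, `content`, `notes`, `parent_id`, and `metadata`, plus the tag/type/project filters, come from this description; the class names, enum values, defaults, and the `embedding_text` helper are assumptions for illustration and may not match the PR's actual code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class EntryType(str, Enum):
    # Entry kinds listed in the summary; the string values are assumed.
    SNIPPET = "snippet"
    COMMAND = "command"
    URL = "url"
    CONFIG = "config"
    ERROR_SOLUTION = "error_solution"
    NOTE = "note"
    API_EXAMPLE = "api_example"


@dataclass
class KnowledgeEntry:
    """Hypothetical model of one knowledge entry (not the PR's real schema)."""

    title: str
    content: str
    type: EntryType = EntryType.NOTE
    notes: str = ""
    tags: list[str] = field(default_factory=list)
    project: Optional[str] = None
    parent_id: Optional[str] = None  # set when the entry is attached to a parent
    metadata: dict[str, Any] = field(default_factory=dict)  # free-form payload

    def embedding_text(self) -> str:
        """Join the non-empty text fields that would feed the embedding."""
        return "\n".join(p for p in (self.title, self.content, self.notes) if p)
```

The real service presumably builds its embedding input from the configurable `composite_template`; the plain join above only stands in for that step.
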
What's included
API (`/knowledge/*`)
- `/search` and `/search/smart` with tag/type/project filters, plus tag faceting and stats
- `/filter`: list entries by metadata filters (incl. `parent_id`) without a vector search
- `/entries/{id}/context`: most relevant passages from one entry for a query, with adjacent pages for context
- `/entries/{id}/rechunk`: rebuild page chunks for entries indexed before chunking existed
- `/extract`: server-side text extraction from uploads (PDF via pdfplumber, `pdftotext` fallback; text/code/config as UTF-8), reusing AgentForge's extraction path instead of frontend JS

Behavior
- Entries are re-embedded only when `title`, `content`, or `notes` change; metadata-only edits skip the embedding call
- Attachments link to a parent entry via `parent_id`, with per-page chunking so a parent and its attached documents are searchable as passages
- `metadata` free-form field on all points

Config
- `knowledge` block: `collection_name`, `dedup_threshold`, `composite_template` (env prefix `KNOWLEDGE_`)

Implementation notes
- `knowledge_service.py` orchestrates embed -> dedup -> upsert; `knowledge_vector_service.py` owns the Qdrant collection (lazy client, payload indexes, page-chunk search).
- `embedding_service` is now a lazy proxy, so importing the service stack no longer builds the embedding client at import time. This keeps test collection working without a `config.yaml` (gitignored, absent in CI). Construction defers to the first real `.embed()` call.

Testing
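
The lazy-proxy idea from the implementation notes, which is what lets tests be collected without a `config.yaml`, might look roughly like the sketch below. `LazyProxy` and `FakeEmbeddingClient` are hypothetical names; only `embedding_service` and `.embed()` appear in this PR, and the real implementation may differ.

```python
from typing import Any, Callable, Optional


class LazyProxy:
    """Defer building the wrapped object until one of its attributes is used."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._target: Optional[Any] = None

    def _resolve(self) -> Any:
        # Build the real object once, on first use.
        if self._target is None:
            self._target = self._factory()
        return self._target

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes not found on the proxy itself.
        # Private names are never forwarded, which avoids recursion
        # if the proxy is copied or unpickled before __init__ runs.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)


class FakeEmbeddingClient:
    """Stand-in for the real client, which would need config to construct."""

    instances = 0

    def __init__(self) -> None:
        FakeEmbeddingClient.instances += 1

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


# Importing a module containing this line constructs nothing yet.
embedding_service = LazyProxy(FakeEmbeddingClient)
```

With this shape, the client is built on the first `embedding_service.embed(...)` call and reused afterwards, so merely importing the module has no side effects.
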
- `ruff check` + `ruff format --check` clean.

Docs
- `docs/api.md`, `docs/architecture.md`, `docs/README.md`, `config.example.yaml`.
- CHANGELOG entry for `0.8.0`; version bumped `0.7.0` -> `0.8.0`.
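
As a rough illustration of the embed -> dedup -> upsert flow mentioned under the implementation notes, the sketch below uses an in-memory store and a toy embedding in place of Qdrant and the real embedding pipeline. Everything here is an assumption made for illustration: `InMemoryStore`, `toy_embed`, `add_entry`, the use of cosine similarity, and the default threshold of 0.95. Only the `dedup_threshold` setting name comes from this PR.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> Vector:
    """Toy embedding: letter counts (stand-in for the real embedding call)."""
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


class InMemoryStore:
    """Toy stand-in for the vector collection (not the Qdrant client)."""

    def __init__(self) -> None:
        self.points: dict[str, Vector] = {}

    def nearest(self, vector: Vector) -> Optional[tuple[str, float]]:
        scored = [(pid, cosine(vector, v)) for pid, v in self.points.items()]
        return max(scored, key=lambda s: s[1]) if scored else None

    def upsert(self, point_id: str, vector: Vector) -> None:
        self.points[point_id] = vector


def add_entry(
    store: InMemoryStore,
    embed: Callable[[str], Vector],
    point_id: str,
    text: str,
    dedup_threshold: float = 0.95,  # assumed default, not taken from the PR
) -> tuple[str, bool]:
    """Return (id, created). A near-duplicate returns the existing id."""
    vector = embed(text)                      # 1. embed
    hit = store.nearest(vector)               # 2. dedup against stored points
    if hit is not None and hit[1] >= dedup_threshold:
        return hit[0], False
    store.upsert(point_id, vector)            # 3. upsert the new point
    return point_id, True
```

Calling `add_entry` twice with the same text stores one point and returns the first id on the second call; a lower `dedup_threshold` treats more loosely related entries as duplicates.
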