Feature Request: Dynamic Metadata Filtering for RAG Queries
Problem Statement
Currently, the RAG implementation only supports filtering by:
- Number of results (n_results)
- Distance threshold (distance_threshold)
However, ChromaDB supports powerful metadata filtering through its where parameter, which could significantly improve the precision of document retrieval.
Proposed Solution
Add a dynamic filter builder UI that allows users to create metadata-based filters for their RAG queries.
Implementation Overview
Backend Changes
- RAG Config Service (rag_config_service.py)
  - Add method to detect available metadata fields from collection
  - Modify query_collection() to accept where parameter
  - Store filter preferences in config
- API Endpoints (routes.py)
  - GET /api/rag/metadata-fields - Return available fields with types and unique values
  - Update POST /api/chat to accept filter parameters
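A rough sketch of how the metadata field detection could work, to make the backend change more concrete. The function name detect_metadata_fields, the max_unique cap, and the returned shape are all hypothetical; the input is assumed to be the list of metadata dicts that ChromaDB returns under the "metadatas" key of collection.get(include=["metadatas"]).

```python
from typing import Any, Optional


def detect_metadata_fields(
    metadatas: list[Optional[dict[str, Any]]],
    max_unique: int = 50,
) -> dict[str, dict[str, Any]]:
    """Summarize metadata keys as {field: {"type", "values", "truncated"}}.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    fields: dict[str, dict[str, Any]] = {}
    for metadata in metadatas:
        # Documents without metadata may come back as None.
        for key, value in (metadata or {}).items():
            # bool is checked first because bool is a subclass of int.
            if isinstance(value, bool):
                kind = "boolean"
            elif isinstance(value, (int, float)):
                kind = "number"
            else:
                kind = "text"
            info = fields.setdefault(key, {"type": kind, "values": set()})
            if info["type"] != kind:
                info["type"] = "mixed"
            info["values"].add(value)

    summary: dict[str, dict[str, Any]] = {}
    for key, info in fields.items():
        values = sorted(info["values"], key=str)
        summary[key] = {
            "type": info["type"],
            # Cap the list so high-cardinality fields do not bloat the response.
            "values": values[:max_unique],
            "truncated": len(values) > max_unique,
        }
    return summary
```

The GET /api/rag/metadata-fields endpoint could return this summary more or less directly. For large collections, sampling a subset of documents instead of scanning all of them is probably worth considering.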
Frontend Changes
- Main Chat Interface (script.js, index.html)
  - Collapsible filter panel below RAG toggle
  - Dynamic filter rows with field/operator/value selectors
  - Support AND/OR logic between filters
  - Show active filter count badge
- Settings Page (settings.js, settings.html)
  - Preview available metadata fields when collection selected
  - Configure default filters
Filter Types Support
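- Text fields: equals, matches any of a set of values (using $in; note that $in tests membership in a list of values, not substring containment)
- Numbers: equals, $gt, $lt, $gte, $lte, range
- Lists: multi-select with $in/$nin
- Dates: date picker with comparison operators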
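To illustrate how the backend might enforce these filter types, here is a minimal sketch. OPERATORS_BY_TYPE, validate_filter, and date_to_timestamp are hypothetical names, and the operator sets simply mirror the list above. As far as I can tell, ChromaDB's $gt/$gte/$lt/$lte accept only numeric operands, so the sketch assumes dates are stored as Unix timestamps (UTC); that storage choice is an assumption, not something decided in this issue.

```python
from datetime import datetime, timezone

# Hypothetical mapping from UI field type to the operators offered for it.
OPERATORS_BY_TYPE = {
    "text": {"$eq", "$in"},
    "number": {"$eq", "$gt", "$lt", "$gte", "$lte"},
    "list": {"$in", "$nin"},
    "date": {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte"},
}


def validate_filter(field_type: str, operator: str, value: object) -> None:
    """Raise ValueError if the operator or value does not fit the field type."""
    allowed = OPERATORS_BY_TYPE.get(field_type)
    if allowed is None:
        raise ValueError(f"Unknown field type: {field_type!r}")
    if operator not in allowed:
        raise ValueError(f"{operator!r} is not supported for {field_type} fields")
    if operator in {"$in", "$nin"} and not (isinstance(value, list) and value):
        raise ValueError(f"{operator} expects a non-empty list of values")


def date_to_timestamp(iso_date: str) -> int:
    """Convert an ISO date such as '2024-01-01' to a UTC Unix timestamp."""
    parsed = datetime.fromisoformat(iso_date).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

A "range" filter on a number could then be expressed as two filters ($gte and $lte) combined with $and, rather than needing its own operator.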
Example Filter Format
{
  "filters": [
    {"field": "author", "operator": "$eq", "value": "John Doe"},
    {"field": "chapter", "operator": "$in", "value": [1, 2, 3]},
    {"field": "date", "operator": "$gte", "value": "2024-01-01"}
  ],
  "logic": "$and"
}
The "logic" value may be "$and" or "$or".
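For reference, a sketch of how a payload in this format might be translated into a ChromaDB where clause. build_where is a hypothetical helper, not existing code in this project. It returns a bare condition when there is only one filter, since ChromaDB expects $and/$or to wrap at least two expressions.

```python
from typing import Any, Optional


def build_where(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Translate the proposed filter payload into a ChromaDB-style where dict.

    Hypothetical helper; returns None when no filters are set so the caller
    can skip the where argument entirely.
    """
    conditions = [
        {item["field"]: {item["operator"]: item["value"]}}
        for item in payload.get("filters", [])
    ]
    if not conditions:
        return None
    if len(conditions) == 1:
        # A single condition must not be wrapped in $and/$or.
        return conditions[0]

    logic = payload.get("logic", "$and")
    if logic not in ("$and", "$or"):
        raise ValueError(f"Unsupported logic operator: {logic!r}")
    return {logic: conditions}
```

The result could then be passed through query_collection() as the where argument of ChromaDB's collection.query(), alongside the existing n_results setting.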
User Benefits
- Precision: Target specific document subsets (e.g., "only search in chapter 3")
- Efficiency: Reduce noise from irrelevant content
- Flexibility: Build complex queries without writing code
- Discovery: Explore metadata patterns in the corpus
- Performance: Smaller, more relevant result sets
Additional Features to Consider
- Save/load filter presets
- Quick filter templates ("Recent docs", "By author")
- Filter match explanations in results
- Visual indicators for active filters
- Recently used filters history
ChromaDB Reference
ChromaDB supports these metadata filter operators:
- Comparison: $eq, $ne, $gt, $gte, $lt, $lte
- Logical: $and, $or
- Inclusion: $in, $nin
Documentation: https://docs.trychroma.com/docs/querying-collections/metadata-filtering
Acceptance Criteria