Add dynamic metadata filtering for RAG queries #21

@samkeen

Feature Request: Dynamic Metadata Filtering for RAG Queries

Problem Statement

Currently, the RAG implementation only supports filtering by:

  • Number of results (n_results)
  • Distance threshold (distance_threshold)

However, ChromaDB's query() also accepts a where parameter that restricts results to documents whose metadata match given conditions. Exposing it could significantly improve the precision of document retrieval.
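Below is a minimal sketch of how the existing n_results and distance_threshold settings could be combined with a where clause in a single ChromaDB query. The function is hypothetical. The real query_collection() in rag_config_service.py isn't shown in this issue, so the name, parameters, and return shape used here are assumptions. The sketch relies on Collection.query() accepting query_texts, n_results, where, and include, and on it returning one inner list per query text.

```python
from typing import Any, Optional


def query_collection(
    collection: Any,
    query_text: str,
    n_results: int = 5,
    distance_threshold: Optional[float] = None,
    where: Optional[dict] = None,
) -> list:
    """Hypothetical sketch: query a ChromaDB collection with an optional metadata filter."""
    kwargs = {
        "query_texts": [query_text],
        "n_results": n_results,
        "include": ["documents", "metadatas", "distances"],
    }
    # Only pass `where` when a filter is active; an empty or None filter means "no metadata filter".
    if where:
        kwargs["where"] = where
    result = collection.query(**kwargs)

    hits = []
    # Chroma returns one inner list per query text; exactly one text was sent, so take index 0.
    for doc, meta, dist in zip(
        result["documents"][0], result["metadatas"][0], result["distances"][0]
    ):
        # Existing behaviour: drop hits whose distance exceeds the configured threshold.
        if distance_threshold is None or dist <= distance_threshold:
            hits.append({"document": doc, "metadata": meta, "distance": dist})
    return hits
```

In this sketch the distance cutoff is still applied after the query, so the existing threshold behaves as before. The where clause only changes which documents are eligible to be returned.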

Proposed Solution

Add a dynamic filter builder UI that allows users to create metadata-based filters for their RAG queries.

Implementation Overview

Backend Changes

  1. RAG Config Service (rag_config_service.py)

    • Add a method that detects the metadata fields available in a collection
    • Modify query_collection() to accept a where parameter
    • Store filter preferences in the config
  2. API Endpoints (routes.py)

    • GET /api/rag/metadata-fields - Return available fields with types and unique values
    • Update POST /api/chat to accept filter parameters
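One possible shape for the field-detection step behind GET /api/rag/metadata-fields is sketched below. It scans metadata dicts and reports each field's type and unique values. It assumes the input is the list returned by collection.get(include=["metadatas"])["metadatas"] and that metadata values are scalars (str, int, float, bool). The function name and the output keys ("type", "values", "truncated") are hypothetical.

```python
from collections import defaultdict


def detect_metadata_fields(metadatas, max_values=50):
    """Infer each metadata field's type and up to `max_values` unique values."""
    types = defaultdict(set)
    values = defaultdict(set)
    for meta in metadatas:
        for field, value in (meta or {}).items():
            if value is None:
                continue
            # bool is a subclass of int in Python, so it must be checked first.
            if isinstance(value, bool):
                types[field].add("boolean")
            elif isinstance(value, (int, float)):
                types[field].add("number")
            else:
                types[field].add("text")
            values[field].add(value)

    fields = {}
    for field in sorted(types):
        try:
            unique = sorted(values[field])
        except TypeError:  # mixed types are not mutually orderable
            unique = sorted(values[field], key=str)
        kinds = types[field]
        fields[field] = {
            # A field holding several kinds is reported as "mixed" so the UI can fall back to text input.
            "type": next(iter(kinds)) if len(kinds) == 1 else "mixed",
            "values": unique[:max_values],
            "truncated": len(unique) > max_values,
        }
    return fields
```

Scanning every record may be slow on large collections. Collection.get() also accepts a limit argument, so sampling a bounded number of records is one possible trade-off.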

Frontend Changes

  1. Main Chat Interface (script.js, index.html)

    • Collapsible filter panel below RAG toggle
    • Dynamic filter rows with field/operator/value selectors
    • Support AND/OR logic between filters
    • Show active filter count badge
  2. Settings Page (settings.js, settings.html)

    • Preview available metadata fields when a collection is selected
    • Configure default filters

Filter Types Support

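  • Text fields: equals ($eq) and match-any-of ($in); note that $in matches exact values from a list, not substrings
  • Numbers: equals, $gt, $lt, $gte, $lte, and range (a $gte/$lte pair joined with $and)
  • Lists: multi-select with $in/$nin
  • Dates: date picker with comparison operators (ChromaDB's $gt, $gte, $lt, and $lte accept only numeric values, so dates would need to be stored and compared as numbers, e.g. Unix timestamps)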
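Two of these rows don't map onto a single ChromaDB operator. A range row can be expressed as two comparisons joined with $and. Because the comparison operators work on numbers, a date picker value also needs converting before it reaches the query. The sketch below assumes dates are stored in metadata as Unix timestamps in seconds (UTC). Both helper names are hypothetical.

```python
from datetime import date, datetime, timezone


def date_to_timestamp(iso_date: str) -> int:
    """Convert a YYYY-MM-DD picker value to Unix seconds at midnight UTC."""
    d = date.fromisoformat(iso_date)
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())


def range_clause(field: str, low: float, high: float) -> dict:
    """Inclusive range filter (low <= field <= high) built from two numeric comparisons."""
    return {"$and": [{field: {"$gte": low}}, {field: {"$lte": high}}]}
```

For example, range_clause("date", date_to_timestamp("2024-01-01"), date_to_timestamp("2024-12-31")) would cover calendar 2024 up to midnight on 31 December.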

Example Filter Format

{
  "filters": [
    {"field": "author", "operator": "$eq", "value": "John Doe"},
    {"field": "chapter", "operator": "$in", "value": [1, 2, 3]},
    {"field": "date", "operator": "$gte", "value": "2024-01-01"}
  ],
  "logic": "$and"
}

The "logic" key takes either "$and" or "$or".
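Below is one way the backend could turn this payload into the where argument ChromaDB expects. Each row becomes {field: {operator: value}}, and the rows are joined under the chosen logic key. One detail is worth handling: ChromaDB's where validation expects $and/$or to wrap at least two conditions, so a single row is passed through unwrapped and an empty list yields no filter at all. build_where is a hypothetical name, and value-type checks are left out here.

```python
from typing import Optional

ALLOWED_OPERATORS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"}


def build_where(payload: dict) -> Optional[dict]:
    """Translate {"filters": [...], "logic": "$and" | "$or"} into a ChromaDB where dict."""
    logic = payload.get("logic", "$and")
    if logic not in ("$and", "$or"):
        raise ValueError(f"unsupported logic: {logic!r}")

    clauses = []
    for row in payload.get("filters", []):
        op = row["operator"]
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append({row["field"]: {op: row["value"]}})

    if not clauses:
        return None  # no filtering: omit `where` from the query
    if len(clauses) == 1:
        return clauses[0]  # $and/$or need at least two conditions
    return {logic: clauses}
```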

User Benefits

  1. Precision: Target specific document subsets (e.g., "only search in chapter 3")
  2. Efficiency: Reduce noise from irrelevant content
  3. Flexibility: Build complex queries without writing code
  4. Discovery: Explore metadata patterns in the corpus
  5. Performance: Smaller, more relevant result sets

Additional Features to Consider

  • Save/load filter presets
  • Quick filter templates ("Recent docs", "By author")
  • Filter match explanations in results
  • Visual indicators for active filters
  • Recently used filters history

ChromaDB Reference

ChromaDB supports these metadata filter operators:

  • Comparison: $eq, $ne, $gt, $gte, $lt, $lte
  • Logical: $and, $or
  • Inclusion: $in, $nin

Documentation: https://docs.trychroma.com/docs/querying-collections/metadata-filtering
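Each operator constrains the type of value it accepts, so the filter endpoint could reject bad rows before they reach ChromaDB. The sketch below encodes the rules from the metadata-filtering page linked above: $gt/$gte/$lt/$lte take numbers, $eq/$ne take a string, number, or boolean, and $in/$nin take a non-empty list. check_condition is hypothetical, and the rules should be rechecked against the installed ChromaDB version.

```python
NUMBER_OPS = {"$gt", "$gte", "$lt", "$lte"}
EQUALITY_OPS = {"$eq", "$ne"}
LIST_OPS = {"$in", "$nin"}


def _is_number(v) -> bool:
    # bool is an int subclass in Python, but it should not count as a number here.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_scalar(v) -> bool:
    return isinstance(v, (str, bool)) or _is_number(v)


def check_condition(operator: str, value) -> None:
    """Raise ValueError if `value` is not a valid operand for `operator`."""
    if operator in NUMBER_OPS:
        if not _is_number(value):
            raise ValueError(f"{operator} needs a number, got {value!r}")
    elif operator in EQUALITY_OPS:
        if not _is_scalar(value):
            raise ValueError(f"{operator} needs a string, number, or boolean, got {value!r}")
    elif operator in LIST_OPS:
        if not isinstance(value, list) or not value or not all(map(_is_scalar, value)):
            raise ValueError(f"{operator} needs a non-empty list of scalars, got {value!r}")
    else:
        raise ValueError(f"unknown operator {operator!r}")
```

Under these rules, the date row in the example filter format would be rejected unless its "2024-01-01" value is first converted to a number.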

Acceptance Criteria

  • Users can add/remove filter conditions dynamically
  • Filters persist across page refreshes
  • Filter UI shows available fields from current collection
  • Applied filters are visible in chat details modal
  • Clear documentation on how to use filters
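For the chat details modal, the applied filters could be rendered as one readable line. This is a minimal sketch that assumes the filter payload format shown earlier. The function name and the operator symbols are illustrative choices, not part of ChromaDB.

```python
OPERATOR_LABELS = {
    "$eq": "=", "$ne": "!=", "$gt": ">", "$gte": ">=",
    "$lt": "<", "$lte": "<=", "$in": "in", "$nin": "not in",
}


def describe_filters(payload: dict) -> str:
    """Render a filter payload as a single human-readable line."""
    parts = [
        f'{row["field"]} {OPERATOR_LABELS.get(row["operator"], row["operator"])} {row["value"]!r}'
        for row in payload.get("filters", [])
    ]
    if not parts:
        return "No filters applied"
    joiner = " AND " if payload.get("logic", "$and") == "$and" else " OR "
    return joiner.join(parts)
```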

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests