
feat: dual-tokenizer search (natural language + code-preserving) #55

@salishforge

Summary

Add a code-preserving search mode alongside the existing natural-language full-text search (FTS), for agents that work with code.

Design

  • Add a `content_code_tsv` TSVECTOR column built with the 'simple' text search config (no stemming or stop-word removal, so symbol names are not reduced to word stems)
  • Query mode 'code' searches this column with the 'simple' config
  • The existing 'keyword' mode continues to use English stemming
  • Auto-detect: if the query contains code patterns (camelCase, dots, underscores), prefer the code tokenizer
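A minimal sketch of how the four points above could fit together, not a definitive implementation. Only `content_code_tsv` and the 'code' / 'keyword' mode names come from this issue. The table `memories`, the columns `id`, `content`, and `content_tsv`, the 'auto' mode name, the regexes, and the psycopg-style `%s` placeholder are all assumptions made for illustration.

```python
import re

# Hypothetical DDL: table/column names other than content_code_tsv are assumed.
# Generated columns require PostgreSQL 12 or later.
ADD_CODE_COLUMN_SQL = """
ALTER TABLE memories
    ADD COLUMN content_code_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('simple', content)) STORED;
CREATE INDEX memories_content_code_tsv_idx
    ON memories USING GIN (content_code_tsv);
"""

# Heuristic patterns for "looks like code" (assumed, intentionally loose).
CODE_PATTERNS = (
    re.compile(r"[a-z0-9][A-Z]"),  # camelCase hump, e.g. getUser
    re.compile(r"\w\.\w"),         # dot between word characters, e.g. os.path
    re.compile(r"_"),              # underscore, e.g. snake_case
)

# Fixed whitelist: mode -> (tsvector column, text search config).
MODES = {
    "code": ("content_code_tsv", "simple"),
    "keyword": ("content_tsv", "english"),  # content_tsv is an assumed name
}


def looks_like_code(query: str) -> bool:
    """Return True if any code-like pattern appears in the query."""
    return any(p.search(query) for p in CODE_PATTERNS)


def resolve_search(query: str, mode: str = "auto") -> tuple:
    """Pick the column and config; 'auto' applies the heuristic above."""
    if mode == "auto":
        mode = "code" if looks_like_code(query) else "keyword"
    if mode not in MODES:
        raise ValueError(f"unknown search mode: {mode!r}")
    return MODES[mode]


def build_search_sql(query: str, mode: str = "auto") -> tuple:
    """Return (sql, params). Column and config come only from MODES,
    so interpolating them is safe; the user query stays a bound parameter."""
    column, config = resolve_search(query, mode)
    sql = (
        f"SELECT id FROM memories "
        f"WHERE {column} @@ plainto_tsquery('{config}', %s)"
    )
    return sql, (query,)
```

With this sketch, `build_search_sql("parseConfig")` targets `content_code_tsv` with the 'simple' config, while a plain-English query falls back to the stemmed column. The heuristic will misfire on strings like "e.g." or version numbers, so an explicit `mode` should always override it. It is also worth checking with `ts_debug` how the 'simple' config tokenizes the identifiers you care about before relying on it.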

Inspiration

Inspired by the dual FTS5 tokenizer approach in CCRider (MIT): one natural-language tokenizer (Porter stemming) plus one code-specific tokenizer (no stemming, preserving symbols).

Closes #55

Labels: enhancement (New feature or request), performance (Performance improvements)
