Skip to content

feat(retrieval): add manual BM25 query expansion and diagnostics#292

Open
fryeggs wants to merge 1 commit intoCortexReach:masterfrom
fryeggs:codex/query-expander
Open

feat(retrieval): add manual BM25 query expansion and diagnostics#292
fryeggs wants to merge 1 commit intoCortexReach:masterfrom
fryeggs:codex/query-expander

Conversation

@fryeggs
Copy link

@fryeggs fryeggs commented Mar 20, 2026

Summary

This is PR 1/4 in a series that extends memory-lancedb-pro in layered steps.

This PR is intentionally the smallest and lowest-risk step. It improves manual retrieval ergonomics without changing auto-recall behavior.

It adds:

  1. BM25 query expansion for manual / CLI retrieval only
  2. Retrieval diagnostics for manual / CLI search
  3. CLI debug output for retrieval diagnostics

Why

Users often search with colloquial phrases such as 挂了, 卡住, or 报错, while stored memories often contain more technical wording like crash, timeout, error, or exception.

The vector leg already helps semantically, but the BM25 leg still matters for exact-term boosting and mixed-language memory bases. This PR improves that explicit/manual lookup path while deliberately leaving auto-recall unchanged.

Scope and safety

  • no change to auto-recall query behavior
  • no change to vector query text
  • query expansion is limited to manual / CLI retrieval
  • focused tests cover colloquial expansion, false-positive protection, gating, and debug output

Companion PRs in this series

  • PR 2/4: scored capture pipeline and ingestion safeguards
  • PR 3/4: standalone runtime and MCP surface
  • PR 4/4: Claude / Codex host integrations

Validation

node --test test/query-expander.test.mjs

Passed locally.

@fryeggs fryeggs changed the title feat: improve manual retrieval diagnostics and BM25 query expansion feat(retrieval): add manual BM25 query expansion and diagnostics Mar 20, 2026
@rwmjhb
Copy link
Collaborator

rwmjhb commented Mar 23, 2026

This error path is reading diagnostics from the wrong retriever instance.

When context.embedder is present, runSearch() executes against a fresh retriever created
by getSearchRetriever(), not context.retriever. But the outer catch block still reads
diagnostics from context.retriever.getLastDiagnostics?.().

In the normal runtime path, CLI registration does pass embedder, so on memory-pro search --debug / --json --debug failures we can lose the diagnostics payload entirely and only
return the error.

I verified this with a minimal repro against the PR branch. The current tests don’t catch it
because they stub context.retriever directly and don’t exercise the embedder -> createRetriever() path.

Can we thread the last-used search retriever (or its diagnostics) through the failure path so
debug output comes from the instance that actually executed the search?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants