feat(retrieval): add manual BM25 query expansion and diagnostics by fryeggs · Pull Request #292 · CortexReach/memory-lancedb-pro

fryeggs · 2026-03-20T15:54:09Z

Summary

This is PR 1/4 in a series that extends memory-lancedb-pro in layered steps.

This PR is intentionally the smallest and lowest-risk step. It improves manual retrieval ergonomics without changing auto-recall behavior.

It adds:

BM25 query expansion for manual / CLI retrieval only
Retrieval diagnostics for manual / CLI search
CLI debug output for retrieval diagnostics

Why

Users often search with colloquial phrases such as 挂了 ("it died"), 卡住 ("stuck"), or 报错 ("it throws an error"), while stored memories often contain more technical wording like crash, timeout, error, or exception.

The vector leg already helps semantically, but the BM25 leg still matters for exact-term boosting and mixed-language memory bases. This PR improves that explicit/manual lookup path while deliberately leaving auto-recall unchanged.
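To make the idea concrete, here is a minimal sketch of colloquial-to-technical expansion for the BM25 leg. The mapping table, the pairing of each phrase with specific terms, and the name `expandBm25Query` are illustrative assumptions, not the code in this PR.

```javascript
// Hypothetical mapping table: the pairings below are assumed for illustration.
const COLLOQUIAL_TO_TECHNICAL = new Map([
  ["挂了", ["crash"]],
  ["卡住", ["timeout"]],
  ["报错", ["error", "exception"]],
]);

// Returns the BM25 query text: the original query followed by any mapped
// technical terms. The original wording is always kept, never replaced.
function expandBm25Query(query) {
  const lowered = query.toLowerCase();
  const extras = [];
  for (const [phrase, terms] of COLLOQUIAL_TO_TECHNICAL) {
    if (!query.includes(phrase)) continue;
    for (const term of terms) {
      // Skip terms the user already typed, and avoid adding a term twice.
      if (!lowered.includes(term) && !extras.includes(term)) {
        extras.push(term);
      }
    }
  }
  return extras.length === 0 ? query : `${query} ${extras.join(" ")}`;
}
```

A query with no colloquial phrase passes through untouched, which is one simple way to limit false positives; the PR's own false-positive protection may work differently.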

Scope and safety

no change to auto-recall query behavior
no change to vector query text
query expansion is limited to manual / CLI retrieval
focused tests cover colloquial expansion, false-positive protection, gating, and debug output
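The gating described in this list could look roughly like the sketch below. The source labels (`"manual"`, `"cli"`, `"auto-recall"`), the `buildQueryPlan` name, and the shape of the diagnostics object are assumptions made for illustration; the expander is passed in as a function so the example stays self-contained.

```javascript
// Hypothetical source labels for which BM25 expansion is allowed.
const EXPANDABLE_SOURCES = new Set(["manual", "cli"]);

// Builds the text for each retrieval leg. `expand` is any function that maps
// a query string to an expanded query string.
function buildQueryPlan(query, source, expand) {
  const bm25Text = EXPANDABLE_SOURCES.has(source) ? expand(query) : query;
  return {
    vectorText: query, // the vector leg always sees the original query
    bm25Text,
    // Assumed diagnostics shape, useful for a --debug style output.
    diagnostics: {
      source,
      originalQuery: query,
      expanded: bm25Text !== query,
    },
  };
}
```

Keeping the gate in one place means an auto-recall caller cannot opt into expansion by accident: it would have to pass a different source label.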

Companion PRs in this series

PR 2/4: scored capture pipeline and ingestion safeguards
PR 3/4: standalone runtime and MCP surface
PR 4/4: Claude / Codex host integrations

Validation

node --test test/query-expander.test.mjs

Passed locally.

rwmjhb · 2026-03-23T02:58:15Z

This error path is reading diagnostics from the wrong retriever instance.

When context.embedder is present, runSearch() executes against a fresh retriever created
by getSearchRetriever(), not context.retriever. But the outer catch block still reads
diagnostics from context.retriever.getLastDiagnostics?.().

In the normal runtime path, CLI registration does pass embedder, so when memory-pro search --debug or --json --debug fails, we can lose the diagnostics payload entirely and
return only the error.

I verified this with a minimal repro against the PR branch. The current tests don’t catch it
because they stub context.retriever directly and don’t exercise the embedder -> createRetriever() path.

Can we thread the last-used search retriever (or its diagnostics) through the failure path so
debug output comes from the instance that actually executed the search?
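One possible shape for that fix is sketched below. It is not the PR's code: `searchWithDiagnostics`, the `retrieve` method, and the returned object shape are hypothetical, and `getSearchRetriever` is passed in as a parameter so the example is self-contained. The point is only that the catch block reads from whichever retriever actually ran.

```javascript
// Runs a search and reports diagnostics from the retriever that executed it.
async function searchWithDiagnostics(context, query, getSearchRetriever) {
  // Track the active retriever so the failure path reads the right instance.
  let activeRetriever = context.retriever;
  try {
    if (context.embedder) {
      // With an embedder present, the search runs on a fresh retriever.
      activeRetriever = getSearchRetriever(context);
    }
    const results = await activeRetriever.retrieve(query); // hypothetical method
    return {
      ok: true,
      results,
      diagnostics: activeRetriever.getLastDiagnostics?.() ?? null,
    };
  } catch (error) {
    return {
      ok: false,
      error: String(error?.message ?? error),
      // Read from activeRetriever, not context.retriever.
      diagnostics: activeRetriever.getLastDiagnostics?.() ?? null,
    };
  }
}
```

A regression test for this would supply an embedder and a failing fresh retriever, then assert that the diagnostics in the error result come from the fresh instance rather than the stubbed `context.retriever`.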

feat: improve manual retrieval diagnostics and query expansion

23874e9

fryeggs changed the title ~~feat: improve manual retrieval diagnostics and BM25 query expansion~~ feat(retrieval): add manual BM25 query expansion and diagnostics Mar 20, 2026

andychu666 mentioned this pull request Mar 21, 2026

feat: configurable mapping table for cross-script BM25 query expansion (generalizes #292) #297

Closed

This was referenced Mar 22, 2026

feat(integrations): add Claude and Codex host bridges #295

Closed

feat(runtime): add standalone MCP server and shared host runtime #294

Closed

fryeggs wants to merge 1 commit into CortexReach:master from fryeggs:codex/query-expander
