feat(retrieval): Phase 4 — query (vector + filters + hybrid) by jamby77 · Pull Request #242 · BetterDB-inc/monitor

jamby77 · 2026-06-15T17:52:17Z

Phase 4 — `query` (vector + filters + hybrid)

Stacked on #241 (Phase 3) — base is the Phase 3 branch, not master.

What's new

buildFtSearchQuery(schema, k, filter?) — pure builder: (filter)=>[KNN k @vec $vec AS __score]; TAG → @f:{escapeTag(v)}, NUMERIC → @f:[v v]. Throws on unknown fields, TEXT fields, and non-numeric values for numeric fields.
Retriever.query(options) — embeds text or uses a precomputed vector (exactly one), runs FT.SEARCH … PARAMS 2 vec … LIMIT 0 k DIALECT 2, maps rows → { id (prefix stripped), score, text, fields }; empty → []. Validates k is a positive integer and that a precomputed vector matches the index dimension.
hybrid: 'rerank' via an injectable rerankFn(queryText, hits) (requires rerankFn + text).
New fields.ts centralizes the reserved field names (__text, __score); buildFtCreateArgs now rejects schema fields using them.

Public API

Exports buildFtSearchQuery, QueryFilter, QueryHit, QueryOptions, RerankFn, and the TEXT_FIELD/SCORE_FIELD constants.

Tests

20 unit tests (6 builder + 14 query) plus reserved-field guards; full package suite 76/76 green, tsc --noEmit + prettier clean.

Review-driven changes (pre-PR review)

Added: k positive-integer validation, NUMERIC filter value-type check, precomputed-vector dimension check, reserved-field guard at index creation (via shared fields.ts), and a stronger rerank assertion verifying it receives mapped hits.

Deferred (not in this phase)

RETURN clause to avoid shipping embedding bytes for every hit — Phase 5 (perf/observability).
Rerank over-fetch (fetch >k candidates then trim) — the plan scopes rerank to reordering k; over-fetch is a future enhancement.
score semantics: KNN returns distance (smaller = closer); the field is named score per the spec — worth documenting for callers.
Real-Valkey round-trip lands in Phase 6 integration.

Note

Medium Risk
New search API on the retrieval path with filter/query validation; well-covered by unit tests and no auth/data-migration changes, but incorrect filter or dimension handling could affect production search behavior.

Overview
Adds Phase 4 vector retrieval: KNN search over Valkey/RediSearch with optional metadata filters and optional post-search reranking.

buildFtSearchQuery builds dialect-2 KNN strings (filter=>[KNN k @embedding $vec AS __score]), with TAG and NUMERIC filters (escaped tags, exact numeric ranges). It rejects unknown fields, TEXT filters, and non-numeric numeric values.

Retriever.query accepts either text (via embedFn) or a precomputed vector, runs FT.SEARCH with PARAMS/LIMIT/DIALECT 2, and maps hits to { id, score, text, fields } (stripping key prefix and internal __text/__score/embedding). hybrid: 'rerank' calls an injectable rerankFn (requires query text). Validates positive integer k, mutual exclusivity of text vs vector, and query vector dimensions (including inferred dims when schema omits dims).

Reserved names move to fields.ts (__text, __score); index creation and upsert reject them in schema/entry fields. Public exports include buildFtSearchQuery, QueryFilter, QueryHit, QueryOptions, RerankFn, and score semantics documented as distance (lower = closer).

^{Reviewed by Cursor Bugbot for commit 33287ab. Bugbot is set up for automated code reviews on this repo. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit bf41ae5. Configure here.}

- Add buildFtSearchQuery: KNN query string with TAG/NUMERIC filter clauses - Add Retriever.query: embed text or accept a precomputed vector, FT.SEARCH, map rows to { id, score, fields, text } stripping reserved fields - Reject both/neither text|vector; empty results return [] - Support optional hybrid:'rerank' via an injectable rerankFn - Export query types, buildFtSearchQuery, SCORE_FIELD

KIvanow

Mostly looks great — just one thing I'd want sorted before this goes in.

QueryHit.score is the raw KNN __score, which is cosine distance: smaller means closer. But a field called score reads like "bigger is better," so anyone consuming this is going to sort it the wrong way and quietly get their results inverted. That's the kind of bug that's really annoying to track down later.

You already call this out in the PR description as worth documenting, so I'd just pull that into the code itself — a short JSDoc on QueryHit.score (and maybe a quick note where it's set in mapHit) saying it's a distance where lower is closer. One line now saves someone a confusing afternoon down the road.

Other than that I'm happy with it — this is the only thing I'd block on.

…te vector dims - Add JSDoc on QueryHit.score clarifying it is a KNN distance (lower is closer), so consumers rank ascending instead of assuming higher is better - Resolve inferred dimensions for precomputed-vector queries so a mismatched vector is rejected before reaching FT.SEARCH, matching the text path

cursor Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread packages/retrieval/src/retriever.ts

Comment thread packages/retrieval/src/retriever.ts

jamby77 mentioned this pull request Jun 15, 2026

feat(retrieval): Phase 5 — discovery, observability & health #243

Open

Base automatically changed from feature/retrieval-sdk-phase3-upsert-delete to master June 16, 2026 08:28

jamby77 force-pushed the feature/retrieval-sdk-phase4-query branch from bf41ae5 to 4bfc9aa Compare June 16, 2026 08:29

jamby77 force-pushed the feature/retrieval-sdk-phase4-query branch from 4bfc9aa to b878394 Compare June 16, 2026 08:59

KIvanow requested changes Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(retrieval): Phase 4 — query (vector + filters + hybrid)#242

feat(retrieval): Phase 4 — query (vector + filters + hybrid)#242
jamby77 wants to merge 2 commits into
masterfrom
feature/retrieval-sdk-phase4-query

jamby77 commented Jun 15, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

KIvanow left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jamby77 commented Jun 15, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Phase 4 — query (vector + filters + hybrid)

What's new

Public API

Tests

Review-driven changes (pre-PR review)

Deferred (not in this phase)

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

KIvanow left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jamby77 commented Jun 15, 2026 •

edited by cursor Bot

Loading

Phase 4 — `query` (vector + filters + hybrid)

KIvanow left a comment •

edited

Loading