feat(retrieval): Phase 4 — query (vector + filters + hybrid)#242
feat(retrieval): Phase 4 — query (vector + filters + hybrid)#242jamby77 wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit bf41ae5. Configure here.
bf41ae5 to
4bfc9aa
Compare
- Add buildFtSearchQuery: KNN query string with TAG/NUMERIC filter clauses
- Add Retriever.query: embed text or accept a precomputed vector, FT.SEARCH,
map rows to { id, score, fields, text } stripping reserved fields
- Reject both/neither text|vector; empty results return []
- Support optional hybrid:'rerank' via an injectable rerankFn
- Export query types, buildFtSearchQuery, SCORE_FIELD
4bfc9aa to
b878394
Compare
There was a problem hiding this comment.
Mostly looks great — just one thing I'd want sorted before this goes in.
QueryHit.score is the raw KNN __score, which is cosine distance: smaller means closer. But a field called score reads like "bigger is better," so anyone consuming this is going to sort it the wrong way and quietly get their results inverted. That's the kind of bug that's really annoying to track down later.
You already call this out in the PR description as worth documenting, so I'd just pull that into the code itself — a short JSDoc on QueryHit.score (and maybe a quick note where it's set in mapHit) saying it's a distance where lower is closer. One line now saves someone a confusing afternoon down the road.
Other than that I'm happy with it — this is the only thing I'd block on.
…te vector dims - Add JSDoc on QueryHit.score clarifying it is a KNN distance (lower is closer), so consumers rank ascending instead of assuming higher is better - Resolve inferred dimensions for precomputed-vector queries so a mismatched vector is rejected before reaching FT.SEARCH, matching the text path

Phase 4 —
query(vector + filters + hybrid)Stacked on #241 (Phase 3) — base is the Phase 3 branch, not master.
What's new
buildFtSearchQuery(schema, k, filter?)— pure builder:(filter)=>[KNN k @vec $vec AS __score]; TAG →@f:{escapeTag(v)}, NUMERIC →@f:[v v]. Throws on unknown fields, TEXT fields, and non-numeric values for numeric fields.Retriever.query(options)— embedstextor uses a precomputedvector(exactly one), runsFT.SEARCH … PARAMS 2 vec … LIMIT 0 k DIALECT 2, maps rows →{ id (prefix stripped), score, text, fields }; empty →[]. Validateskis a positive integer and that a precomputed vector matches the index dimension.hybrid: 'rerank'via an injectablererankFn(queryText, hits)(requiresrerankFn+text).fields.tscentralizes the reserved field names (__text,__score);buildFtCreateArgsnow rejects schema fields using them.Public API
Exports
buildFtSearchQuery,QueryFilter,QueryHit,QueryOptions,RerankFn, and theTEXT_FIELD/SCORE_FIELDconstants.Tests
20 unit tests (6 builder + 14 query) plus reserved-field guards; full package suite 76/76 green,
tsc --noEmit+ prettier clean.Review-driven changes (pre-PR review)
Added:
kpositive-integer validation, NUMERIC filter value-type check, precomputed-vector dimension check, reserved-field guard at index creation (via sharedfields.ts), and a stronger rerank assertion verifying it receives mapped hits.Deferred (not in this phase)
RETURNclause to avoid shipping embedding bytes for every hit — Phase 5 (perf/observability).>kcandidates then trim) — the plan scopesrerankto reordering k; over-fetch is a future enhancement.scoresemantics: KNN returns distance (smaller = closer); the field is namedscoreper the spec — worth documenting for callers.Note
Medium Risk
New search API on the retrieval path with filter/query validation; well-covered by unit tests and no auth/data-migration changes, but incorrect filter or dimension handling could affect production search behavior.
Overview
Adds Phase 4 vector retrieval: KNN search over Valkey/RediSearch with optional metadata filters and optional post-search reranking.
buildFtSearchQuerybuilds dialect-2 KNN strings (filter=>[KNN k @embedding $vec AS __score]), with TAG and NUMERIC filters (escaped tags, exact numeric ranges). It rejects unknown fields, TEXT filters, and non-numeric numeric values.Retriever.queryaccepts eithertext(viaembedFn) or a precomputedvector, runsFT.SEARCHwithPARAMS/LIMIT/DIALECT 2, and maps hits to{ id, score, text, fields }(stripping key prefix and internal__text/__score/embedding).hybrid: 'rerank'calls an injectablererankFn(requires querytext). Validates positive integerk, mutual exclusivity of text vs vector, and query vector dimensions (including inferred dims when schema omitsdims).Reserved names move to
fields.ts(__text,__score); index creation and upsert reject them in schema/entry fields. Public exports includebuildFtSearchQuery,QueryFilter,QueryHit,QueryOptions,RerankFn, and score semantics documented as distance (lower = closer).Reviewed by Cursor Bugbot for commit 33287ab. Bugbot is set up for automated code reviews on this repo. Configure here.