
feat(smart-search): boost title/narrative matches on 'who/what is X' queries#571

Open
efenex wants to merge 1 commit into rohitg00:main from efenex:feat/v4-b-smart-search-named-concept-boost

Contributor

@efenex efenex commented May 20, 2026

Summary

For named-concept queries ("who is the careful generator?", "what is a circuit breaker", "what does eventual consistency mean?"), the BM25 hybrid ranker scores busier observations above records that name the concept directly — question scaffolding tokens ("who", "is", "the") add noise that dilutes the true match signal. The record that defines the concept ranks below records that mention it incidentally.

What it does

  1. Detect the query as a named-concept pattern via 5 regexes (`/who is/`, `/what is/`, `/what's/`, `/what does X mean/`, `/who's/`). Skip if no match.
  2. Extract the concept phrase (e.g. "careful generator"). Reject degenerate phrases — single tokens shorter than 3 chars (`it`, `x`) and phrases longer than 6 tokens.
  3. Deepen the BM25 sweep to `limit*3` so the boost has candidates to re-rank (boost on a top-10 set has limited room to move records around).
  4. Re-rank with multiplicative boosts:
    • Title contains the phrase → 2.0×
    • Narrative contains the phrase → 1.3×
  5. Same treatment for lessons whose content contains the phrase (2.0×).
  6. Re-sort by combined score, trim to original `limit`.

Non-named-concept queries are untouched.
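
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the description names the five query shapes but not the exact regexes, so the patterns, the leading-article stripping, and the lowercasing here are all assumptions. Only the function name `extractNamedConcept` and the rejection thresholds (single tokens under 3 characters, phrases over 6 tokens) come from the description.

```typescript
// Hypothetical patterns: one per query shape listed in step 1.
const NAMED_CONCEPT_PATTERNS: RegExp[] = [
  /^who\s+is\s+(.+)$/i,
  /^who's\s+(.+)$/i,
  /^what\s+is\s+(.+)$/i,
  /^what's\s+(.+)$/i,
  /^what\s+does\s+(.+?)\s+mean$/i,
];

// Assumption: a leading article is dropped so that
// "who is the careful generator?" yields "careful generator".
const LEADING_ARTICLE = /^(?:the|a|an)\s+/i;

function extractNamedConcept(query: string): string | null {
  // Strip trailing punctuation so "generator?" and "generator" behave alike.
  const cleaned = query.trim().replace(/[?!.\s]+$/, "");
  for (const pattern of NAMED_CONCEPT_PATTERNS) {
    const match = pattern.exec(cleaned);
    if (!match) continue;
    const captured = match[1] ?? "";
    const phrase = captured.replace(LEADING_ARTICLE, "").trim().toLowerCase();
    const tokens = phrase.split(/\s+/).filter(Boolean);
    // Reject degenerate phrases: empty, longer than 6 tokens,
    // or a single token shorter than 3 characters.
    if (tokens.length === 0 || tokens.length > 6) return null;
    if (tokens.length === 1 && phrase.length < 3) return null;
    return phrase;
  }
  // Not a named-concept query: the caller leaves ranking untouched.
  return null;
}
```

Returning `null` for anything that does not match keeps the non-named-concept path a no-op, which is the property the regression test in the test plan checks.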

Why this lives in smart-search and not lineage

`mem::lineage` is chronologically ordered and multi-channel, whereas this is a ranking concern on the primary recall path (smart-search), which is where the `memory_recall` / `memory_smart_search` MCP tools land. Lineage benefits from upstream improvements to the BM25 score, so this lift propagates.

Test plan

  • `npm test` passes
  • New unit tests for `extractNamedConcept` (7 cases) — pattern matching, degenerate-phrase rejection
  • New integration test that proves the boost re-ranks: an observation whose title contains "careful generator" but has a lower BM25 score than a busier unrelated observation gets moved to rank #1
  • Non-named-concept query preserves original ordering (regression test)

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Concept-aware search: queries like "who is/what is/what does…" are detected and re-rank results so matching observations and lessons surface higher for concept-focused queries.
    • Lessons receive additional boosting when their content matches the detected concept, improving relevance for conceptual queries.
  • Tests
    • Added tests covering concept extraction and verifying that conceptual queries trigger expected re-ranking while non-concept queries keep original order.

@vercel

vercel Bot commented May 20, 2026

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a835dd0f-832c-4bab-983f-d009646ac476

📥 Commits

Reviewing files that changed from the base of the PR and between 997d25d and 3a1f8e7.

📒 Files selected for processing (3)
  • src/functions/smart-search.ts
  • src/types.ts
  • test/smart-search.test.ts
💤 Files with no reviewable changes (3)
  • src/types.ts
  • test/smart-search.test.ts
  • src/functions/smart-search.ts

📝 Walkthrough

Adds extraction of named concepts from "who is / what is / what does X mean" queries and uses the concept to expand observation fetches and apply multiplicative boosts when the concept appears in observation titles/narratives or lesson content, then re-sorts and trims results.

Changes

Named-Concept Query Detection and Ranking Boost

| Layer / File(s) | Summary |
| --- | --- |
| Named-concept extraction and boost constants (`src/functions/smart-search.ts`, `test/smart-search.test.ts`) | `extractNamedConcept()` parses "who is/what is/what does ... mean / what's ..." queries with regex, trims punctuation, filters degenerate token-length matches, and defines title/body boost multipliers. Unit tests verify extraction success and null cases. |
| Smart-search pipeline boost and re-ranking (`src/functions/smart-search.ts`, `src/types.ts`, `test/smart-search.test.ts`) | `mem::smart-search` derives `namedConcept`, increases observation fetch size when present, runs hybrid observation search and `recallLessons()` (passing `boostPhrase`) in parallel, sets `CompactLessonResult.boostMatched`, applies multiplicative boosts to observation `combinedScore` (title/narrative) and lesson score (`boostMatched` or content match fallback), re-sorts and truncates results back to `limit`. Integration tests assert boosted re-ranking and stable ordering for non-matching queries. |

Sequence Diagram

sequenceDiagram
  participant Query
  participant extractNamedConcept
  participant hybridSearch
  participant lessonRecall
  participant boostProcessor
  participant returnSorted

  Query->>extractNamedConcept: parse query -> concept|null
  extractNamedConcept-->>Query: concept|null
  Query->>hybridSearch: run observation search (expanded limit if concept)
  Query->>lessonRecall: run lesson recall (pass boostPhrase)
  hybridSearch-->>boostProcessor: observations with combinedScore
  lessonRecall-->>boostProcessor: lessons with boostMatched flag
  boostProcessor->>boostProcessor: multiply observation combinedScore for title/narrative matches
  boostProcessor->>boostProcessor: multiply lesson score when boostMatched or content includes concept
  boostProcessor->>returnSorted: re-sort and truncate to limit
  returnSorted-->>Query: final observations and lessons

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rohitg00/agentmemory#473: Adds compact lesson inclusion and recallLessons/CompactLessonResult plumbing that this PR extends with boostMatched and named-concept ranking.

Poem

🐰 I sniff a phrase beneath the moonlit log,

"Who is" I twitch — a hopping catalog.
Titles sparkle where the concept lands,
Lessons hum like clapping hands.
Hooray — a carrot-coded search that stands!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a named-concept detection and boosting mechanism for 'who/what is X' style queries in smart-search. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/smart-search.test.ts (1)

331-335: ⚡ Quick win

Tighten this assertion so dual-match regressions actually fail.

obsNamed already contains "careful generator" in both title and narrative, but the test only asserts score > 1.0. That still passes with a single applied boost, so it won't catch the bug in the new re-ranker. Either remove the narrative match from the fixture for a pure title-only case, or assert the full expected multiplier for a dual-match case.

Also applies to: 387-389

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/smart-search.test.ts` around lines 331 - 335, The test fixture obsNamed
created via makeObs currently contains "careful generator" in both title and
narrative, which makes the weak assertion (score > 1.0) insufficient; either
remove the phrase from the narrative so the fixture is a title-only match and
keep the simple assertion, or tighten the assertion to check the full expected
boosted score for a dual-match (compute and assert the exact expected
multiplier/threshold instead of >1.0). Update the corresponding duplicate
assertions mentioned (around the second occurrence at lines 387-389) to use the
same fix and reference obsNamed/makeObs when locating the fixture and
assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/smart-search.ts`:
- Around line 151-156: The current boost logic uses the truncated preview in
rawLessons, so named-concept matching misses occurrences beyond the 240-char
cutoff; update the scoring to operate on the full lesson text before any preview
truncation by either (A) running this phrase includes check against the
untruncated field returned by recallLessons (e.g., use the original full content
property such as fullContent or contentFull instead of the previewed content) or
(B) change recallLessons to preserve a fullContent field on each lesson and use
that field in the map that adjusts score (referencing rawLessons, lessons,
phrase, and NAMED_CONCEPT_TITLE_BOOST). Ensure the boost is applied using the
full text and only truncate for presentation after ranking is complete.
- Around line 143-145: The current logic in smart-search that sets mult using an
if/else if (checking title.includes(phrase) then else if
narrative.includes(phrase)) prevents applying both NAMED_CONCEPT_TITLE_BOOST and
NAMED_CONCEPT_BODY_BOOST when both title and narrative match; change it to
compute the multiplier by starting mult = 1 and multiplying by
NAMED_CONCEPT_TITLE_BOOST if title.includes(phrase) and by
NAMED_CONCEPT_BODY_BOOST if narrative.includes(phrase), then return r unchanged
when mult === 1 else return { ...r, combinedScore: r.combinedScore * mult } so
dual matches get the product of both boosts (use the existing symbols title,
narrative, phrase, mult, NAMED_CONCEPT_TITLE_BOOST, NAMED_CONCEPT_BODY_BOOST, r,
combinedScore).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c24364d3-8993-4417-a12c-9c0c02cd7c30

📥 Commits

Reviewing files that changed from the base of the PR and between 93d1bdd and d1fcb71.

📒 Files selected for processing (2)
  • src/functions/smart-search.ts
  • test/smart-search.test.ts

Comment thread src/functions/smart-search.ts Outdated
Comment thread src/functions/smart-search.ts
efenex added a commit to efenex/agentmemory that referenced this pull request May 20, 2026
…l content

CodeRabbit caught two issues on rohitg00#571:

1. The boost branch used `if (title) ... else if (narrative) ...`,
   capping observations that contain the concept in BOTH fields at the
   title-only 2.0× multiplier. The feature is specified as
   multiplicative — title-and-narrative matches now compound to
   2.0 × 1.3 = 2.6×. Single-field matches behave as before.

2. The lesson boost path was scanning the 240-char preview emitted by
   recallLessons, not the lesson's full pre-truncation content. Any
   concept that appeared past the preview boundary silently missed
   the boost.

   Fix: thread the concept phrase into recallLessons via a new
   `boostPhrase` parameter. The function now decides match against
   `content + context` BEFORE truncation, stamps each result with
   `boostMatched: boolean`, and the smart-search caller uses that
   flag instead of re-scanning the preview.

   `boostMatched` added as an optional field on CompactLessonResult.
   Callers that don't pass `boostPhrase` get `boostMatched: false` —
   the smart-search caller falls back to scanning the (truncated)
   content for the phrase if `boostMatched` is absent, preserving the
   pre-fix behavior for any non-smart-search caller of recallLessons.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
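
The second fix in the commit message above (match against the full text first, truncate for display afterwards) might look roughly like this. It is a sketch under stated assumptions, not the repository's code: `Lesson`, `toCompactLesson`, `boostLesson`, and `PREVIEW_CHARS` are hypothetical names, and the case-insensitive comparison is assumed. `boostPhrase`, `boostMatched`, `CompactLessonResult`, the 240-character preview, and the 2.0× lesson boost come from the thread.

```typescript
// Hypothetical full lesson record as stored before any preview truncation.
interface Lesson {
  content: string;
  context: string;
  score: number;
}

// Compact shape returned to callers; boostMatched is optional, as in the commit message.
interface CompactLessonResult {
  content: string;
  score: number;
  boostMatched?: boolean;
}

const PREVIEW_CHARS = 240; // preview length mentioned in the review
const NAMED_CONCEPT_TITLE_BOOST = 2.0; // lessons reuse the 2.0x multiplier

// Stand-in for the per-lesson step inside recallLessons.
function toCompactLesson(lesson: Lesson, boostPhrase?: string): CompactLessonResult {
  const needle = boostPhrase?.toLowerCase();
  // Decide the match against content + context BEFORE truncating.
  const haystack = `${lesson.content} ${lesson.context}`.toLowerCase();
  const boostMatched = needle !== undefined && haystack.includes(needle);
  return {
    content: lesson.content.slice(0, PREVIEW_CHARS),
    score: lesson.score,
    boostMatched,
  };
}

// Stand-in for the smart-search caller: trust the flag when present,
// fall back to scanning the (truncated) preview only when it is absent.
function boostLesson(result: CompactLessonResult, phrase: string): CompactLessonResult {
  const matched =
    result.boostMatched ?? result.content.toLowerCase().includes(phrase.toLowerCase());
  return matched ? { ...result, score: result.score * NAMED_CONCEPT_TITLE_BOOST } : result;
}
```

Carrying a boolean across the boundary, instead of the full text, keeps the response payload compact while still letting ranking see content past the preview cutoff.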
Owner

@rohitg00 rohitg00 left a comment


Audited locally: build clean, 5 new tests pass (1086 total). Self-contained regex + boost multipliers, well-documented, fallback path leaves untouched queries unchanged. Multiplicative (title 2.0 × body 1.3 = 2.6×) when both match — CodeRabbit had already caught the prior else-if cap. Ready to merge.
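
For reference, the compounding behaviour described in this review (2.0 × 1.3 = 2.6× when both fields match) can be sketched as below. The constant names, `combinedScore`, `title`, `narrative`, and `mult` follow the CodeRabbit comment quoted earlier in the thread. The `ScoredObservation` type, the `boostObservation` and `rerank` helpers, and the case-insensitive comparison are assumptions made for illustration.

```typescript
// Hypothetical minimal shape of a hybrid-search result.
interface ScoredObservation {
  title: string;
  narrative: string;
  combinedScore: number;
}

const NAMED_CONCEPT_TITLE_BOOST = 2.0;
const NAMED_CONCEPT_BODY_BOOST = 1.3;

function boostObservation(r: ScoredObservation, phrase: string): ScoredObservation {
  const needle = phrase.toLowerCase();
  let mult = 1;
  // Two independent checks (not if / else if), so a dual match compounds.
  if (r.title.toLowerCase().includes(needle)) mult *= NAMED_CONCEPT_TITLE_BOOST;
  if (r.narrative.toLowerCase().includes(needle)) mult *= NAMED_CONCEPT_BODY_BOOST;
  return mult === 1 ? r : { ...r, combinedScore: r.combinedScore * mult };
}

function rerank(
  results: ScoredObservation[],
  phrase: string,
  limit: number,
): ScoredObservation[] {
  return results
    .map((r) => boostObservation(r, phrase))
    .sort((a, b) => b.combinedScore - a.combinedScore)
    .slice(0, limit);
}
```

With these multipliers, an observation scoring 0.5 that names the concept in both title and narrative ends at 1.3 and overtakes an unrelated observation scoring 0.9.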

…queries

For identity/definition queries ("who is the careful generator?", "what is
mem::lineage?"), extract the named concept and boost hybrid + lesson hits whose
title/narrative names it directly, so the defining record outranks incidental
mentions. Over-fetches (3x, capped) before boosting so the rerank has headroom;
composes with the upstream agentId filter (filter first, then boost, then trim).
Rebased on current main.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@efenex efenex force-pushed the feat/v4-b-smart-search-named-concept-boost branch from 997d25d to 3a1f8e7 Compare June 5, 2026 00:53
@efenex
Contributor Author

efenex commented Jun 5, 2026

Rebased on current main. Composes with the upstream agentId filter — filters by agent first, then applies the named-concept boost, then trims to limit. Tests green; the only red check is the Vercel fork-preview.
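
The ordering described here (filter by agent, then boost, then trim) together with the capped over-fetch can be illustrated with a small sketch. Everything in it is hypothetical apart from the 3× factor, the `agentId` field name, and the order of operations: the thread does not state the cap value, so `MAX_FETCH = 100` is a placeholder, and the boost is reduced to a title-only check to keep the example short.

```typescript
// Hypothetical hit shape; only agentId and combinedScore are named in the thread.
interface Hit {
  agentId: string;
  title: string;
  combinedScore: number;
}

const OVERFETCH_FACTOR = 3; // "3x" from the commit message
const MAX_FETCH = 100; // placeholder: the real cap is not stated in the thread

// How many candidates to request from the underlying search.
function fetchSize(limit: number, hasConcept: boolean): number {
  return hasConcept ? Math.min(limit * OVERFETCH_FACTOR, MAX_FETCH) : limit;
}

function filterBoostTrim(
  hits: Hit[],
  agentId: string | undefined,
  phrase: string | null,
  limit: number,
): Hit[] {
  // 1. Filter first, so another agent's record can never be boosted into view.
  const scoped = agentId === undefined ? hits : hits.filter((h) => h.agentId === agentId);
  // Non-named-concept queries keep their original order.
  if (phrase === null) return scoped.slice(0, limit);
  const needle = phrase.toLowerCase();
  return scoped
    // 2. Boost (simplified to title-only for this sketch).
    .map((h) =>
      h.title.toLowerCase().includes(needle)
        ? { ...h, combinedScore: h.combinedScore * 2.0 }
        : h,
    )
    .sort((a, b) => b.combinedScore - a.combinedScore)
    // 3. Trim back to the caller's limit last.
    .slice(0, limit);
}
```

Trimming last is what gives the boost headroom: a record that sits outside the top `limit` before boosting can still be promoted into the final result.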
