Releases: rmk40/opencode-session-recall
v0.12.1
A bug-fix release. Search worked, but retrieving and browsing the results was
broken — so after recall found a hit, reading or paginating it could come
back empty. Searching for things you can't then retrieve is no use; this fixes
the retrieval half of the flow.
Fixed
- Tool-argument defaults were not applied, breaking the browse/retrieve
tools. opencode passes the model's raw argument object to a plugin tool's
`execute`; it validates against the Zod schema but does not feed back the
parsed value, so `schema.default()`s are never materialized. When the model
omitted `role`, `recall_messages` saw `role: undefined`, its role filter
matched nothing, and a fully loaded session reported `total: 0` with no
messages (verified live: 590 messages fetched, 0 after the filter). The same
missing-default behavior left `recall_context` with `NaN` slice bounds and
could make `recall_get` throw on an undefined `messageID`. `recall` (search)
already coerced its own args defensively; the four browse/retrieve tools now
do too, via shared `coerceEnum`/`coerceBool`/`coerceInt` helpers plus
required-argument guards. Impact: retrieving or browsing a session,
including cross-project results that `recall` surfaces, now works regardless
of which optional args the caller sends.
- `recall` could surface its own prior output under renamed tool calls. When
tool names are namespaced upstream (e.g. `mcp__…__recall`), they slipped past
the self-exclusion guard, so recall could match earlier recall results; and
inline `expand` could include a nearby recall call's output. Both are now
excluded from search and redacted from expansion, matching by suffix so
namespaced variants are caught. Explicit `recall_get`/`recall_context`
remain full-fidelity.
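A minimal sketch of the defensive-coercion pattern from the first fix, assuming a plain TypeScript `execute` that receives the raw argument object. The helper names `coerceEnum`, `coerceBool`, and `coerceInt` come from the notes, but their signatures, the `requireString` guard, and the argument shape (`sessionID`, `role`, `reverse`, `limit`, with `"all"` as the default role) are hypothetical, chosen only for illustration.

```typescript
// Hypothetical sketch of execute-boundary coercion; names, signatures and the
// argument shape are assumptions, not the plugin's actual code.

function coerceEnum<T extends string>(value: unknown, allowed: readonly T[], fallback: T): T {
  // Only whitelisted strings pass; undefined or unknown values get the default.
  return typeof value === "string" && (allowed as readonly string[]).includes(value)
    ? (value as T)
    : fallback;
}

function coerceBool(value: unknown, fallback: boolean): boolean {
  return typeof value === "boolean" ? value : fallback;
}

function coerceInt(value: unknown, fallback: number, min: number, max: number): number {
  // Non-numbers, NaN and Infinity fall back, so slice bounds can never become NaN.
  if (typeof value !== "number" || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

function requireString(value: unknown, name: string): string {
  // Required-argument guard: a clear error instead of a crash deeper in the tool.
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`Missing required argument: ${name}`);
  }
  return value;
}

// Hypothetical role values for a browse tool such as recall_messages.
const ROLES = ["all", "user", "assistant"] as const;

// Apply the defaults the host never materialized from the schema.
function normalizeMessagesArgs(raw: Record<string, unknown>) {
  return {
    sessionID: requireString(raw["sessionID"], "sessionID"),
    role: coerceEnum(raw["role"], ROLES, "all"),
    reverse: coerceBool(raw["reverse"], false),
    limit: coerceInt(raw["limit"], 50, 1, 500),
  };
}
```

With `role` omitted, `normalizeMessagesArgs({ sessionID: "ses_1" })` yields `role: "all"` rather than `undefined`, so a role filter built on it matches every message instead of none.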
Verified end-to-end by driving a fresh opencode session against real history.
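The suffix-based self-exclusion from the second fix can be sketched as follows. This is an illustration under stated assumptions: which tool names count as "recall", the `__` and `.` namespace separators, the `ToolPart` shape, and the redaction placeholder text are all hypothetical; the notes only say that matching is by suffix.

```typescript
// Hypothetical sketch: the tool list and namespace separators are assumptions.
const RECALL_TOOLS = ["recall", "recall_get", "recall_context", "recall_messages"];

function isRecallTool(toolName: string): boolean {
  // Exact name, or a namespaced variant ending in "__<name>" or ".<name>".
  // Requiring a separator avoids false positives such as "precall".
  return RECALL_TOOLS.some(
    (name) => toolName === name || toolName.endsWith(`__${name}`) || toolName.endsWith(`.${name}`),
  );
}

interface ToolPart {
  tool: string;
  output: string;
}

// Search path: recall's own output never enters the corpus.
function searchableParts(parts: ToolPart[]): ToolPart[] {
  return parts.filter((p) => !isRecallTool(p.tool));
}

// Expansion path: keep the part for context, but redact its body.
function redactForExpansion(parts: ToolPart[]): ToolPart[] {
  return parts.map((p) => (isRecallTool(p.tool) ? { ...p, output: "[recall output redacted]" } : p));
}
```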
Latest Snapshot
Rolling pre-release tracking the latest main commit.
Commit: 06b3bbf
Date: 2026-06-21T01:20:33Z
This is not a stable release. Use a versioned tag (e.g., v0.1.0) for production.
v0.12.0
This release rebuilds how recall ranks results and adds the ability for the
agent to reach for its history on its own. No existing tool parameter changed
its meaning, so upgrades are drop-in.
Highlights
- Relevance ranking is now BM25 instead of fuzzy string matching. `smart`
and `fuzzy` search are powered by an in-memory MiniSearch BM25 index built per
query. BM25 weights rare, discriminative terms over common boilerplate and
normalizes for document length, so a short message that is actually about
your query beats a long log that merely mentions the words.
- Proactive recall. Three features help the agent search history when it
should, instead of waiting to be told: a default-on system-prompt nudge and
two opt-in hooks (`autoRecall`, `compactionRecall`).
- `regex` match mode for exact shapes: error codes, stack traces, file paths,
IDs, URLs.
- Result diversity so one noisy session can't flood a result list.
- Query-shape routing that suggests `regex` when a query looks like a
pattern, without ever overriding the caller.
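To make the two BM25 properties named above concrete (rare terms weigh more, long documents are length-normalized), here is a sketch of textbook Okapi BM25 with the common defaults `k1 = 1.2`, `b = 0.75`. It is not MiniSearch's internal scorer, which has its own parameters and details, and it omits the structural boosts the plugin layers on top; the tokenizer is a simplified assumption.

```typescript
// Textbook Okapi BM25 for illustration only; not MiniSearch's implementation.
interface Doc {
  id: string;
  text: string;
}

function tokenize(text: string): string[] {
  // Lowercased ASCII word tokens; duplicates are kept so term frequency counts.
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

function bm25Rank(docs: Doc[], query: string, k1 = 1.2, b = 0.75): { id: string; score: number }[] {
  const corpus = docs.map((d) => ({ id: d.id, tokens: tokenize(d.text) }));
  const n = corpus.length;
  const avgLen = corpus.reduce((sum, d) => sum + d.tokens.length, 0) / Math.max(n, 1);
  const terms = Array.from(new Set(tokenize(query)));

  // Inverse document frequency: rare terms weigh more than common boilerplate.
  const idf = new Map<string, number>();
  for (const term of terms) {
    const df = corpus.filter((d) => d.tokens.includes(term)).length;
    idf.set(term, Math.log(1 + (n - df + 0.5) / (df + 0.5)));
  }

  return corpus
    .map((d) => {
      let score = 0;
      for (const term of terms) {
        const tf = d.tokens.filter((t) => t === term).length;
        if (tf === 0) continue;
        // Length normalization: a long log needs more occurrences to score the same.
        const lengthNorm = 1 - b + (b * d.tokens.length) / avgLen;
        score += ((idf.get(term) ?? 0) * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
      }
      return { id: d.id, score };
    })
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score);
}
```

For the query `ENOSPC upload error`, a short message about the failure outranks a 120-word build log that mentions `error` and `upload` once: `enospc` appears in only one document (high idf), and the log's length pushes its per-term contributions down.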
Expected effectiveness
The ranker change is measured, not asserted. A new relevance eval harness
(test/eval/) scores a labeled corpus of eight retrieval cases that exercise
the situations recall is for: rare-term recall, prior-decision recall, vague
"same as before" recall, typo tolerance, exact-phrase preference, cross-project
recall, and old-but-strong vs. recent-but-weak ranking.
| Ranker | MRR | recall@5 |
|---|---|---|
| Previous (Fuse.js) | 0.50 | 0.50 |
| BM25 (this release) | 1.00 | 1.00 |
The previous ranker returned nothing on four of the eight cases (exact
phrase, cross-project error, old-strong-vs-recent-weak, and long-document
competition). BM25 returns the correct session at rank 1 for all eight. The
eval is wired into npm run check as a regression gate, so future ranking
changes must meet or beat these numbers.
Practical effect: queries that name a specific symbol, error string, file, or
decision now rank the right hit at or near the top far more reliably, and broad
queries no longer get drowned out by long, boilerplate-heavy tool output.
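The two metrics in the table can be computed as below. This sketch assumes each eval case has exactly one expected session and that the ranker's output is a list of session IDs, best first; the `EvalCase` shape and the `meetsBaseline` gate are hypothetical, not the harness in `test/eval/`. A quick consistency check on the table: with four of eight cases returning nothing, an MRR of 0.50 is only possible if the other four hit at rank 1.

```typescript
// Hypothetical harness shape: one expected session per case, ranked best first.
interface EvalCase {
  name: string;
  expected: string;
  ranked: string[];
}

function rankOf(c: EvalCase): number {
  const idx = c.ranked.indexOf(c.expected);
  return idx === -1 ? Infinity : idx + 1; // 1-based rank; Infinity means "not returned"
}

function meanReciprocalRank(cases: EvalCase[]): number {
  // 1 / Infinity === 0, so a miss contributes nothing.
  return cases.reduce((sum, c) => sum + 1 / rankOf(c), 0) / cases.length;
}

function recallAtK(cases: EvalCase[], k: number): number {
  return cases.filter((c) => rankOf(c) <= k).length / cases.length;
}

// Regression gate: a ranking change passes only if it meets or beats the locked baseline.
function meetsBaseline(
  current: { mrr: number; recallAt5: number },
  baseline: { mrr: number; recallAt5: number },
): boolean {
  return current.mrr >= baseline.mrr && current.recallAt5 >= baseline.recallAt5;
}
```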
Added
- BM25 ranking (`smart`, `fuzzy`) via MiniSearch, replacing Fuse.js.
Structural boosts (exact phrase, full token coverage, reasoning traces, error
output, user messages, recency) and penalties (weak single-token fuzzy, poor
coverage) are layered on the BM25 base score as multipliers. Scores are
reported 0–1.
- `match: "regex"`: bounded regular-expression scan over message and tool
content. Invalid patterns return a clear error instead of silently matching
nothing.
- Result diversity: in part-grouped results, a single session's share of
the initial result list is capped so it can't crowd out other sessions;
held-back hits backfill if room remains.
- Query routing: when a literal query looks like a regular expression, the
response includes a non-overriding suggestion to use `match: "regex"`.
- Proactive recall options:
  - `nudge` (default on): adds a short system-prompt reminder to search
  history when you reference prior work. Text only: a few tokens per request,
  no latency, no I/O.
  - `autoRecall` (default off): when a message clearly references earlier
  work ("last time", "what did we decide", "same as before", "previously"),
  runs a bounded recall and injects the top one to three cited hits into the
  agent's context before it answers. Hard-bounded to 1.5s and a capped session
  scan so it can never stall a turn; stays quiet when it finds nothing.
  - `compactionRecall` (default off): before a session is compacted, pulls
  the strongest durable findings from that session and appends them to the
  compaction prompt so the summary preserves them.
- Relevance eval harness (`test/eval/`) with a labeled corpus, MRR and
recall@5 metrics, and a locked baseline that gates `npm run check`.
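The result-diversity rule (cap each session's share of the initial list, then backfill held-back hits if room remains) can be sketched as a single pass over score-ordered hits. The per-session cap as a fixed `maxPerSession` count and the choice to append backfilled hits after the diverse prefix are assumptions; the notes do not state how the cap is derived.

```typescript
// Hypothetical sketch of capped-share selection with backfill.
interface Hit {
  sessionID: string;
  score: number;
}

function diversify(ranked: Hit[], limit: number, maxPerSession: number): Hit[] {
  const picked: Hit[] = [];
  const heldBack: Hit[] = [];
  const perSession = new Map<string, number>();

  // First pass: take hits in score order, holding back any beyond a session's cap.
  for (const hit of ranked) {
    if (picked.length >= limit) break;
    const count = perSession.get(hit.sessionID) ?? 0;
    if (count < maxPerSession) {
      picked.push(hit);
      perSession.set(hit.sessionID, count + 1);
    } else {
      heldBack.push(hit);
    }
  }

  // Backfill: if other sessions could not fill the list, held-back hits fill it, best first.
  for (const hit of heldBack) {
    if (picked.length >= limit) break;
    picked.push(hit);
  }
  return picked;
}
```

With four hits from session A followed by one each from B and C, a limit of 4 and a cap of 2 returns A, A, B, C; raising the limit to 5 lets the best held-back A hit backfill the last slot.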
Changed
- `smart`/`fuzzy` no longer have a "degraded mode" that silently switched
ranking algorithms under load. A time budget still applies, but it only flags
elevated latency (`degradeKind: "time"`); the ranking itself is unchanged.
- Tokenization is split: a duplicate-preserving tokenizer feeds the BM25 index
(so term frequency is meaningful), while a deduplicated tokenizer backs
set-membership checks.
- README reorganized so the value proposition and install come first and the
agent-facing reference is grouped at the end. CONTRIBUTING's architecture
section rewritten for the BM25 pipeline, the three execution paths, and the
invocation hooks.
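The tokenizer split can be illustrated with two views of the same token stream: a list that keeps duplicates (what a BM25 index needs for term frequency) and a set for membership questions such as "does this message cover every query term". The tokenization rule and the `coversAllTerms` helper are hypothetical; the plugin's actual tokenizers may differ.

```typescript
// Hypothetical tokenizers illustrating the split.
function tokenizeForIndex(text: string): string[] {
  // Duplicates preserved: "retry retry" yields two tokens, so BM25 sees tf = 2.
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

function tokenSet(text: string): Set<string> {
  // Deduplicated view for membership checks.
  return new Set(tokenizeForIndex(text));
}

// Example membership check: does the message contain every query term at least
// once (the kind of "full token coverage" test a boost might use)?
function coversAllTerms(message: string, query: string): boolean {
  const have = tokenSet(message);
  return Array.from(tokenSet(query)).every((t) => have.has(t));
}
```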
Removed
- Fuse.js dependency and the legacy `fuse`/`prefilter`/`rank` modules.
MiniSearch's built-in fuzzy matching covers typo tolerance.
Compatibility
- All existing `recall` parameters keep their meaning; `match` gains a new
`"regex"` value. The `score` field on results is now BM25-derived (still
0–1). `nudge` is on by default; `autoRecall` and `compactionRecall` are
opt-in. No configuration changes are required to upgrade.
v0.11.0
Forgiving search UX with coverage, suggestions, and live-MCP defenses. All additions are non-breaking; existing call sites continue to work.
Behavior
- Unified title + message/tool/reasoning content search in `recall`. Results carry `source`, `why` (`matchedFields`/`matchedTerms`/`recency`/`confidence`), `titleMatch`, and `directoryRelevance`.
- Forgiving time API: new `last`, `from`, `to` fields, plus ISO-string forms of `before`/`after`. Conflicting bounds resolve to the most restrictive valid window (newest lower / oldest upper) and warn instead of erroring. Degenerate durations and relative durations on absolute-only fields are normalized with warnings.
- Default scan covers all eligible sessions. The previous implicit 1000-session cap is gone; `sessions` is now an optional cap, bounded by `maxSessions`, provider limits, and time/abort budgets.
- Directory fallback (`fallback: true`) buckets exact → project → global with bucket counts and `limitedBy` reasons. Never broadens beyond `scope: "project"`.
- Partial expansion: `expand: "context"` now returns base hits plus as much expansion as fits, with warnings, instead of hard-failing on budget caps. New inputs: `window: "auto"`, `expandBudgetMessages`, `expandBudgetChars`.
- Tool inputs (`command`, `cwd`) and tool outputs are first-class searchable fields; `why.matchedFields` reports only fields that actually matched.
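The "most restrictive valid window" rule can be sketched as: parse every bound, keep the newest lower bound and the oldest upper bound, and collect warnings instead of throwing. This is an illustration only. It models `last` as a hypothetical millisecond duration (`lastMs`), treats unparsable strings as ignored with a warning, and flags an empty window with a warning; the real field formats and warning texts are not given in the notes.

```typescript
// Hypothetical sketch of forgiving time-bound resolution.
type Bound = number | string | undefined;

interface TimeWindow {
  lower: number | undefined;
  upper: number | undefined;
  warnings: string[];
}

function parseBound(value: Bound, label: string, warnings: string[]): number | undefined {
  if (value === undefined) return undefined;
  // Numbers are epoch milliseconds; strings go through Date.parse (ISO format).
  const ms = typeof value === "number" ? value : Date.parse(value);
  if (!Number.isFinite(ms)) {
    warnings.push(`Ignored ${label}: could not parse ${JSON.stringify(value)}`);
    return undefined;
  }
  return ms;
}

function resolveWindow(
  args: { after?: Bound; from?: Bound; before?: Bound; to?: Bound; lastMs?: number },
  now: number,
): TimeWindow {
  const warnings: string[] = [];
  const lowers = [
    parseBound(args.after, "after", warnings),
    parseBound(args.from, "from", warnings),
    args.lastMs !== undefined ? now - args.lastMs : undefined,
  ].filter((v): v is number => v !== undefined);
  const uppers = [
    parseBound(args.before, "before", warnings),
    parseBound(args.to, "to", warnings),
  ].filter((v): v is number => v !== undefined);

  // Most restrictive: the newest lower bound and the oldest upper bound win.
  const lower = lowers.length > 0 ? Math.max(...lowers) : undefined;
  const upper = uppers.length > 0 ? Math.min(...uppers) : undefined;
  if (lowers.length > 1 || uppers.length > 1) {
    warnings.push("Conflicting time bounds; using the most restrictive window.");
  }
  if (lower !== undefined && upper !== undefined && lower > upper) {
    warnings.push("Time window is empty: lower bound is after upper bound.");
  }
  return { lower, upper, warnings };
}
```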
New output
- `warnings` (max 5), `suggestions` (max 3), `nearMisses` (max 3).
- `coverage` with `totalSessionsKnown`, `sessionsDiscovered`, `sessionsEligible`, `sessionsSearched`, `messagesSearched`, `partsSearched`, `sessionsSkipped`, `skippedByReason`, `directoryBucketsSearched`, `directoryBucketCounts`, and a `limitedBy` array (`scope`, `time`, `directory`, `sessionsLimit`, `maxSessions`, `loadError`, `rankingBudget`, `timeBudget`, `abortSignal`, etc.).
- `SearchResult` adds `source`, `why`, `directoryRelevance`, `titleMatch`.
Live-MCP hardening
The plugin now defends at the execute boundary against MCP hosts that forward raw caller args without applying Zod schema defaults:
- `pickEnum`/`pickNumber`/`clampNumber` coerce, clamp, and whitelist `scope`, `match`, `group`, `type`, `role`, `expand`, `explain`, `fallback`, `expandResults`, `window`, `width`, `results`, `sessions`, `expandBudgetMessages`, `expandBudgetChars`.
- Unknown enum values fall back with `Ignored {label}:"value"; using {label}:"fallback"`.
- Out-of-range numerics clamp with warnings; non-numeric inputs render literally (`NaN`, `Infinity`) rather than as `null`.
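A sketch of how such boundary helpers might look, assuming they push human-readable messages into a shared `warnings` array. The helper names and the enum warning format come from the notes; the signatures, the silent default for unset values, and the numeric warning wording are assumptions. `String(value)` is used for non-numeric inputs because `JSON.stringify(NaN)` would print `null`.

```typescript
// Hypothetical signatures; only the names and the enum warning format are from the notes.
function pickEnum<T extends string>(
  label: string,
  value: unknown,
  allowed: readonly T[],
  fallback: T,
  warnings: string[],
): T {
  if (value === undefined) return fallback; // unset: apply the default silently
  if (typeof value === "string" && (allowed as readonly string[]).includes(value)) return value as T;
  warnings.push(`Ignored ${label}:${JSON.stringify(value)}; using ${label}:"${fallback}"`);
  return fallback;
}

function clampNumber(
  label: string,
  value: unknown,
  fallback: number,
  min: number,
  max: number,
  warnings: string[],
): number {
  if (value === undefined) return fallback;
  const n = typeof value === "number" ? value : Number(value);
  if (!Number.isFinite(n)) {
    // Render the input literally (e.g. "NaN", "Infinity"), never as "null".
    warnings.push(`Ignored ${label}:${String(value)}; using ${label}:${fallback}`);
    return fallback;
  }
  if (n < min || n > max) {
    const clamped = Math.min(max, Math.max(min, n));
    warnings.push(`Clamped ${label}:${n} to ${clamped}`);
    return clamped;
  }
  return n;
}
```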
Fixes
- No more literal `type:undefined` text in suggestions when `type` arrives unset.
- Grammar: `Only N session(s) was/were searched.` agrees with count.
Tests
- 60 → 70 Vitest tests. New `runToolRaw` helper bypasses Zod parsing to exercise the defensive boundary directly.
- New coverage: suggestion gating + grammar, defensive defaults, enum fallback, numeric clamping, non-numeric inputs, time-bound conflicts and impossible windows, malformed dates, expansion clamp warnings, directory bucket counts.
Docs
- `README.md` updated for the new contract (unified search, time API, fallback, `window: "auto"`, output additions).
- New `docs/recall-search-ux-improvement-plan.md` (P0/P1 marked implemented).
- `docs/recall-tool-surface-plan.md` reconciled with implemented behavior.
Full changelog: v0.10.0...v0.11.0
v0.10.0
Bounded search expansion plus automated quality gates.
Features
- `recall` adds bounded multi-message context expansion with `expand: "context"` and `expandResults` (default 1, max 3). Returns surrounding conversation around top hits, capped by `MAX_EXPANDED_CONTEXT_MESSAGES` and `MAX_EXPANDED_TOTAL_TEXT_CHARS` to prevent context blowups.
- Expansion budgets enforced at response time: oversized expansions error with normalized totals so callers can retry with a smaller `expandResults`.
- Compressed the `recall` tool-instruction payload to keep the schema description under the ~2k–2.2k token target.
Quality gates
- New automated CI gates: `format:check`, `lint`, `test:typecheck`, `test`, `typecheck`, `compile` all run on every push.
- Husky `pre-commit` (`scripts/precommit.sh`) auto-fixes formatting/lint and re-runs the full check; `pre-push` (`scripts/prepush.sh`) requires a clean tree and a clean check.
- 60 behavior-focused Vitest tests covering recall ranking, fallback, expansion, and helpers.
Commits
- `a18c805` feat(recall): add bounded search expansion
- `546c4c0` docs(tools): compress recall instructions
- `84a1019` ci: add automated quality gates
- `5254c5a` test(recall): add behavior coverage
Full changelog: v0.9.2...v0.10.0
v0.9.2
Bug fixes around session loading and timestamp filtering.
Fixes
- Surface session load failures (`29b15e1`): when a session fails to load during scan, the failure is no longer silently swallowed. The session is counted in `sessionsSkipped` with `loadError` as the reason, and the response surfaces the error so callers can distinguish empty results from broken indexing.
- Ignore zero-timestamp filters (#2, `44d9413`): `before: 0` (and equivalently `after: 0`) used to filter out every positive timestamp because `0` was treated as a meaningful epoch bound. Zero values are now ignored as no-ops, restoring expected behavior when callers default-initialize numeric filters.
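The zero-timestamp rule amounts to normalizing a bound of exactly `0` to "no bound" before filtering. A minimal sketch, assuming epoch-millisecond timestamps; the function names and the strict-vs-inclusive comparisons are hypothetical.

```typescript
// Hypothetical sketch: a bound of exactly 0 means "no bound".
function effectiveBound(value: number | undefined): number | undefined {
  return value === undefined || value === 0 ? undefined : value;
}

function passesTimeFilter(timestamp: number, filters: { before?: number; after?: number }): boolean {
  const before = effectiveBound(filters.before);
  const after = effectiveBound(filters.after);
  // Strict comparisons are an assumption; the notes don't specify inclusivity.
  if (before !== undefined && timestamp >= before) return false;
  if (after !== undefined && timestamp <= after) return false;
  return true;
}
```

A caller that default-initializes `{ before: 0, after: 0 }` now sees every session instead of none, while real non-zero bounds still filter as before.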
Docs
- Clarified load-error metadata in the recall response shape.
Contributors
- @MatthewK30 (zero-timestamp fix, PR #2)
Commits
- `29b15e1` fix(recall): surface session load failures
- `44d9413` fix(recall): ignore zero timestamp filters
- `b21a82e` docs(recall): clarify load error metadata
- `0f4b4b3` Merge pull request #2 from MatthewK30/fix/ignore-zero-timestamp-filters
Full changelog: v0.9.1...v0.9.2
v0.9.1
Removes the hard ceiling on session scanning.
Behavior
- Remove session scan ceiling (`5e08f42`): the previous fixed cap on the number of sessions scanned per `recall` call is removed. The default request now scans up to 1000 sessions, and the configured `maxSessions` plugin option becomes the upper bound rather than an internal hard limit. This makes recall usable on archives with thousands of sessions without losing coverage to an arbitrary internal cap.
Callers can still pass an explicit `sessions` argument to keep individual queries cheap; the change only affects the implicit ceiling.
Full changelog: v0.9.0...v0.9.1
v0.9.0
Smart/fuzzy search across all scopes plus session grouping.
Features
- Smart/fuzzy ranking everywhere (`1316840`): smart and fuzzy match modes (introduced for project scope in v0.8.0) now apply to global and session scopes as well. The same Fuse.js-based ranking that handled cross-project search now powers single-session and global queries, so match quality is consistent regardless of `scope`.
- Session grouping (`1316840`): added `group: "session"` to roll up multiple per-message hits into one result per session, ranked by best match within the session. The previous `group: "part"` (one result per matching message/part) remains the default. Useful when callers want a session-level overview rather than every individual hit.
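Session grouping as described (one result per session, ranked by its best hit) is a group-by followed by a sort on the per-group maximum. A minimal sketch; the `PartHit` and `SessionGroup` shapes are hypothetical, not the plugin's result types.

```typescript
// Hypothetical shapes for illustration.
interface PartHit {
  sessionID: string;
  messageID: string;
  score: number;
}

interface SessionGroup {
  sessionID: string;
  bestScore: number;
  hits: PartHit[];
}

function groupBySession(hits: PartHit[]): SessionGroup[] {
  const groups = new Map<string, SessionGroup>();
  for (const hit of hits) {
    const group = groups.get(hit.sessionID);
    if (group) {
      group.hits.push(hit);
      group.bestScore = Math.max(group.bestScore, hit.score);
    } else {
      groups.set(hit.sessionID, { sessionID: hit.sessionID, bestScore: hit.score, hits: [hit] });
    }
  }
  // Rank sessions by their single strongest hit.
  return Array.from(groups.values()).sort((x, y) => y.bestScore - x.bestScore);
}
```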
Docs
- Rewrote the `README` to document smart/fuzzy ranking, scope behavior, and grouping.
- Added `CONTRIBUTING.md` with the architecture/module map and contributor guidance.
- Removed the now-implemented `SMART_RECALL_PLAN.md` planning doc.
Commits
- `1316840` feat(recall): add smart/fuzzy for all scopes and session grouping
- `789190b` docs: rewrite README with smart/fuzzy docs, add CONTRIBUTING with architecture guide
- `c843db4` chore: remove SMART_RECALL_PLAN.md
- `82fba93` chore: remove scope and grouping plan
Full changelog: v0.8.0...v0.9.0
v0.8.0
Smart and fuzzy ranked search via Fuse.js.
Features
- Smart/fuzzy match modes (`5cabf23`): `recall` adds `match: "smart" | "fuzzy"` (alongside the existing literal substring match) backed by Fuse.js. Smart mode tokenizes the query, ranks by match quality with a small typo budget, and returns weighted scores; fuzzy mode is more permissive for spelling variations and partial recall. Literal stays the default and remains an exact-substring filter for predictable results.
- Ranked results are ordered by Fuse score plus recency, so cross-session searches surface the strongest matches first instead of relying on insertion order.
Scope
- This release introduced smart/fuzzy ranking for project scope. Global and single-session scopes inherit the same ranking in v0.9.0.
Docs
- README emphasizes that recall searches across sessions and across projects (cross-project / cross-session is a first-class use case, not a sidecar).
Commits
- `5cabf23` feat(recall): add smart/fuzzy search with Fuse.js ranking
- `c3feb1a` docs: emphasize cross-session, cross-project scope in README
Full changelog: v0.7.1...v0.8.0