Close the feedback loop between performance tracking and memory retrieval. Entries that correlated with better task outcomes now rank higher in search, and synthesis reflects on which patterns associate with better performance.

Ship 1 — insightsActive tracking:
- SessionInsightsTracker records which entries the curate agent consulted via tools.searchKnowledge() during each curation session
- Session-scoped (parentSessionId), curate-only (query sessions excluded)
- canonicalPath field added to search results in all return branches, using getSummaryAccessPath resolution for consistent keys
- CurateExecutor drains the tracker on success, clears it in finally
- NormalizedPerformanceLogEntry type ensures insightsActive: string[] at the domain boundary; readPerformanceLog normalizes missing values

Ship 2 — Performance-weighted retrieval boost:
- performance-correlation.ts: pure functions computing per-entry and per-domain factors from insightsActive correlation data
- Factors bounded to ±15% via tanh saturation (BM25 still dominates)
- Applied in runTextSearch and propagateScoresToParents
- Cached with a stat-only mtime check (no JSONL parse on cache hit)

Ship 3 — Performance-aware synthesis:
- Synthesis prompt updated with a 5th instruction about performance
- buildPerformanceContext injects domain stats and high/low-performing entries into the synthesis user message
- Gracefully degrades when fewer than 3 relevant performance entries exist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
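The ±15% bound described in Ship 2 can be sketched as a tanh-saturated multiplier. This is a minimal illustration under stated assumptions: `boundedFactor`, `boostedScore`, and `MAX_BOOST` are hypothetical names, not the PR's actual API.

```typescript
// Hypothetical sketch of tanh-bounded boosting: the raw correlation signal is
// squashed into (-1, 1) by tanh, so the multiplicative factor stays within
// [1 - MAX_BOOST, 1 + MAX_BOOST] and BM25 remains the dominant ranking signal.
const MAX_BOOST = 0.15

function boundedFactor(rawCorrelation: number): number {
  return 1 + MAX_BOOST * Math.tanh(rawCorrelation)
}

// A boosted score multiplies the base BM25 score by the bounded factor.
function boostedScore(bm25Score: number, rawCorrelation: number): number {
  return bm25Score * boundedFactor(rawCorrelation)
}
```

Even an extreme correlation input moves a score by at most 15% in either direction, which matches the PR's claim that BM25 still dominates.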
Claude finished @ngduyanhece's task in 3m 22s — PR Review
Summary

This PR implements a performance-memory correlation system (Ship 1–3): tracking which knowledge entries were surfaced during curation, correlating them with task performance scores, and feeding those signals back into search ranking and synthesis prompts. The architecture is clean and the fail-open design throughout is appropriate. 9 inline comments posted covering correctness, robustness, style, and test gaps. Here's a prioritized overview:

Issues worth addressing
Nitpicks / non-blocking
What's well done
Resolved threads (outdated):
- src/agent/infra/tools/implementations/search-knowledge-service.ts
- src/agent/infra/tools/implementations/search-knowledge-service.ts
- test/unit/server/core/domain/knowledge/performance-correlation.test.ts
Claude finished @ngduyanhece's task in 4m 12s — PR Review
Summary

All 9 issues from the previous review round were addressed — the domain-baseline bias fix, substring match replacement, mtime …

5 new inline comments posted on this pass:
What's well done in this revision
```ts
/**
 * Set the session insights tracker for performance-memory correlation.
 * Forwarded to sandboxService so ToolsSDK.searchKnowledge() records surfaced paths.
 */
public setInsightsTracker(tracker: import('../../../server/infra/context-tree/session-insights-tracker.js').SessionInsightsTracker): void {
```
nitpick (import style): Same inline dynamic import(...) type pattern that was flagged and fixed in tools-sdk.ts in the previous round — now it's reintroduced here. Consider lifting it to a top-level import to keep the file consistent with the rest of the imports:
```diff
- public setInsightsTracker(tracker: import('../../../server/infra/context-tree/session-insights-tracker.js').SessionInsightsTracker): void {
+ public setInsightsTracker(tracker: SessionInsightsTracker): void {
```
And add at the top of the file alongside other import type declarations:
```ts
import type { SessionInsightsTracker } from '../../../server/infra/context-tree/session-insights-tracker.js'
```

```ts
const lowEntries = new Map<string, number>()

for (const entry of relevant) {
  const bucket = entry.score > domainAvg ? highEntries : lowEntries
```
nitpick (off-by-one boundary): Entries with score === domainAvg are classified as low-performing (lowEntries) due to the strict > comparison. In practice, a score exactly at the mean is neither high nor low. This won't cause incorrect output but could surface neutrally-correlated entries as "frequently active in lower-scoring tasks," which could mislead the LLM synthesis.
A simple fix is to use a small threshold instead of the mean as the boundary:
```diff
- const bucket = entry.score > domainAvg ? highEntries : lowEntries
+ const bucket = entry.score >= domainAvg ? highEntries : lowEntries
```
Alternatively, skip entries whose score is within a small epsilon of the mean (`Math.abs(entry.score - domainAvg) < 0.05`) — though the current approach is acceptable for a prompting hint.
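The epsilon variant can be sketched as a three-way classification. This is a standalone sketch; `bucketFor` and `EPSILON` are hypothetical names, not code from the PR.

```typescript
// Hypothetical three-way classification: scores within EPSILON of the domain
// mean are treated as neutral and excluded from both buckets.
const EPSILON = 0.05

function bucketFor(score: number, domainAvg: number): 'high' | 'low' | null {
  if (Math.abs(score - domainAvg) < EPSILON) return null // neutral: skip
  return score > domainAvg ? 'high' : 'low'
}
```

This keeps entries that hover around the mean out of the "frequently active in lower-scoring tasks" list entirely, rather than forcing them into one bucket.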
```ts
const mid = Math.floor(scores.length / 2)
const firstHalf = scores.slice(0, mid)
const secondHalf = scores.slice(mid)
const avgFirst = firstHalf.reduce((a, b) => a + b, 0) / firstHalf.length
```
nitpick (unequal halves for odd counts): When relevant.length is odd, mid = Math.floor(N/2) gives firstHalf one fewer element than secondHalf. For example, with exactly 3 entries: firstHalf = [s0], secondHalf = [s1, s2]. The most recent score gets 2× weight in the "second half" average, which can produce a spurious trend signal for small samples.
This only affects synthesis prompts (not hard decisions), but a note in the code would prevent future confusion, e.g.:

```ts
// Note: for odd-length arrays firstHalf is 1 element shorter than secondHalf —
// acceptable bias for a heuristic prompt hint.
```

Low priority given the fail-open nature of the context injection.
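The bias is easy to see with a worked example. This is a standalone sketch mirroring the quoted split logic; `halfAverages` is a hypothetical name, not the PR's function.

```typescript
// Mirrors the quoted split: for an odd-length array the second half gets the
// extra element, so the most recent scores carry slightly more weight.
// Assumes scores.length >= 2 (the PR guards on a minimum entry count upstream).
function halfAverages(scores: number[]): [number, number] {
  const mid = Math.floor(scores.length / 2)
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length
  return [avg(scores.slice(0, mid)), avg(scores.slice(mid))]
}
```

With `[1, 2, 3]` this yields half-averages of `1` and `2.5`: the lone first-half score is compared against a two-element second half, which is exactly the spurious-trend risk the comment describes.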
```ts
if (textPart) {
  // Scoped search: search text within the matched subtree
  return this.runTextSearch(textPart, documentMap, index, limit, topMatch.path, symbolTree, referenceIndex, summaryMap, options)
const perfFactors = await this.getPerformanceFactors()
```
nitpick (design inconsistency): trySymbolicSearch became async solely to call getPerformanceFactors() in this one branch. The outer search() already computes perfFactors for the text-search fallback path — but those two calls cannot be trivially unified because trySymbolicSearch short-circuits before perfFactors is computed at the outer level.
The cached stat call here is fast, so this is not a performance issue. However, adding a brief comment explains why the method is async and why perfFactors isn't passed as a parameter:
```ts
// Fetch performance factors here rather than receiving them as a parameter:
// the outer search() path computes perfFactors *after* this method returns null,
// so passing them in would require eagerly computing them even for direct-match exits.
const perfFactors = await this.getPerformanceFactors()
```

```ts
await service.synthesize(store, 5)

const userMessage = (llm.generate as sinon.SinonStub).firstCall.args[1] as string
expect(userMessage).to.not.include('**Performance Context**')
```
issue (test gap — negative test only): The new test correctly verifies that dead-end-lessons paths do NOT match the lessons subfolder. But there's no complementary test confirming that the **Performance Context** section IS injected into the user message when sufficient relevant performance data is present.
Without a positive-path test, a future regression that silently drops the performance context section (e.g., a guard condition change) would go undetected.
Summary
Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only, otherwise write N/A)

Test plan
User-visible changes
List user-visible changes (including defaults, config, or CLI output).
If none, write None.

Evidence
Attach at least one:
Checklist

- Tests pass (`npm test`)
- Lint passes (`npm run lint`)
- Typecheck passes (`npm run typecheck`)
- Build succeeds (`npm run build`)
- Branch is up to date with `main`

Risks and mitigations
List real risks for this PR. If none, write None.