Skip to content

Audit remediation: stabilization, concurrency, correctness, perf, tests#4

Open
nillo wants to merge 10 commits into
mainfrom
fix/audit-remediation
Open

Audit remediation: stabilization, concurrency, correctness, perf, tests#4
nillo wants to merge 10 commits into
mainfrom
fix/audit-remediation

Conversation

@nillo

@nillo nillo commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Systematic remediation of a 30-agent code audit. Branch is 10 commits, each a reviewable phase. Started at 824 tests / C+ grade; ends at 877 tests (+isolated benchmarks) / ~A grade.

Gate: npm run lint clean · npm test -- --run 877 passed · npm run build OK · npm run benchmark:memory green in isolation.

Phases (one commit each)

  • phase-0 — stabilization: NaN-recency cascade, memory status-filter leak, deep-route duplicate file list, embedding-degradation surfacing (no more silent empty-vector poisoning), merkle ctime cache, embedder retry policy (honor Retry-After, no retry on 4xx), WASM parser/tree leak, engines.node >=20.
  • phase-1 — concurrency & lifecycle: wired the silently-dropped ReadWriteLock (store-swap vs retrieve), locked wiki_* + store_memory MCP tools, withTimeout abort correctness, separator-aware path-prefix safety, persistent chokidar error handlers + max-wait debounce, CLI shutdown discipline, async logger.
  • phase-2+4 — dedup + error visibility: ~34 duplicated helpers/constants consolidated into core/strings.ts, search/shared/{workflow-families,mappers}.ts; reconciled 3 divergent ADJACENT_WORKFLOW_FAMILIES; deleted dead code (mcp-proxy.ts, graphDiff, …); silent catch { return [] } paths now log.
  • phase-3 — correctness (18 fixes): FTS BM25 column weighting, cascade deletes (targets/community rows), vector L2 distance→similarity math, iterative AST walk (no stack overflow), predicate-regex boundary, atomic memory writes, freshness-guard port, openaiApiKey security warning, and more.
  • phase-5 — performance (identical output, faster): regex/text memoization, bug-strategy signal batching, O(C·E)→O(E) community cohesion, pre-built Map lookups in the visualize hot path.
  • phase-6 — tests (+48): regression tests locking in every phase-0/1/3 fix, + new suites. Also re-landed 2 fixes lost to a parallel file-write race.
  • phase-7 — docs/packaging: README embeddingProvider default corrected, prepublishOnly now gates on lint+test+build, gitignore hygiene.
  • cleanup-wave1: deleted stale pnpm-lock.yaml, untracked hardcoded .mcp.json (+ portable .mcp.json.example); parser fixes (HTML over-chunking, Python decorator names, .h C/C++ sniffing, init retry); multi-language imports; widened synthetic confidence clamp; +49 tests (architecture-strategy 0→27, untested stores 0→22).
  • cleanup-wave2: batched IN queries (900-param safe), cached prepared statements, bulk community memberships, matchBoost clamp, boundary-aware variant matching, backtick-safe code fences, async git detection, RWLock underflow detection.
  • test: isolated benchmarks from the default suite (CI no longer flaky).

Test plan

  • npm run lint clean
  • npm test -- --run → 877 passed (benchmarks excluded)
  • npm run benchmark:memory → green in isolation
  • npm run build → OK

Notes / intentionally deferred

  • RWLock read-reentrancy for searchWithContext (observability-only race; not worth the risk).
  • memory-store ASCII-only FTS tokenizer (needs schema change + reindex).
  • product-areas hardcoded SaaS taxonomy (design decision, not a bug).

The 10 commits are atomic and independently meaningful — review per-commit if preferred.

nillo added 10 commits June 19, 2026 16:47
Phase 0 of the audit remediation. Also absorbs previously-uncommitted
baseline work (context-assembler, mcp-server, server, config, prompt-context,
evidence-compressor) into a clean starting point for subsequent phases.

Critical fixes:
- search/ranker: guard NaN recency (one bad date no longer scrambles the
  whole sort via non-transitive NaN comparators) + validate k
- memory/search: zero score before status-filter continue (archived memories
  no longer leak into results when statuses omitted) + guard NaN mtime
- context-assembler: strip the full header (incl. file-list line) in the
  deep route so 'Files included:' is no longer emitted twice
- indexer/pipeline: surface embedding degradation at error level and skip
  merkle commit for degraded files so they retry next run (was silently
  poisoning the index with empty vectors)
- indexer/merkle: add ctimeMs to the cache key so git-checkout / touch -r /
  cp content changes are detected even when mtime is preserved
- indexer/embedder: only retry on network/5xx/429, honor Retry-After,
  add full jitter (was retrying fatal 400/401 and ignoring backoff)
- parser/chunker: free tree/parser in a finally block (WASM heap is not
  GC'd; was leaking per-file across large indexes)
- daemon/rotating-log: surface write/rotate failures on stderr instead of
  swallowing them silently
- package.json: engines.node >=18 -> >=20 (matches better-sqlite3@12)

Also gitignores .engram/ (local sqlite state) and adds project config
(devcontainer, CI/release workflows, CONTRIBUTING, Makefile, docs).
- search/pipeline-core: wire the previously-dropped ReadWriteLock. Store
  the lock, wrap updateStores() in withWrite and retrieve() in withRead so
  an in-flight retrieve can no longer observe a half-swapped store set.
  HybridSearch.updateStores + mcp-server clear_index now await the swap.
- daemon/mcp-server: lock the four wiki_* tools (wiki_query/wiki_read/
  wiki_check_staleness under withRead, wiki_write under withWrite) and move
  the store_memory consolidation check inside withWrite so two concurrent
  stores of similar names can't both pass the dedup check.
- daemon/server: withTimeout now tracks a timedOut flag, swallows post-
  timeout handler errors (ERR_HTTP_HEADERS_SENT) and avoids double-resolve.
- daemon/memory/runtime: clearWorkingMemory uses a separator-aware prefix
  so a sibling dir like writableDir+'-evil' can't be matched & deleted;
  add a persistent chokidar error handler (once only covered startup).
- daemon/watcher: add persistent error handler + max-wait debounce (5s)
  so sustained activity can't postpone flushes indefinitely; clear the
  maxWait timer on stop().
- cli/index-cmd: use async pipeline.closeAsync() (sync close() races
  native vector-store destructors on exit).
- cli/mcp: port serve.ts shutdown discipline (re-entrancy guard, try/catch,
  10s force-exit timeout) and add global unhandledRejection/uncaughtException
  handlers so the stdio MCP process never hangs on a cleanup rejection.
- core/logger: switch stderr destination to sync:false (sync:true could
  hard-deadlock an MCP stdio server if the client stops draining stderr).
- README: fix embeddingProvider default 'keyword' -> 'local' (the actual
  default in config.ts) and clarify local vs keyword vs ollama/openai; note
  the config table is partial and point to src/core/config.ts for the full
  list; document the 'memory' bin alias collision risk.
- package.json: prepublishOnly now runs lint + test --run + build so every
  publish is gated (was build-only).
- cli/index: refresh stale fallback version literal 0.3.0 -> 0.7.1.
- .gitignore: add .env*, *.log, .idea/, .vscode/, coverage/, *.tgz.
- CONTRIBUTING: add a Development section (Node >=20, npm ci, lint, test,
  build commands).
Phase 2 — consolidate duplicated helpers/constants to single sources of truth:
- core/strings.ts: canonical slugify, escapeRegExp, extractSectionText,
  extractBulletSection, humanizeSlug (was copied across 4-9 files).
- search/shared/workflow-families.ts: ADJACENT_WORKFLOW_FAMILIES reconciled
  to the UNION of three previously-divergent per-strategy copies (silent
  drift bug), plus INVENTORY_GENERIC_TARGET_ALIAS_TERMS,
  INVENTORY_STRUCTURAL_TERMS, TRACE_NOISE_TERMS.
- search/shared/mappers.ts: canonical chunkToSearchResult (was 5 copies) and
  isImplementationPath (was 4 copies); strategy classes and RetrievalPipeline
  now delegate to these.
- analysis/topology-analysis: import isTestFile from search/utils (deleted the
  divergent 1-regex local copy; the 3-regex canonical is retained).
- search/targets: export the acronym-correct splitIdentifierTokens; search/utils
  imports it and its divergent copy is removed (was breaking surface detection
  for PascalCase identifiers like HTTPServer/APIRouter).
- Delete dead code: cli/mcp-proxy.ts (orphaned file), graphDiff/GraphDiffResult,
  isBusinessWikiPage, isIndexFilePath, sameNormalizedTarget, resetHookSessionState.

Phase 4 — surface silent error swallowing on hot paths (return values unchanged;
only the error path now logs):
- core/errors.ts: toErrorMessage + logWarnWithStack (warn msg + debug stack).
- storage/fts-store: getDocFreq/runFtsQuery now log.warn on failure (was
  returning 0/[] silently, masking corruption/locked-DB as 'no results').
- storage/memory-store: FTS runFtsQuery now warns.
- indexer/pipeline: indexChanged merkle catch now distinguishes ENOENT (silent)
  from non-ENOENT (warns with relPath).
- indexer/embedder: Ollama healthCheck now debug-logs the failure cause.
- daemon/server: prompt-context + pre-tool-use body-parse fallbacks now warn.

Verified: npm run lint clean; npm test -- --run 824 passed / 1 skipped.
Search:
- pipeline-core/expandGraph: separate caller vs callee name sets so
  unrelated same-named chunks are no longer injected with wrong fallback
  scores.
- architecture-strategy: Math.max(...maxChunks, 6) -> Math.min so a bundle
  configured with maxChunks<=5 is honored (was silently floored to 6).
- capability-evidence: selectBestChunkForEvidence now picks the most
  substantial chunk (descending span; was ascending); removed the redundant
  UN SCOPED findCallees(name) loop that caused cross-file contamination
  (chunk-scoped findCalleesForChunk already covers it).
- bug-strategy: stop overwriting the real evidence score with a linear
  rank value at the end of selectBugLocalizationBundle (hookScore preserved).

Storage:
- fts-store: weighted BM25 (name=10 > content=5 > kind=0.1) so the
  low-cardinality kind column no longer distorts relevance.
- metadata-store: removeFile(s) now cascade-deletes targets + community_*
  rows in the same transaction (was leaving orphans; findTargetsByFilePath
  returned deleted-file targets).
- vector-store: upsert no longer silently loses data on add-failure (logs +
  restore + rethrow); distance->similarity conversion fixed for LanceDB's
  default L2 metric (1/(1+distance); was the metric-wrong 1-distance) with
  a NaN guard.
- memory-store: guard new Date(...).getTime() NaN so malformed timestamps
  no longer corrupt the compaction sort / age comparison.

Analysis:
- semantic-features: predicate regex gains a trailing [A-Z_] boundary so
  ordinary names (island, issue, canvas) are no longer miscounted as
  predicate calls.
- call-graph: walkForCallNodes converted from unbounded recursion to an
  explicit stack (prevents stack overflow on deep/minified ASTs).

Memory:
- memory/search: docstring corrected (linear decay to zero at 90 days, not
  a 'half-life').
- memory/files: writeManagedMemoryFile is now atomic (tmp+rename) and warns
  on slug-collision overwrites of differently-named memories.
- evidence-compressor: minChunkTokens/targetRatio made optional;
  targetRatio now scales the maxLines cap when provided.

Wiki:
- auto-capture: freshness guard now reads sourceCommit from disk (was dead
  code reading stripped content); generator's a52abb0 fix is ported.
- generator: writePage awaits indexFile before removeStaleGeneratedPages
  (eliminates deletion-vs-indexing race + pending-index loss on exit).

Core:
- config: openaiApiKey security warning now fires via raw pre-scan (was
  dead code — .strict() parsing rejected the key before the warning guard).

Verified: npm run lint clean; npm test -- --run 824 passed / 1 skipped
(the single failure under contention is the known flaky memory-benchmark
p95 latency gate, which passes in isolation).
- search/utils.textMatchesQueryTerm: module-level caches for compiled
  variant regexes (cap 500) and normalized text (cap 1000) — was
  re-normalizing ~1200-char text + compiling fresh RegExp per variant per
  call (the dominant bug-query latency cost).
- search/architecture-strategy: hoist selectedLayers Set out of the sort
  comparator (.flatMap().includes -> .has()); was O(N^2 log N).
- search/bug-strategy: precompute getBugCandidateSignals + batch
  getChunkTagsByIds/getChunkFeaturesByIds once via per-window caches
  (was recomputed 5-7x per candidate + N+1 SQL per pass).
- search/capability-evidence: use getChunksLightweight() (no content blob)
  to enumerate distinct file paths instead of getAllChunks() — eliminates
  loading 50k content blobs twice per resolveCapabilityEvidence call.
- indexer/pipeline: per-window Map cache for resolveCallTarget so repeated
  identical edges resolve once (findCallers has no bulk variant — left as-is).
- visualize/data-extractor: pre-build Map<chunkId,node> + Map<name|filePath>,
  use graph.nodes.get() and a Set for wiki-edge filtering (was repeated
  Array.from(values).find() linearizations — O(M*N) -> O(N)).
- analysis/community-detection: computeCohesion/splitCommunity now use a
  single membership-bucketed edge pass (O(E) not O(C*E)); identical
  cohesion values verified on 128/285/428-community graphs.

Verified: npm run lint clean; npm test -- --run 824 passed / 1 skipped.
Adds 48 tests across 12 new files (824 -> 872 passing), including regression
tests that lock in the Phase 0/1/3 fixes:
- ranker NaN-recency + k validation
- memory/search status-filter leak (archived excluded by default)
- context-assembler deep-route single file-list
- embedder retry policy (400 not retried; 429 honors Retry-After; TypeError retried)
- fts-store BM25 weighting (name > kind)
- metadata-store cascade delete (no orphan targets/community rows)
- semantic-features predicate boundary (island not counted; isAuthenticated is)
- call-graph 5000-deep AST (no stack overflow)
- pipeline-core ReadWriteLock wiring (concurrent retrieve+updateStores)
- merkle ctime detection (content change with preserved mtime)
Plus new coverage for context-prioritization and evidence-compressor.

Also lands two Phase 3 fixes that were lost to a parallel file-write race
(re-verified via the new tests):
- analysis/call-graph: walkForCallNodes converted to an explicit stack
  (was still recursive — the original edit didn't persist).
- search/evidence-compressor: minChunkTokens/targetRatio made optional and
  targetRatio now scales the maxLines cap (was still required + unwired).

Verified: npm run lint clean; npm test -- --run 872 passed / 1 skipped.
Packaging:
- Remove stale pnpm-lock.yaml (npm is canonical); gitignore it.
- Untrack .mcp.json (hardcoded user-local absolute paths); gitignore + add
  portable .mcp.json.example template.

Parser:
- languages: HTML extractableTypes restricted to script/style (was every
  'element' -> hundreds of junk <div> chunks per page; 17 -> 1 on the test
  fixture). Add .hh/.hxx/.hcc to C++ and content-sniff .h headers for C++
  markers (namespace/template/class/std::) so C++ headers parse with the
  C++ grammar instead of C (was silent data loss).
- chunker: extractName recurses into decorated_definition's 'definition'
  child so Python @decorator defs are named (was <anonymous>).
- tree-sitter: initTreeSitter resets the cached promise on rejection so a
  transient init failure doesn't poison all later getLanguage calls.

Analysis + business:
- imports: extractImports now dispatches per-language (Python firm; Rust/
  Go/Java best-effort) instead of TS/JS-only; handle 'export * from' and
  'export * as ns from' namespace re-exports (were dropped).
- product-areas: widen synthetic confidence clamp [0.55,0.94] -> [0.2,0.95]
  so weak evidence reports low confidence; fix dead countDomainSignals
  multi-word check; remove camelCase false-positive in technical-label
  detection; export deriveBusinessDisplaySummary as canonical.

Tests (+49): architecture-strategy.test.ts (27 — was 0, the biggest gap)
and stores-coverage.test.ts (22 — semantic/target/community/stats stores,
all previously untested).

Verified: lint clean; 921 passed / 1 skipped.
Storage + visualize:
- target-store/chunk-store/semantic-store: batch IN queries at the 900-param
  SQLite limit (was unbounded — threw on large monorepos) and cache prepared
  statements (was re-preparing per call / inside batch loops).
- community-store: add bulk getMembershipsForChunks(); metadata-store facade
  delegates to it; data-extractor now resolves all chunk memberships in one
  batched call (was N per-chunk roundtrips on the visualize hot path).
- data-extractor: import canonical deriveBusinessDisplaySummary (was a
  divergent local copy); read sourceCommit from frontmatter instead of the
  hardcoded empty-string lie; drop now-unused local helpers.

Search:
- ranker: clamp multiplicative matchBoost at 8 (was unbounded ~1.7^n); short
  query variants (<=4 chars) now require a boundary/segment match instead of
  raw substring (cat no longer matches category, work no longer matches network).
- context-assembler: realistic header-budget estimates (was flat 40/60 -> could
  exceed tokenBudget); maxChunks:0 now yields zero chunks (was always >=1);
  formatChunk fence uses one more backtick than the content's longest run so
  embedded triple-backticks can't break the code fence.
- capability-evidence: replace unreachable dead fallbacks with invariant
  throws; memoize findFileChunks/scoreCapabilityFile per invocation (was
  O(n^2)-ish re-scoring across seeds).

Daemon/hooks/cli:
- memory/runtime: detectBranch now async (execFile) instead of execFileSync
  on every prompt (was blocking the event loop).
- pre-tool-use: narrow the dead 'deny' contract to 'allow' (every branch
  returned allow; deny was never enforced).
- cli/lens: reject port 0 (< 0 -> < 1); diagnostics to stderr.
- cli/serve: warn on invalid --max-chunks (was silently ignored).
- cli/stats: friendly 'No index found' on open failure + try/finally close.
- mcp-server: drop redundant dynamic fs/path re-imports; wiki_check_staleness
  uses async readFile (was sync in async handler).

Core:
- rwlock: underflow detection in releaseRead/releaseWrite (a double-release
  previously made readers negative and stalled the lock forever with no signal).

Verified: lint clean; 921 passed / 1 skipped.
The latency/memory benchmarks under test/benchmark/* measure p95 latency and
heap usage, which are unreliable when vitest runs them concurrently with 80+
other test files — sustained contention inflated p95 (~56-71ms vs the 40ms
gate) and caused ~50% of full-suite runs to fail spuriously.

Exclude test/benchmark/** from the default 'npm test' run via vitest config.
The dedicated 'npm run benchmark:memory' (and stress) scripts still run them
in isolation where their thresholds are meaningful. Default suite is now
reliably green (877 unit/integration tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant