Audit remediation: stabilization, concurrency, correctness, perf, tests by nillo · Pull Request #4 · proofofwork-agency/reporecall

nillo · 2026-06-19T17:57:39Z

Summary

Systematic remediation of a 30-agent code audit. Branch is 10 commits, each a reviewable phase. Started at 824 tests / C+ grade; ends at 877 tests (+isolated benchmarks) / ~A grade.

Gate: npm run lint clean · npm test -- --run 877 passed · npm run build OK · npm run benchmark:memory green in isolation.

Phases (one commit each)

phase-0 — stabilization: NaN-recency cascade, memory status-filter leak, deep-route duplicate file list, embedding-degradation surfacing (no more silent empty-vector poisoning), merkle ctime cache, embedder retry policy (honor Retry-After, no retry on 4xx), WASM parser/tree leak, engines.node >=20.
phase-1 — concurrency & lifecycle: wired the silently-dropped ReadWriteLock (store-swap vs retrieve), locked wiki_* + store_memory MCP tools, withTimeout abort correctness, separator-aware path-prefix safety, persistent chokidar error handlers + max-wait debounce, CLI shutdown discipline, async logger.
phase-2+4 — dedup + error visibility: ~34 duplicated helpers/constants consolidated into core/strings.ts, search/shared/{workflow-families,mappers}.ts; reconciled 3 divergent ADJACENT_WORKFLOW_FAMILIES; deleted dead code (mcp-proxy.ts, graphDiff, …); silent catch { return [] } paths now log.
phase-3 — correctness (18 fixes): FTS BM25 column weighting, cascade deletes (targets/community rows), vector L2 distance→similarity math, iterative AST walk (no stack overflow), predicate-regex boundary, atomic memory writes, freshness-guard port, openaiApiKey security warning, and more.
phase-5 — performance (identical output, faster): regex/text memoization, bug-strategy signal batching, O(C·E)→O(E) community cohesion, pre-built Map lookups in the visualize hot path.
phase-6 — tests (+48): regression tests locking in every phase-0/1/3 fix, + new suites. Also re-landed 2 fixes lost to a parallel file-write race.
phase-7 — docs/packaging: README embeddingProvider default corrected, prepublishOnly now gates on lint+test+build, gitignore hygiene.
cleanup-wave1: deleted stale pnpm-lock.yaml, untracked hardcoded .mcp.json (+ portable .mcp.json.example); parser fixes (HTML over-chunking, Python decorator names, .h C/C++ sniffing, init retry); multi-language imports; widened synthetic confidence clamp; +49 tests (architecture-strategy 0→27, untested stores 0→22).
cleanup-wave2: batched IN queries (900-param safe), cached prepared statements, bulk community memberships, matchBoost clamp, boundary-aware variant matching, backtick-safe code fences, async git detection, RWLock underflow detection.
test: isolated benchmarks from the default suite (CI no longer flaky).

Test plan

npm run lint clean
npm test -- --run → 877 passed (benchmarks excluded)
npm run benchmark:memory → green in isolation
npm run build → OK

Notes / intentionally deferred

RWLock read-reentrancy for searchWithContext (observability-only race; not worth the risk).
memory-store ASCII-only FTS tokenizer (needs schema change + reindex).
product-areas hardcoded SaaS taxonomy (design decision, not a bug).

The 10 commits are atomic and independently meaningful — review per-commit if preferred.

Phase 0 of the audit remediation. Also absorbs previously-uncommitted baseline work (context-assembler, mcp-server, server, config, prompt-context, evidence-compressor) into a clean starting point for subsequent phases. Critical fixes: - search/ranker: guard NaN recency (one bad date no longer scrambles the whole sort via non-transitive NaN comparators) + validate k - memory/search: zero score before status-filter continue (archived memories no longer leak into results when statuses omitted) + guard NaN mtime - context-assembler: strip the full header (incl. file-list line) in the deep route so 'Files included:' is no longer emitted twice - indexer/pipeline: surface embedding degradation at error level and skip merkle commit for degraded files so they retry next run (was silently poisoning the index with empty vectors) - indexer/merkle: add ctimeMs to the cache key so git-checkout / touch -r / cp content changes are detected even when mtime is preserved - indexer/embedder: only retry on network/5xx/429, honor Retry-After, add full jitter (was retrying fatal 400/401 and ignoring backoff) - parser/chunker: free tree/parser in a finally block (WASM heap is not GC'd; was leaking per-file across large indexes) - daemon/rotating-log: surface write/rotate failures on stderr instead of swallowing them silently - package.json: engines.node >=18 -> >=20 (matches better-sqlite3@12) Also gitignores .engram/ (local sqlite state) and adds project config (devcontainer, CI/release workflows, CONTRIBUTING, Makefile, docs).

- search/pipeline-core: wire the previously-dropped ReadWriteLock. Store the lock, wrap updateStores() in withWrite and retrieve() in withRead so an in-flight retrieve can no longer observe a half-swapped store set. HybridSearch.updateStores + mcp-server clear_index now await the swap. - daemon/mcp-server: lock the four wiki_* tools (wiki_query/wiki_read/ wiki_check_staleness under withRead, wiki_write under withWrite) and move the store_memory consolidation check inside withWrite so two concurrent stores of similar names can't both pass the dedup check. - daemon/server: withTimeout now tracks a timedOut flag, swallows post- timeout handler errors (ERR_HTTP_HEADERS_SENT) and avoids double-resolve. - daemon/memory/runtime: clearWorkingMemory uses a separator-aware prefix so a sibling dir like writableDir+'-evil' can't be matched & deleted; add a persistent chokidar error handler (once only covered startup). - daemon/watcher: add persistent error handler + max-wait debounce (5s) so sustained activity can't postpone flushes indefinitely; clear the maxWait timer on stop(). - cli/index-cmd: use async pipeline.closeAsync() (sync close() races native vector-store destructors on exit). - cli/mcp: port serve.ts shutdown discipline (re-entrancy guard, try/catch, 10s force-exit timeout) and add global unhandledRejection/uncaughtException handlers so the stdio MCP process never hangs on a cleanup rejection. - core/logger: switch stderr destination to sync:false (sync:true could hard-deadlock an MCP stdio server if the client stops draining stderr).

- README: fix embeddingProvider default 'keyword' -> 'local' (the actual default in config.ts) and clarify local vs keyword vs ollama/openai; note the config table is partial and point to src/core/config.ts for the full list; document the 'memory' bin alias collision risk. - package.json: prepublishOnly now runs lint + test --run + build so every publish is gated (was build-only). - cli/index: refresh stale fallback version literal 0.3.0 -> 0.7.1. - .gitignore: add .env*, *.log, .idea/, .vscode/, coverage/, *.tgz. - CONTRIBUTING: add a Development section (Node >=20, npm ci, lint, test, build commands).

Phase 2 — consolidate duplicated helpers/constants to single sources of truth: - core/strings.ts: canonical slugify, escapeRegExp, extractSectionText, extractBulletSection, humanizeSlug (was copied across 4-9 files). - search/shared/workflow-families.ts: ADJACENT_WORKFLOW_FAMILIES reconciled to the UNION of three previously-divergent per-strategy copies (silent drift bug), plus INVENTORY_GENERIC_TARGET_ALIAS_TERMS, INVENTORY_STRUCTURAL_TERMS, TRACE_NOISE_TERMS. - search/shared/mappers.ts: canonical chunkToSearchResult (was 5 copies) and isImplementationPath (was 4 copies); strategy classes and RetrievalPipeline now delegate to these. - analysis/topology-analysis: import isTestFile from search/utils (deleted the divergent 1-regex local copy; the 3-regex canonical is retained). - search/targets: export the acronym-correct splitIdentifierTokens; search/utils imports it and its divergent copy is removed (was breaking surface detection for PascalCase identifiers like HTTPServer/APIRouter). - Delete dead code: cli/mcp-proxy.ts (orphaned file), graphDiff/GraphDiffResult, isBusinessWikiPage, isIndexFilePath, sameNormalizedTarget, resetHookSessionState. Phase 4 — surface silent error swallowing on hot paths (return values unchanged; only the error path now logs): - core/errors.ts: toErrorMessage + logWarnWithStack (warn msg + debug stack). - storage/fts-store: getDocFreq/runFtsQuery now log.warn on failure (was returning 0/[] silently, masking corruption/locked-DB as 'no results'). - storage/memory-store: FTS runFtsQuery now warns. - indexer/pipeline: indexChanged merkle catch now distinguishes ENOENT (silent) from non-ENOENT (warns with relPath). - indexer/embedder: Ollama healthCheck now debug-logs the failure cause. - daemon/server: prompt-context + pre-tool-use body-parse fallbacks now warn. Verified: npm run lint clean; npm test -- --run 824 passed / 1 skipped.

Search: - pipeline-core/expandGraph: separate caller vs callee name sets so unrelated same-named chunks are no longer injected with wrong fallback scores. - architecture-strategy: Math.max(...maxChunks, 6) -> Math.min so a bundle configured with maxChunks<=5 is honored (was silently floored to 6). - capability-evidence: selectBestChunkForEvidence now picks the most substantial chunk (descending span; was ascending); removed the redundant UN SCOPED findCallees(name) loop that caused cross-file contamination (chunk-scoped findCalleesForChunk already covers it). - bug-strategy: stop overwriting the real evidence score with a linear rank value at the end of selectBugLocalizationBundle (hookScore preserved). Storage: - fts-store: weighted BM25 (name=10 > content=5 > kind=0.1) so the low-cardinality kind column no longer distorts relevance. - metadata-store: removeFile(s) now cascade-deletes targets + community_* rows in the same transaction (was leaving orphans; findTargetsByFilePath returned deleted-file targets). - vector-store: upsert no longer silently loses data on add-failure (logs + restore + rethrow); distance->similarity conversion fixed for LanceDB's default L2 metric (1/(1+distance); was the metric-wrong 1-distance) with a NaN guard. - memory-store: guard new Date(...).getTime() NaN so malformed timestamps no longer corrupt the compaction sort / age comparison. Analysis: - semantic-features: predicate regex gains a trailing [A-Z_] boundary so ordinary names (island, issue, canvas) are no longer miscounted as predicate calls. - call-graph: walkForCallNodes converted from unbounded recursion to an explicit stack (prevents stack overflow on deep/minified ASTs). Memory: - memory/search: docstring corrected (linear decay to zero at 90 days, not a 'half-life'). - memory/files: writeManagedMemoryFile is now atomic (tmp+rename) and warns on slug-collision overwrites of differently-named memories. - evidence-compressor: minChunkTokens/targetRatio made optional; targetRatio now scales the maxLines cap when provided. Wiki: - auto-capture: freshness guard now reads sourceCommit from disk (was dead code reading stripped content); generator's a52abb0 fix is ported. - generator: writePage awaits indexFile before removeStaleGeneratedPages (eliminates deletion-vs-indexing race + pending-index loss on exit). Core: - config: openaiApiKey security warning now fires via raw pre-scan (was dead code — .strict() parsing rejected the key before the warning guard). Verified: npm run lint clean; npm test -- --run 824 passed / 1 skipped (the single failure under contention is the known flaky memory-benchmark p95 latency gate, which passes in isolation).

- search/utils.textMatchesQueryTerm: module-level caches for compiled variant regexes (cap 500) and normalized text (cap 1000) — was re-normalizing ~1200-char text + compiling fresh RegExp per variant per call (the dominant bug-query latency cost). - search/architecture-strategy: hoist selectedLayers Set out of the sort comparator (.flatMap().includes -> .has()); was O(N^2 log N). - search/bug-strategy: precompute getBugCandidateSignals + batch getChunkTagsByIds/getChunkFeaturesByIds once via per-window caches (was recomputed 5-7x per candidate + N+1 SQL per pass). - search/capability-evidence: use getChunksLightweight() (no content blob) to enumerate distinct file paths instead of getAllChunks() — eliminates loading 50k content blobs twice per resolveCapabilityEvidence call. - indexer/pipeline: per-window Map cache for resolveCallTarget so repeated identical edges resolve once (findCallers has no bulk variant — left as-is). - visualize/data-extractor: pre-build Map<chunkId,node> + Map<name|filePath>, use graph.nodes.get() and a Set for wiki-edge filtering (was repeated Array.from(values).find() linearizations — O(M*N) -> O(N)). - analysis/community-detection: computeCohesion/splitCommunity now use a single membership-bucketed edge pass (O(E) not O(C*E)); identical cohesion values verified on 128/285/428-community graphs. Verified: npm run lint clean; npm test -- --run 824 passed / 1 skipped.

Adds 48 tests across 12 new files (824 -> 872 passing), including regression tests that lock in the Phase 0/1/3 fixes: - ranker NaN-recency + k validation - memory/search status-filter leak (archived excluded by default) - context-assembler deep-route single file-list - embedder retry policy (400 not retried; 429 honors Retry-After; TypeError retried) - fts-store BM25 weighting (name > kind) - metadata-store cascade delete (no orphan targets/community rows) - semantic-features predicate boundary (island not counted; isAuthenticated is) - call-graph 5000-deep AST (no stack overflow) - pipeline-core ReadWriteLock wiring (concurrent retrieve+updateStores) - merkle ctime detection (content change with preserved mtime) Plus new coverage for context-prioritization and evidence-compressor. Also lands two Phase 3 fixes that were lost to a parallel file-write race (re-verified via the new tests): - analysis/call-graph: walkForCallNodes converted to an explicit stack (was still recursive — the original edit didn't persist). - search/evidence-compressor: minChunkTokens/targetRatio made optional and targetRatio now scales the maxLines cap (was still required + unwired). Verified: npm run lint clean; npm test -- --run 872 passed / 1 skipped.

@decorator

Packaging: - Remove stale pnpm-lock.yaml (npm is canonical); gitignore it. - Untrack .mcp.json (hardcoded user-local absolute paths); gitignore + add portable .mcp.json.example template. Parser: - languages: HTML extractableTypes restricted to script/style (was every 'element' -> hundreds of junk <div> chunks per page; 17 -> 1 on the test fixture). Add .hh/.hxx/.hcc to C++ and content-sniff .h headers for C++ markers (namespace/template/class/std::) so C++ headers parse with the C++ grammar instead of C (was silent data loss). - chunker: extractName recurses into decorated_definition's 'definition' child so Python @decorator defs are named (was <anonymous>). - tree-sitter: initTreeSitter resets the cached promise on rejection so a transient init failure doesn't poison all later getLanguage calls. Analysis + business: - imports: extractImports now dispatches per-language (Python firm; Rust/ Go/Java best-effort) instead of TS/JS-only; handle 'export * from' and 'export * as ns from' namespace re-exports (were dropped). - product-areas: widen synthetic confidence clamp [0.55,0.94] -> [0.2,0.95] so weak evidence reports low confidence; fix dead countDomainSignals multi-word check; remove camelCase false-positive in technical-label detection; export deriveBusinessDisplaySummary as canonical. Tests (+49): architecture-strategy.test.ts (27 — was 0, the biggest gap) and stores-coverage.test.ts (22 — semantic/target/community/stats stores, all previously untested). Verified: lint clean; 921 passed / 1 skipped.

Storage + visualize: - target-store/chunk-store/semantic-store: batch IN queries at the 900-param SQLite limit (was unbounded — threw on large monorepos) and cache prepared statements (was re-preparing per call / inside batch loops). - community-store: add bulk getMembershipsForChunks(); metadata-store facade delegates to it; data-extractor now resolves all chunk memberships in one batched call (was N per-chunk roundtrips on the visualize hot path). - data-extractor: import canonical deriveBusinessDisplaySummary (was a divergent local copy); read sourceCommit from frontmatter instead of the hardcoded empty-string lie; drop now-unused local helpers. Search: - ranker: clamp multiplicative matchBoost at 8 (was unbounded ~1.7^n); short query variants (<=4 chars) now require a boundary/segment match instead of raw substring (cat no longer matches category, work no longer matches network). - context-assembler: realistic header-budget estimates (was flat 40/60 -> could exceed tokenBudget); maxChunks:0 now yields zero chunks (was always >=1); formatChunk fence uses one more backtick than the content's longest run so embedded triple-backticks can't break the code fence. - capability-evidence: replace unreachable dead fallbacks with invariant throws; memoize findFileChunks/scoreCapabilityFile per invocation (was O(n^2)-ish re-scoring across seeds). Daemon/hooks/cli: - memory/runtime: detectBranch now async (execFile) instead of execFileSync on every prompt (was blocking the event loop). - pre-tool-use: narrow the dead 'deny' contract to 'allow' (every branch returned allow; deny was never enforced). - cli/lens: reject port 0 (< 0 -> < 1); diagnostics to stderr. - cli/serve: warn on invalid --max-chunks (was silently ignored). - cli/stats: friendly 'No index found' on open failure + try/finally close. - mcp-server: drop redundant dynamic fs/path re-imports; wiki_check_staleness uses async readFile (was sync in async handler). Core: - rwlock: underflow detection in releaseRead/releaseWrite (a double-release previously made readers negative and stalled the lock forever with no signal). Verified: lint clean; 921 passed / 1 skipped.

The latency/memory benchmarks under test/benchmark/* measure p95 latency and heap usage, which are unreliable when vitest runs them concurrently with 80+ other test files — sustained contention inflated p95 (~56-71ms vs the 40ms gate) and caused ~50% of full-suite runs to fail spuriously. Exclude test/benchmark/** from the default 'npm test' run via vitest config. The dedicated 'npm run benchmark:memory' (and stress) scripts still run them in isolation where their thresholds are meaningful. Default suite is now reliably green (877 unit/integration tests).

nillo added 10 commits June 19, 2026 16:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit remediation: stabilization, concurrency, correctness, perf, tests#4

Audit remediation: stabilization, concurrency, correctness, perf, tests#4
nillo wants to merge 10 commits into
mainfrom
fix/audit-remediation

nillo commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nillo commented Jun 19, 2026

Summary

Phases (one commit each)

Test plan

Notes / intentionally deferred

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant