Skip to content

Latest commit

 

History

History
240 lines (172 loc) · 7.11 KB

File metadata and controls

240 lines (172 loc) · 7.11 KB

Changes Made and Full Technical Summary

Scope of Recent Work

This update focused on turning the code-analyzer project into a stronger Gemini CLI extension workflow with practical caching and measurable benchmark outputs.

Completed areas:

  1. Multi-layer index caching (memory + disk)
  2. On-demand file read caching (raw snapshots + symbols-only)
  3. MCP cache control and visibility tools
  4. Updated benchmark runs for both local mock mode and MCP handoff mode
  5. Documentation refresh in README

Detailed Change Log

1) New Cache Primitive

  • Added src/CacheManager.ts
  • Introduced configurable cache controls:
    • maxEntries
    • ttlMs
  • Added operations:
    • get, set, delete, clear, prune, stats
  • Eviction strategy:
    • Time-based expiration (TTL)
    • Capacity enforcement by least-recently-accessed timestamp

2) Lazy File Reader Caching

  • Updated src/LazyFileReader.ts
  • Added two in-memory caches:
    • rawFileCache for file content snapshots and line arrays
    • symbolsCache for exported-signature text blocks
  • Added cache control functions:
    • clearReadFileCache()
    • getReadFileCacheStats()
  • Cache safety behavior:
    • Snapshot cache is invalidated naturally when size or mtimeMs changes
    • baseDir boundary checks prevent path traversal

3) Repository Index Caching

  • Updated src/repoIndex.ts
  • Added buildIndexWithCache(targetDir, options) with cache-aware behavior
  • Added cache metadata in response:
    • enabled
    • hit
    • layer (memory, disk, rebuild)
    • cacheFile
    • fingerprint
  • Added disk cache payload with versioning:
    • version
    • targetDir
    • fingerprint
    • generatedAt
    • index
  • Added cache invalidation and diagnostics:
    • invalidateIndexCache(...)
    • getIndexCacheStats(...)

4) MCP Tooling Expansion

  • Updated src/mcpLazyServer.ts
  • Existing tools retained:
    • repo_index
    • read_file
  • New tools added:
    • index_cache_invalidate
    • index_cache_stats
  • repo_index now supports forceRefresh for explicit rebuild behavior

5) Tests Updated

  • Updated tests/LazyFileReader.test.ts
  • Updated tests/repoIndex.test.ts
  • Updated tests/mcpLazyServer.test.ts
  • Coverage now includes cache stats and invalidation tool paths in MCP flows

6) Benchmark Artifacts Refreshed

  • Updated benchmark-results.json (mock mode)
  • Updated benchmark-results-mcp.json (MCP handoff mode)
  • Latest large-scope target used:
    • Rocket.Chat/apps/meteor/server
    • filesIndexed: 148

Recent entries:

  1. Mock mode (benchmark-results.json)

    • naiveTokens: 307582
    • skeletonTokens: 11595
    • totalSessionTokens: 12002
    • filesReadOnDemand: 2
    • indexCacheHit: false
  2. MCP handoff mode (benchmark-results-mcp.json)

    • naiveTokens: 307582
    • skeletonTokens: 11595
    • totalSessionTokens: 11595
    • filesReadOnDemand: 0
    • indexCacheHit: true

Full Complex Codebase Summary

1. System Purpose and Architecture

The project is a TypeScript-first code analysis layer designed to reduce LLM context cost by replacing full-repo ingestion with a staged retrieval model.

Staged flow:

  1. Build semantic skeleton from exports (repo_index)
  2. Let the model reason on compact structure first
  3. Lazily retrieve only required implementation snippets (read_file)

This architecture separates discovery from deep inspection and enforces bounded payloads.

2. Core Modules and Responsibilities

src/repoIndex.ts

Primary concerns:

  • Source file collection with exclusions (node_modules, dist, .git, tests, .d.ts)
  • AST parsing through ts-morph
  • Extraction of exported signatures across functions/classes/interfaces/types/enums
  • Prompt-oriented formatting via formatIndexForPrompt
  • Cost baseline estimator via countNaiveTokens
  • Multi-layer cache orchestration with fingerprint validation

src/LazyFileReader.ts

Primary concerns:

  • Safe file retrieval inside constrained base directory
  • Selective read modes:
    • full
    • capped
    • line-range
    • symbols-only
  • Token estimate support for per-read budgeting
  • Fast repeat reads through in-memory caches

src/mcpLazyServer.ts

Primary concerns:

  • JSON-RPC over stdio framing
  • MCP tool registration and argument validation
  • Tool routing to index and read services
  • Cache-control observability and invalidation entry points

src/demo.ts

Primary concerns:

  • Command mode routing (mock, mcp, live)
  • Benchmark emission into persistent JSON logs
  • Comparison between naive and staged token budgets
  • Optional live tool loop simulation for Gemini interactions

src/CacheManager.ts

Primary concerns:

  • Generic in-memory cache utility with TTL and bounded size
  • Reusable component for both index and read workflows
  • Lightweight introspection via cache stats for operations visibility

3. Caching Design Deep Dive

The project now uses cache layering to reduce repeated CPU and I/O work.

Index path:

  1. Check in-memory index snapshot
  2. Fallback to disk cache payload
  3. Rebuild index if fingerprint mismatch or cache miss
  4. Persist rebuilt snapshot to both memory and disk

Read path:

  1. Check raw snapshot cache by file path
  2. Validate using file stat metadata (size, mtimeMs)
  3. Recompute only when file has changed
  4. Optionally cache symbols-only representation for repeated structural reads

Operational impact:

  • Reduces repeated parsing and formatting costs
  • Improves second-call responsiveness in both local and MCP execution
  • Preserves correctness through conservative invalidation checks

4. Benchmark Interpretation

The benchmark logs quantify context reduction versus naive source loading.

At 148 indexed files:

  • Naive baseline: 307,582 estimated tokens
  • Skeleton context: 11,595 estimated tokens
  • Effective reduction factor: approximately 26.5x before extra file reads

Mock mode then adds a small incremental payload from selected file reads. MCP handoff mode keeps session payload near skeleton-only until additional reads are requested.

5. Extension Alignment Status

The project follows Gemini extension conventions by combining:

  • gemini-extension.json manifest
  • GEMINI.md extension guidance file
  • MCP stdio server implementation
  • Tool descriptors under gemini-extension/tools

This allows direct extension linking and tool-call based analysis workflows.

6. Risks and Optimization Opportunities

Current risks:

  1. Skeleton token size can still grow significantly on very large repositories because inferred type strings can become long.
  2. Fingerprint computation scales with file count and requires stat traversal each run.
  3. Prompt formatter currently favors readability over maximal compression.

Recommended next optimizations:

  1. Compact type normalization (remove long import path segments in printed types).
  2. Output budgets per file (cap number of exported signatures serialized).
  3. Optional minimal index mode for high-level triage (name, kind, path only).
  4. File watcher driven invalidation for long-lived MCP sessions.

7. Verification Snapshot

Last verified state in this cycle:

  • TypeScript build: passing
  • Test suites: passing
  • MCP tool list: includes 4 tools
  • Benchmarks: appended in both benchmark files for large target scope