This update focused on turning the code-analyzer project into a stronger Gemini CLI extension workflow with practical caching and measurable benchmark outputs.
Completed areas:
- Multi-layer index caching (memory + disk)
- On-demand file read caching (raw snapshots + symbols-only)
- MCP cache control and visibility tools
- Updated benchmark runs for both local mock mode and MCP handoff mode
- Documentation refresh in README
- Added `src/CacheManager.ts`
  - Introduced configurable cache controls: `maxEntries`, `ttlMs`
  - Added operations: `get`, `set`, `delete`, `clear`, `prune`, `stats`
  - Eviction strategy:
    - Time-based expiration (TTL)
    - Capacity enforcement by least-recently-accessed timestamp
- Updated `src/LazyFileReader.ts`
  - Added two in-memory caches:
    - `rawFileCache` for file content snapshots and line arrays
    - `symbolsCache` for exported-signature text blocks
  - Added cache control functions: `clearReadFileCache()`, `getReadFileCacheStats()`
  - Cache safety behavior:
    - Snapshot cache is invalidated naturally when `size` or `mtimeMs` changes
    - `baseDir` boundary checks prevent path traversal
- Updated `src/repoIndex.ts`
  - Added `buildIndexWithCache(targetDir, options)` with cache-aware behavior
  - Added cache metadata in the response: `enabled`, `hit`, `layer` (`memory`, `disk`, `rebuild`), `cacheFile`, `fingerprint`
  - Added a versioned disk cache payload: `version`, `targetDir`, `fingerprint`, `generatedAt`, `index`
  - Added cache invalidation and diagnostics: `invalidateIndexCache(...)`, `getIndexCacheStats(...)`
- Updated `src/mcpLazyServer.ts`
  - Existing tools retained: `repo_index`, `read_file`
  - New tools added: `index_cache_invalidate`, `index_cache_stats`
  - `repo_index` now supports `forceRefresh` for explicit rebuild behavior
- Updated `tests/LazyFileReader.test.ts`
- Updated `tests/repoIndex.test.ts`
- Updated `tests/mcpLazyServer.test.ts`
  - Coverage now includes cache stats and invalidation tool paths in MCP flows
- Updated `benchmark-results.json` (mock mode)
- Updated `benchmark-results-mcp.json` (MCP handoff mode)
- Latest large-scope target used: `Rocket.Chat/apps/meteor/server` (`filesIndexed: 148`)
Recent entries:
- Mock mode (`benchmark-results.json`): `naiveTokens: 307582`, `skeletonTokens: 11595`, `totalSessionTokens: 12002`, `filesReadOnDemand: 2`, `indexCacheHit: false`
- MCP handoff mode (`benchmark-results-mcp.json`): `naiveTokens: 307582`, `skeletonTokens: 11595`, `totalSessionTokens: 11595`, `filesReadOnDemand: 0`, `indexCacheHit: true`
The project is a TypeScript-first code analysis layer designed to reduce LLM context cost by replacing full-repo ingestion with a staged retrieval model.
Staged flow:
- Build a semantic skeleton from exports (`repo_index`)
- Let the model reason on the compact structure first
- Lazily retrieve only the required implementation snippets (`read_file`)
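The token accounting of the staged flow can be sketched as follows, assuming on-demand reads are admitted greedily against a session budget. `ReadCandidate`, `planStagedSession`, and the `budget` parameter are hypothetical names for illustration, not the project's API.

```typescript
// Hypothetical shapes; the project's real types may differ.
interface ReadCandidate {
  path: string;
  estimatedTokens: number;
}

interface StagedSession {
  skeletonTokens: number;
  reads: ReadCandidate[];
  totalSessionTokens: number;
}

// Stage 1: the skeleton is always sent.
// Stage 2: requested reads are admitted in order while they fit the budget.
function planStagedSession(
  skeletonTokens: number,
  requested: ReadCandidate[],
  budget: number,
): StagedSession {
  const reads: ReadCandidate[] = [];
  let total = skeletonTokens;
  for (const candidate of requested) {
    // Skip (rather than stop at) a read that would overflow, so smaller later reads still fit.
    if (total + candidate.estimatedTokens > budget) continue;
    reads.push(candidate);
    total += candidate.estimatedTokens;
  }
  return { skeletonTokens, reads, totalSessionTokens: total };
}
```

With no reads requested, `totalSessionTokens` equals the skeleton size, which is the shape of the MCP handoff entry above.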
This architecture separates discovery from deep inspection and enforces bounded payloads.
Primary concerns of `src/repoIndex.ts`:
- Source file collection with exclusions (`node_modules`, `dist`, `.git`, tests, `.d.ts`)
- AST parsing through `ts-morph`
- Extraction of exported signatures across functions, classes, interfaces, types, and enums
- Prompt-oriented formatting via `formatIndexForPrompt`
- Cost baseline estimation via `countNaiveTokens`
- Multi-layer cache orchestration with fingerprint validation
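The exclusion rules can be expressed as a single path predicate. This is a sketch only: `shouldIndex` is a hypothetical name, and treating `tests/` directories plus `*.test.ts` / `*.spec.ts` files as "tests", and indexing only `.ts`/`.tsx` files, are assumptions not stated above.

```typescript
// Directory names that are never indexed.
const EXCLUDED_DIRS = new Set(["node_modules", "dist", ".git", "tests"]);

// Hypothetical predicate: decides whether a relative path enters the index.
function shouldIndex(relativePath: string): boolean {
  // Split on both separators so Windows-style paths are handled too.
  const segments = relativePath.split(/[\\/]/);
  if (segments.some((segment) => EXCLUDED_DIRS.has(segment))) return false;

  const base = segments[segments.length - 1] ?? "";
  if (base.endsWith(".d.ts")) return false; // declaration files carry no implementation
  if (/\.(test|spec)\.tsx?$/.test(base)) return false; // assumed test-file naming

  // Assumption: only TypeScript sources are parsed.
  return /\.tsx?$/.test(base);
}
```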
Primary concerns of `src/LazyFileReader.ts`:
- Safe file retrieval inside a constrained base directory
- Selective read modes:
  - full
  - capped
  - line-range
  - symbols-only
- Token estimate support for per-read budgeting
- Fast repeat reads through in-memory caches
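The base-directory guard and the line-range mode can be sketched with `node:path` alone. `resolveInsideBase` and `readLineRange` are hypothetical helper names, and the 1-based inclusive range convention is an assumption.

```typescript
import * as path from "node:path";

// Resolve a requested path and refuse anything that escapes baseDir.
function resolveInsideBase(baseDir: string, requested: string): string {
  const root = path.resolve(baseDir);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // A ".." prefix means the target climbed out; an absolute rel means a different drive (Windows).
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes base directory: ${requested}`);
  }
  return target;
}

// Line-range mode: lines [start, end], 1-based and inclusive, clamped to the file length.
function readLineRange(lines: string[], start: number, end: number): string {
  const from = Math.max(1, start);
  const to = Math.min(lines.length, end);
  if (from > to) return "";
  return lines.slice(from - 1, to).join("\n");
}
```

This check is lexical only; a symlink inside `baseDir` that points outside it would additionally need an `fs.realpathSync` comparison.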
Primary concerns of `src/mcpLazyServer.ts`:
- JSON-RPC over stdio framing
- MCP tool registration and argument validation
- Tool routing to index and read services
- Cache-control observability and invalidation entry points
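The framing and routing concerns can be sketched as follows. MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages, and tool invocations arrive as `tools/call` with `params.name` and `params.arguments`. `LineFramer`, `routeToolCall`, and the handler map are hypothetical; the sketch omits `initialize` and `tools/list`, and leaves per-tool argument validation to the handlers.

```typescript
// JSON-RPC 2.0 request shape as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Buffers raw stdin chunks and yields one complete message per newline.
class LineFramer {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the incomplete tail for the next chunk
    return lines.filter((line) => line.trim().length > 0);
  }
}

// Dispatches tools/call to a handler table and builds the JSON-RPC response.
function routeToolCall(req: JsonRpcRequest, tools: Map<string, ToolHandler>) {
  const id = req.id ?? null;
  if (req.method !== "tools/call") {
    return { jsonrpc: "2.0", id, error: { code: -32601, message: `Method not found: ${req.method}` } };
  }
  const name = req.params?.name ?? "";
  const handler = tools.get(name);
  if (!handler) {
    return { jsonrpc: "2.0", id, error: { code: -32602, message: `Unknown tool: ${name}` } };
  }
  try {
    const text = handler(req.params?.arguments ?? {});
    return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }] } };
  } catch (err) {
    // Tool failures are reported inside the result (isError) rather than as protocol errors.
    return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text: String(err) }], isError: true } };
  }
}
```

In this shape, `repo_index`, `read_file`, `index_cache_invalidate`, and `index_cache_stats` would each be one entry in the handler map.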
Primary concerns of the benchmark/CLI runner:
- Command mode routing (`mock`, `mcp`, `live`)
- Benchmark emission into persistent JSON logs
- Comparison between naive and staged token budgets
- Optional live tool loop simulation for Gemini interactions
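Mode routing and log emission could look like the sketch below. It assumes each benchmark file holds a JSON array of entries; the metric fields mirror the entries quoted earlier, while the `mode` field, the default-to-`mock` rule, and the helper names are hypothetical.

```typescript
import * as fs from "node:fs";

type Mode = "mock" | "mcp" | "live";

// Metric fields mirror the logged entries; `mode` is an assumed extra field.
interface BenchmarkEntry {
  mode: Mode;
  filesIndexed: number;
  naiveTokens: number;
  skeletonTokens: number;
  totalSessionTokens: number;
  filesReadOnDemand: number;
  indexCacheHit: boolean;
}

// Narrow a CLI argument to a supported mode (assumed default: mock).
function parseMode(arg: string | undefined): Mode {
  if (arg === undefined) return "mock";
  if (arg === "mock" || arg === "mcp" || arg === "live") return arg;
  throw new Error(`Unsupported mode: ${arg}`);
}

// Append one entry to a JSON array log, creating the file if it does not exist.
function appendBenchmark(file: string, entry: BenchmarkEntry): number {
  const existing: BenchmarkEntry[] = fs.existsSync(file)
    ? JSON.parse(fs.readFileSync(file, "utf8"))
    : [];
  existing.push(entry);
  fs.writeFileSync(file, JSON.stringify(existing, null, 2));
  return existing.length; // number of entries now in the log
}
```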
Primary concerns of `src/CacheManager.ts`:
- Generic in-memory cache utility with TTL and bounded size
- Reusable component for both index and read workflows
- Lightweight introspection via cache stats for operations visibility
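A TTL cache with least-recently-accessed eviction can be sketched as below. It reuses the option names `maxEntries` and `ttlMs` and the operation names listed earlier, but it is an illustrative sketch, not the project's `CacheManager` source; the injectable `now` clock is a hypothetical addition for deterministic testing.

```typescript
interface CacheOptions {
  maxEntries: number;
  ttlMs: number;
  now?: () => number; // hypothetical injectable clock
}

interface Entry<V> {
  value: V;
  expiresAt: number;
  lastAccess: number;
}

class TtlCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(options: CacheOptions) {
    this.maxEntries = options.maxEntries;
    this.ttlMs = options.ttlMs;
    this.now = options.now ?? (() => Date.now());
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    const t = this.now();
    if (entry.expiresAt <= t) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    entry.lastAccess = t; // refresh recency for eviction ordering
    return entry.value;
  }

  set(key: string, value: V): void {
    const t = this.now();
    this.entries.set(key, { value, expiresAt: t + this.ttlMs, lastAccess: t });
    this.prune();
  }

  delete(key: string): boolean {
    return this.entries.delete(key);
  }

  clear(): void {
    this.entries.clear();
  }

  // Drop expired entries, then evict the least-recently-accessed ones until within capacity.
  prune(): void {
    const t = this.now();
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= t) this.entries.delete(key);
    }
    while (this.entries.size > this.maxEntries) {
      let oldestKey: string | undefined;
      let oldest = Infinity;
      for (const [key, entry] of this.entries) {
        if (entry.lastAccess < oldest) {
          oldest = entry.lastAccess;
          oldestKey = key;
        }
      }
      if (oldestKey === undefined) break;
      this.entries.delete(oldestKey);
    }
  }

  stats(): { size: number; maxEntries: number; ttlMs: number } {
    return { size: this.entries.size, maxEntries: this.maxEntries, ttlMs: this.ttlMs };
  }
}
```

The linear scan in `prune` keeps the sketch short; it is O(n) per eviction, which is acceptable for small `maxEntries`.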
The project now uses cache layering to reduce repeated CPU and I/O work.
Index path:
- Check in-memory index snapshot
- Fallback to disk cache payload
- Rebuild index if fingerprint mismatch or cache miss
- Persist rebuilt snapshot to both memory and disk
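The index path can be sketched as a layered lookup. The payload fields and layer names follow the lists above; hashing sorted `path`/`size`/`mtimeMs` triples with SHA-256 is an assumed fingerprint scheme, and `CACHE_VERSION`, `computeFingerprint`, and `lookupIndex` (with its injected disk and rebuild callbacks) are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface FileStamp {
  path: string;
  size: number;
  mtimeMs: number;
}

interface DiskPayload<I> {
  version: number;
  targetDir: string;
  fingerprint: string;
  generatedAt: string;
  index: I;
}

type Layer = "memory" | "disk" | "rebuild";

const CACHE_VERSION = 1; // hypothetical schema version

// Order-independent fingerprint: sort by path, then hash each path/size/mtime triple.
function computeFingerprint(stamps: FileStamp[]): string {
  const hash = createHash("sha256");
  const sorted = [...stamps].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  for (const s of sorted) hash.update(`${s.path}\0${s.size}\0${s.mtimeMs}\n`);
  return hash.digest("hex");
}

// Memory -> disk -> rebuild; a rebuilt index is persisted to both layers.
function lookupIndex<I>(
  targetDir: string,
  fingerprint: string,
  memory: Map<string, DiskPayload<I>>,
  readDisk: (dir: string) => DiskPayload<I> | undefined,
  writeDisk: (payload: DiskPayload<I>) => void,
  rebuild: (dir: string) => I,
): { index: I; layer: Layer } {
  const isValid = (p: DiskPayload<I> | undefined): p is DiskPayload<I> =>
    p !== undefined && p.version === CACHE_VERSION && p.fingerprint === fingerprint;

  const inMemory = memory.get(targetDir);
  if (isValid(inMemory)) return { index: inMemory.index, layer: "memory" };

  const onDisk = readDisk(targetDir);
  if (isValid(onDisk)) {
    memory.set(targetDir, onDisk); // promote to memory for the next call
    return { index: onDisk.index, layer: "disk" };
  }

  const payload: DiskPayload<I> = {
    version: CACHE_VERSION,
    targetDir,
    fingerprint,
    generatedAt: new Date().toISOString(),
    index: rebuild(targetDir),
  };
  memory.set(targetDir, payload);
  writeDisk(payload);
  return { index: payload.index, layer: "rebuild" };
}
```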
Read path:
- Check raw snapshot cache by file path
- Validate using file stat metadata (`size`, `mtimeMs`)
- Recompute only when the file has changed
- Optionally cache symbols-only representation for repeated structural reads
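The read path's stat-validated snapshot cache might look like this sketch. The map name echoes `rawFileCache` from above, but the `Snapshot` shape and `readWithSnapshot` are hypothetical.

```typescript
import * as fs from "node:fs";

// One snapshot per path, tagged with the stat data it was read under.
interface Snapshot {
  size: number;
  mtimeMs: number;
  content: string;
  lines: string[];
}

const rawFileCache = new Map<string, Snapshot>();

// Serve the cached snapshot while size and mtimeMs match; otherwise re-read and replace it.
function readWithSnapshot(filePath: string): { snapshot: Snapshot; cacheHit: boolean } {
  const { size, mtimeMs } = fs.statSync(filePath);
  const cached = rawFileCache.get(filePath);
  if (cached && cached.size === size && cached.mtimeMs === mtimeMs) {
    return { snapshot: cached, cacheHit: true };
  }
  const content = fs.readFileSync(filePath, "utf8");
  const snapshot: Snapshot = { size, mtimeMs, content, lines: content.split(/\r?\n/) };
  rawFileCache.set(filePath, snapshot);
  return { snapshot, cacheHit: false };
}
```

A write that keeps the same size within the filesystem's timestamp resolution would not be detected; this stat-based check trades that edge case for avoiding a content hash on every read.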
Operational impact:
- Reduces repeated parsing and formatting costs
- Improves second-call responsiveness in both local and MCP execution
- Preserves correctness through conservative invalidation checks
The benchmark logs quantify context reduction versus naive source loading.
At 148 indexed files:
- Naive baseline: 307,582 estimated tokens
- Skeleton context: 11,595 estimated tokens
- Effective reduction factor: approximately 26.5x before extra file reads
Mock mode then adds a small incremental payload from two on-demand file reads (407 tokens, for a 12,002-token session). MCP handoff mode keeps the session payload at the skeleton size (11,595 tokens) until additional reads are requested.
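The reduction factors follow directly from the logged figures: the naive baseline divided by the tokens actually sent.

```typescript
// Figures copied from the benchmark entries above.
const naiveTokens = 307_582;
const skeletonTokens = 11_595;
const mockSessionTokens = 12_002;

// Reduction factor = naive baseline / tokens actually sent.
const ratio = (sent: number): number => naiveTokens / sent;

console.log(ratio(skeletonTokens).toFixed(1)); // "26.5" (skeleton only, MCP handoff)
console.log(ratio(mockSessionTokens).toFixed(1)); // "25.6" (skeleton + 2 on-demand reads, mock)
console.log(mockSessionTokens - skeletonTokens); // 407 tokens added by the reads
```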
The project follows Gemini extension conventions by combining:
- `gemini-extension.json` manifest
- `GEMINI.md` extension guidance file
- MCP stdio server implementation
- Tool descriptors under `gemini-extension/tools`
This allows direct extension linking and tool-call based analysis workflows.
Current risks:
- Skeleton token size can still grow significantly on very large repositories because inferred type strings can become long.
- Fingerprint computation scales with file count and requires stat traversal each run.
- Prompt formatter currently favors readability over maximal compression.
Recommended next optimizations:
- Compact type normalization (remove long import path segments in printed types).
- Output budgets per file (cap number of exported signatures serialized).
- Optional minimal index mode for high-level triage (`name`, `kind`, `path` only).
- File-watcher-driven invalidation for long-lived MCP sessions.
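The first two recommendations can be prototyped as plain string post-processing on printed signatures. This sketch assumes the long segments are the `import("...").` qualifiers the TypeScript checker prints for types declared in other modules; `compactTypeText` and `capSignatures` are hypothetical names, and regex stripping is only one option (adjusting how types are printed at extraction time is another).

```typescript
// Strip `import("...").` qualifiers, e.g. import("/repo/src/models").User -> User.
function compactTypeText(typeText: string): string {
  return typeText.replace(/import\((["'])[^"']*\1\)\./g, "");
}

// Per-file output budget: keep at most `limit` signatures and note how many were dropped.
function capSignatures(signatures: string[], limit: number): string[] {
  if (signatures.length <= limit) return signatures;
  const omitted = signatures.length - limit;
  return [...signatures.slice(0, limit), `// ... ${omitted} more export(s) omitted`];
}
```

Stripping qualifiers can make two same-named types from different modules look identical, so a compact mode may want to keep the qualifier when a name collides.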
Last verified state in this cycle:
- TypeScript build: passing
- Test suites: passing
- MCP tool list: includes 4 tools
- Benchmarks: appended in both benchmark files for large target scope