Summary
The current benchmarks cover only Python repositories. codectx supports 9 languages, and its core mechanism (dependency graph centrality, tier ranking, budget-driven compression) applies to any codebase. The benchmarking suite should reflect this by covering repositories that vary in language, size, and structure.
Motivation
- Token reduction numbers on the landing page and in documentation must be derived from real, reproducible benchmarks across representative repos
- Python-only benchmarks underrepresent the tool's actual scope and potential impact
- Different repository types stress different parts of the pipeline: a monorepo stresses the walker and graph construction, a deeply nested library stresses the ranker, a polyglot repo stresses the parser
- Benchmark results feed directly into the landing page via the benchmarking repo JSON, so coverage quality directly affects what is shown to users
Proposed Repository Categories
By language
- Python: fastapi, requests, rich, httpx (already covered)
- Go: gin, cobra, or similar mid-size Go CLI/web project
- Rust: ripgrep, tokio, or a mid-size Rust library
- JavaScript/TypeScript: express, zod, or a mid-size TS project
- Java: a mid-size Java library or CLI tool
- Ruby: a mid-size Ruby gem or Rails plugin
- C/C++: a mid-size systems library
By repository type
- CLI tool
- Web framework or HTTP library (already covered via fastapi, requests, httpx)
- Systems/low-level library
- Monorepo with multiple packages
- Data processing library
- Test-heavy repository (high ratio of test files to source)
- Repository dominated by auto-generated code (protobuf output, OpenAPI-generated clients)
By size
- Small: fewer than 50 files
- Medium: 50 to 499 files
- Large: 500 to 4,999 files
- Extra large: 5,000 or more files (stress test for walker and graph performance)
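The size buckets above could be assigned mechanically from a file count. The following is a minimal sketch, not part of codectx: the function name `size_category` and the `"extra-large"` label are hypothetical, and treating each lower bound as inclusive (so exactly 500 files counts as large) is an assumption.

```python
def size_category(file_count: int) -> str:
    """Map a repository's file count to one of the proposed size buckets.

    Hypothetical helper; bucket labels other than "large" are assumed.
    """
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    if file_count < 50:
        return "small"
    if file_count < 500:
        return "medium"
    if file_count < 5000:
        return "large"
    return "extra-large"


if __name__ == "__main__":
    for count in (12, 50, 499, 500, 5000):
        print(count, size_category(count))
```

Deriving the bucket from the count, rather than entering it by hand, would keep the `size` field consistent across benchmark runs.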
Output Format
Each benchmark entry in results.json should include:
{
  "repository": "owner/repo",
  "language": "python",
  "category": "web-framework",
  "size": "large",
  "naive_tokens": 224000,
  "codectx_tokens": 78000,
  "reduction_percent": 65.2,
  "token_budget": 120000,
  "codectx_version": "0.3.0",
  "run_at": "2026-01-01T00:00:00Z"
}
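One way to keep entries internally consistent is to compute `reduction_percent` from the two token counts instead of recording it separately. The sketch below assumes the reduction is defined as `(naive_tokens - codectx_tokens) / naive_tokens`, rounded to one decimal place. The helper names `reduction_percent` and `make_entry` are hypothetical and do not come from the codectx codebase.

```python
import json
from datetime import datetime, timezone


def reduction_percent(naive_tokens: int, codectx_tokens: int) -> float:
    """Percentage of tokens saved relative to the naive baseline (assumed definition)."""
    if naive_tokens <= 0:
        raise ValueError("naive_tokens must be positive")
    return round(100 * (naive_tokens - codectx_tokens) / naive_tokens, 1)


def make_entry(repository, language, category, size,
               naive_tokens, codectx_tokens, token_budget, codectx_version):
    """Build one results.json entry with the fields listed above."""
    return {
        "repository": repository,
        "language": language,
        "category": category,
        "size": size,
        "naive_tokens": naive_tokens,
        "codectx_tokens": codectx_tokens,
        "reduction_percent": reduction_percent(naive_tokens, codectx_tokens),
        "token_budget": token_budget,
        "codectx_version": codectx_version,
        # UTC timestamp in the same format as the sample entry
        "run_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


if __name__ == "__main__":
    entry = make_entry("owner/repo", "python", "web-framework", "large",
                       224000, 78000, 120000, "0.3.0")
    print(json.dumps(entry, indent=2))
```

Computing the percentage at write time means the landing page never shows a figure that disagrees with the raw token counts it was derived from.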
Acceptance Criteria
References