feat: expand benchmarking suite to cover multi-language and diverse repository types

## Summary

The current benchmarks cover only Python repositories, but codectx supports 9 languages, and its core mechanisms (dependency graph centrality, tier ranking, budget-driven compression) apply to any codebase. The benchmarking suite should reflect this by covering repositories that vary in language, size, and structure.

## Motivation

- Token reduction numbers on the landing page and in documentation must be derived from real, reproducible benchmarks across representative repos
- Python-only benchmarks underrepresent the tool's actual scope and potential impact
- Different repository types stress different parts of the pipeline: a monorepo stresses the walker and graph construction, a deeply nested library stresses the ranker, and a polyglot repo stresses the parser
- Benchmark results feed directly into the landing page via the benchmarking repo JSON, so coverage quality directly affects what is shown to users

## Proposed Repository Categories

### By language
- Python: fastapi, requests, rich, httpx (already covered)
- Go: gin, cobra, or similar mid-size Go CLI/web project
- Rust: ripgrep, tokio, or a mid-size Rust library
- JavaScript/TypeScript: express, zod, or a mid-size TS project
- Java: a mid-size Java library or CLI tool
- Ruby: a mid-size Ruby gem or Rails plugin
- C/C++: a mid-size systems library

### By repository type
- CLI tool
- Web framework or HTTP library (already covered via fastapi, httpx)
- Systems/low-level library
- Monorepo with multiple packages
- Data processing library
- Test-heavy repository (high ratio of test files to source)
- Repository heavy in auto-generated code (protobuf output, OpenAPI-generated clients)

### By size
- Small: fewer than 50 files
- Medium: 50 to 499 files
- Large: 500 to 4,999 files
- Extra large: 5,000 or more files (stress test for walker and graph performance)
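
To keep the `size` label consistent across entries, the bucketing could be computed rather than assigned by hand. The following is a minimal sketch, not part of codectx: the function name `size_category` is hypothetical, the label `"extra-large"` is an assumed spelling, and each upper bound is treated as exclusive (so exactly 500 files counts as large).

```python
def size_category(file_count: int) -> str:
    """Map a repository's file count to a size bucket.

    Assumption: upper bounds are exclusive, so 50 files is "medium"
    and 500 files is "large". The "extra-large" label is illustrative.
    """
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    if file_count < 50:
        return "small"
    if file_count < 500:
        return "medium"
    if file_count < 5000:
        return "large"
    return "extra-large"


if __name__ == "__main__":
    for count in (12, 50, 499, 500, 5000):
        print(count, size_category(count))
```

Which files count toward the total (for example, whether vendored or ignored files are included) would still need to be decided and documented.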

## Output Format

Each benchmark entry in results.json should include:

```json
{
  "repository": "owner/repo",
  "language": "python",
  "category": "web-framework",
  "size": "large",
  "naive_tokens": 224000,
  "codectx_tokens": 78000,
  "reduction_percent": 65.2,
  "token_budget": 120000,
  "codectx_version": "0.3.0",
  "run_at": "2026-01-01T00:00:00Z"
}
```
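
Since `reduction_percent` follows from the two token counts, it may be safer to derive it when the entry is built instead of typing it in. The sketch below assumes the formula `(naive_tokens - codectx_tokens) / naive_tokens * 100`, rounded to one decimal place; the helper name `make_entry` and the sample numbers are hypothetical and are not taken from the benchmarking repo.

```python
from datetime import datetime, timezone
from typing import Optional


def make_entry(
    repository: str,
    language: str,
    category: str,
    size: str,
    naive_tokens: int,
    codectx_tokens: int,
    token_budget: int,
    codectx_version: str,
    run_at: Optional[str] = None,
) -> dict:
    """Build one results.json entry, deriving reduction_percent."""
    if naive_tokens <= 0:
        raise ValueError("naive_tokens must be positive")
    if codectx_tokens < 0:
        raise ValueError("codectx_tokens must be non-negative")
    # Assumed definition: share of the naive token count that was removed.
    reduction = round((naive_tokens - codectx_tokens) / naive_tokens * 100, 1)
    if run_at is None:
        # UTC timestamp in the same shape as the sample entry.
        run_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "repository": repository,
        "language": language,
        "category": category,
        "size": size,
        "naive_tokens": naive_tokens,
        "codectx_tokens": codectx_tokens,
        "reduction_percent": reduction,
        "token_budget": token_budget,
        "codectx_version": codectx_version,
        "run_at": run_at,
    }


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the arithmetic easy to check.
    entry = make_entry("owner/repo", "go", "cli-tool", "medium",
                       200000, 50000, 120000, "0.3.0")
    print(entry["reduction_percent"])  # 75.0
```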

## Acceptance Criteria

- [ ] At least 3 non-Python repositories benchmarked and added to results.json
- [ ] At least one repository from each of the small, medium, and large size categories
- [ ] At least one monorepo included
- [ ] At least one test-heavy repository included
- [ ] All benchmark entries follow the updated results.json schema above
- [ ] CI workflow updated to run benchmarks for all entries on each release
- [ ] README in benchmarking repo updated to document how to add a new benchmark target
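
The coverage criteria above that depend only on the contents of results.json could be checked mechanically, for example in CI. This is a rough sketch under several assumptions: the function name `check_coverage` is hypothetical, entries are assumed to be a list of objects in the format shown under Output Format, and the `category` values `"monorepo"` and `"test-heavy"` are guessed spellings that the real schema may not use.

```python
def check_coverage(entries: list) -> dict:
    """Report which data-driven acceptance criteria a result set meets."""
    # Distinct repositories whose language is not Python.
    non_python = {e["repository"] for e in entries if e["language"] != "python"}
    sizes = {e["size"] for e in entries}
    categories = {e["category"] for e in entries}
    return {
        "three_non_python": len(non_python) >= 3,
        "small_medium_large": {"small", "medium", "large"} <= sizes,
        # Assumed category labels; adjust to whatever the schema settles on.
        "monorepo": "monorepo" in categories,
        "test_heavy": "test-heavy" in categories,
    }


if __name__ == "__main__":
    # Hypothetical entries, reduced to the fields the check reads.
    sample = [
        {"repository": "a/one", "language": "go", "category": "cli-tool", "size": "small"},
        {"repository": "b/two", "language": "rust", "category": "monorepo", "size": "large"},
        {"repository": "c/three", "language": "python", "category": "web-framework", "size": "medium"},
    ]
    for name, ok in check_coverage(sample).items():
        print(name, "ok" if ok else "missing")
```

A repository that is both a monorepo and test-heavy cannot be expressed with a single `category` string, so the schema might need a list of tags if such overlaps matter.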

## References

- Benchmarking repository: https://github.com/hey-granth/codectx-benchmarking
- Current results: https://github.com/hey-granth/codectx-benchmarking/blob/main/benchmark_results
- Landing page fetches benchmark data from the above JSON at build time

