feat: expand benchmarking suite to cover multi-language and diverse repository types #7

@hey-granth

Summary

Benchmarking currently covers only Python repositories. codectx supports 9 languages, and its core mechanisms (dependency graph centrality, tier ranking, budget-driven compression) apply to any codebase. The benchmarking suite should reflect this by covering repositories that vary in language, size, and structural pattern.

Motivation

  • Token reduction numbers on the landing page and in documentation must be derived from real, reproducible benchmarks across representative repos
  • Python-only benchmarks underrepresent the tool's actual scope and potential impact
  • Different repository types stress different parts of the pipeline: a monorepo stresses the walker and graph construction, a deeply nested library stresses the ranker, a polyglot repo stresses the parser
  • Benchmark results feed directly into the landing page via the benchmarking repo JSON, so coverage quality directly affects what is shown to users

Proposed Repository Categories

By language

  • Python: fastapi, requests, rich, httpx (already covered)
  • Go: gin, cobra, or similar mid-size Go CLI/web project
  • Rust: ripgrep, tokio, or a mid-size Rust library
  • JavaScript/TypeScript: express, zod, or a mid-size TS project
  • Java: a mid-size Java library or CLI tool
  • Ruby: a mid-size Ruby gem or Rails plugin
  • C/C++: a mid-size systems library

By repository type

  • CLI tool
  • Web framework or HTTP library (already covered via fastapi, httpx)
  • Systems/low-level library
  • Monorepo with multiple packages
  • Data processing library
  • Test-heavy repository (high ratio of test files to source)
  • Generated-code-heavy repository (protobuf stubs, OpenAPI-generated clients)

By size

  • Small: under 50 files
  • Medium: 50 to 499 files
  • Large: 500 to 4999 files
  • Extra large: 5000+ files (stress test for walker and graph performance)
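
The size buckets above can be assigned mechanically from a file count. The following is a minimal sketch, not part of codectx: the function name `size_category` is hypothetical, the label `"extra-large"` is an assumed spelling (only `"large"` appears in the example entry), and each boundary value is assumed to belong to the larger bucket, so exactly 500 files counts as large.

```python
def size_category(file_count: int) -> str:
    """Map a repository's file count to a size label.

    Assumption: a boundary value falls into the larger bucket,
    e.g. 50 files is "medium" and 5000 files is "extra-large".
    """
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    if file_count < 50:
        return "small"
    if file_count < 500:
        return "medium"
    if file_count < 5000:
        return "large"
    return "extra-large"  # assumed label spelling
```

How files are counted (for example, whether vendored or ignored files are included) is left open here and would need to match what the walker actually visits.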

Output Format

Each benchmark entry in results.json should include:

{
  "repository": "owner/repo",
  "language": "python",
  "category": "web-framework",
  "size": "large",
  "naive_tokens": 224000,
  "codectx_tokens": 78000,
  "reduction_percent": 65.2,
  "token_budget": 120000,
  "codectx_version": "0.3.0",
  "run_at": "2026-01-01T00:00:00Z"
}
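
To keep `reduction_percent` consistent with the two token counts, it could be computed rather than entered by hand. This is a sketch under assumptions: the helper names `reduction_percent` and `build_entry` are hypothetical, the formula (tokens saved as a share of `naive_tokens`, rounded to one decimal) is inferred from the field names, and the token counts are taken as already measured.

```python
from datetime import datetime, timezone


def reduction_percent(naive_tokens: int, codectx_tokens: int) -> float:
    """Percentage of tokens saved relative to the naive baseline (assumed formula)."""
    if naive_tokens <= 0:
        raise ValueError("naive_tokens must be positive")
    return round(100 * (naive_tokens - codectx_tokens) / naive_tokens, 1)


def build_entry(
    repository: str,
    language: str,
    category: str,
    size: str,
    naive_tokens: int,
    codectx_tokens: int,
    token_budget: int,
    codectx_version: str,
) -> dict:
    """Assemble one results.json entry with the fields listed above."""
    return {
        "repository": repository,
        "language": language,
        "category": category,
        "size": size,
        "naive_tokens": naive_tokens,
        "codectx_tokens": codectx_tokens,
        "reduction_percent": reduction_percent(naive_tokens, codectx_tokens),
        "token_budget": token_budget,
        "codectx_version": codectx_version,
        # UTC timestamp in the same ISO 8601 "Z" form as the example entry
        "run_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Deriving the percentage in one place means the landing page can never show a reduction that disagrees with the raw counts stored next to it.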

Acceptance Criteria

  • At least 3 non-Python repositories benchmarked and added to results.json
  • At least one repository from each of the small, medium, and large size categories
  • At least one monorepo included
  • At least one test-heavy repository included
  • All benchmark entries follow the updated results.json schema above
  • CI workflow updated to run benchmarks for all entries on each release
  • README in benchmarking repo updated to document how to add a new benchmark target
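
The data-related criteria above could be checked automatically, for instance as a CI step. The sketch below is illustrative only: `check_acceptance` is a hypothetical helper, it assumes results.json holds a list of entries, and it assumes the category labels `"monorepo"` and `"test-heavy"`, which this issue does not define.

```python
REQUIRED_KEYS = {
    "repository", "language", "category", "size", "naive_tokens",
    "codectx_tokens", "reduction_percent", "token_budget",
    "codectx_version", "run_at",
}


def check_acceptance(entries: list[dict]) -> list[str]:
    """Return a list of unmet criteria; an empty list means all checks pass."""
    problems = []

    # Every entry must carry all schema fields.
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index} is missing keys: {sorted(missing)}")

    # At least 3 distinct non-Python repositories.
    non_python = {
        e.get("repository")
        for e in entries
        if e.get("language") not in (None, "python")
    }
    if len(non_python) < 3:
        problems.append(f"only {len(non_python)} non-Python repositories (need 3)")

    # At least one repository per required size category.
    sizes = {e.get("size") for e in entries}
    for size in ("small", "medium", "large"):
        if size not in sizes:
            problems.append(f"no repository in size category '{size}'")

    # Assumed category labels for the monorepo and test-heavy criteria.
    categories = {e.get("category") for e in entries}
    for category in ("monorepo", "test-heavy"):
        if category not in categories:
            problems.append(f"no repository in category '{category}'")

    return problems
```

A CI job could load results.json with `json.load`, call `check_acceptance`, and fail the build when the returned list is non-empty. The CI workflow and README criteria still need manual review.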
