A Rust CLI tool that translates C2Rust output or raw C source code into idiomatic Rust using LLM-powered analysis, with automated build verification and test harness integration.
- Translates mechanically generated C2Rust code or raw C source into idiomatic, safe Rust
- Iterative translation with build and test feedback loops (retries on failure)
- Supports both library programs (tested via the `cando2` dlopen harness) and executable programs (tested via stdin/stdout comparison)
- Clippy-based idiomaticity scoring
- Organized output in timestamped run directories
- Rust toolchain (1.70+)
- An API key for one of the supported LLM providers (OpenAI or Anthropic)
- Test programs with a `test_vectors/` directory (see Test Program Layout)
```sh
git clone <repo-url>
cd llm_translation
cargo build --release
```

```sh
# Using OpenAI (default provider)
export OPENAI_API_KEY=sk-...
cargo run --release -- Public-Tests/
```
```sh
# Using Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
cargo run --release -- Public-Tests/ --provider anthropic --model claude-sonnet-4-20250514

# Translate a single program
cargo run --release -- Public-Tests/B01_organic/bin2hex_lib

# Multiple paths at once
cargo run --release -- Public-Tests/B01_organic Public-Tests/B02_organic

# With options
cargo run --release -- Public-Tests/ \
  --model gpt-4 \
  --max-retries 3 \
  --max-lines 500 \
  --report summary.md

# Build-only mode (skip running test vectors)
cargo run --release -- Public-Tests/ --skip-tests

# Translate from C source instead of C2Rust output
cargo run --release -- Public-Tests/ --from-c

# Also check dst/ (c2rust output) when translated_rust/ is not found
cargo run --release -- Public-Tests/ --from-c2rust

# Resume a previous run, skipping already-successful programs
cargo run --release -- Public-Tests/ --resume runs/gpt-4_20260305_120000

# JSON output format
cargo run --release -- Public-Tests/ --format json
```

| Flag | Default | Description |
|---|---|---|
| `--provider <NAME>` | `openai` | LLM provider: `openai` or `anthropic` |
| `--max-retries` | 5 | Max LLM retry attempts per program |
| `--max-lines` | 2000 | Skip source files exceeding this line count |
| `--skip-tests` | false | Only verify the build succeeds; skip test vectors |
| `--from-c` | false | Force translation from C source (`test_case/`) |
| `--from-c2rust` | false | Also check `dst/` (c2rust output) when `translated_rust/` is not found |
| `--resume <RUN_DIR>` | none | Resume a previous run, skipping already-successful programs |
| `--report <PATH>` | none | Write an extra copy of the report to this path |
| `--api-key <KEY>` | `$OPENAI_API_KEY` / `$ANTHROPIC_API_KEY` | API key (env var depends on provider) |
| `--model <MODEL>` | `gpt-5.2` | LLM model name |
| `--temperature <FLOAT>` | 0.2 | LLM sampling temperature (0.0 = deterministic, 1.0 = creative) |
| `--format <FMT>` | `markdown` | Output format: `markdown` or `json` |
Each program directory must contain:
```
program_name/
  test_vectors/        # Required: JSON test inputs/outputs
    1.json
    2.json
  runner/              # Library programs only: cando2 test harness
  translated_rust/     # C2Rust or CRAT output (preferred source)
    src/lib.rs
    Cargo.toml
  dst/<name>/          # Alternative: raw c2rust output
  test_case/           # Alternative: raw C source
    src/lib.c          # or src/main.c for executables
    include/lib.h
```
Source resolution order (unless `--from-c` is set):

1. `translated_rust/` (CRAT output)
2. `dst/<name>/` (raw C2Rust output, only checked if `--from-c2rust` is set)
3. `test_case/` (raw C source, automatic fallback)
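The resolution order above can be sketched as a simple first-match search. This is an illustrative sketch, not the tool's actual internals; the function and parameter names (`resolve_source_dir`, `from_c`, `from_c2rust`) are invented here to mirror the CLI flags.

```rust
use std::path::{Path, PathBuf};

/// Pick the source directory for a program, following the documented
/// resolution order: translated_rust/ first, then dst/<name>/ (only when
/// the --from-c2rust flag is set), then test_case/ as the fallback.
/// --from-c forces test_case/ directly.
fn resolve_source_dir(
    program: &Path,
    name: &str,
    from_c: bool,
    from_c2rust: bool,
) -> Option<PathBuf> {
    let candidates: Vec<PathBuf> = if from_c {
        vec![program.join("test_case")]
    } else {
        let mut v = vec![program.join("translated_rust")];
        if from_c2rust {
            v.push(program.join("dst").join(name));
        }
        v.push(program.join("test_case"));
        v
    };
    // First candidate that exists as a directory wins.
    candidates.into_iter().find(|p| p.is_dir())
}
```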
- Library (has `runner/`): compiled as a `cdylib`, tested via the `cando2` dlopen harness. Outputs `lib.rs`.
- Executable (no `runner/`): compiled as a binary, tested by running with each test vector's argv/stdin and comparing stdout/stderr/exit code. Outputs `main.rs`.
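The executable check described above boils down to comparing a process's captured output against a vector's expectations. A minimal sketch, assuming literal (non-regex) matching; the `Expectation` type and `vector_passes` function are invented for illustration, and the real tool additionally supports `is_regex: true` patterns:

```rust
/// Expected stdout/stderr text and exit code for one test vector
/// (literal comparison only in this sketch).
struct Expectation<'a> {
    stdout: &'a str,
    stderr: &'a str,
    rc: i32,
}

/// Compare a finished process's captured output against the expectation:
/// exit code, stdout, and stderr must all match exactly.
fn vector_passes(out: &std::process::Output, exp: &Expectation) -> bool {
    out.status.code() == Some(exp.rc)
        && String::from_utf8_lossy(&out.stdout) == exp.stdout
        && String::from_utf8_lossy(&out.stderr) == exp.stderr
}
```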
```json
{
  "argv": ["arg1", "arg2"],
  "stdin": "input text",
  "stdout": { "pattern": "expected output", "is_regex": false },
  "stderr": { "pattern": "", "is_regex": false },
  "rc": 0
}
```

Results are saved to `runs/<model>_<YYYYMMDD>_<HHMMSS>/`:
```
runs/
  gpt-4_20260305_120000/
    bin2hex_lib/
      translated_rust_llm/
        lib.rs           # Translated code
        Cargo.toml
      results.json       # Per-program result
    report.md            # Summary report
    usage.csv            # Token usage per program
```
- Discover: walks directories to find programs with `test_vectors/` and source code
- Collect: gathers source from the resolved source directory
- Translate: sends the source to the LLM with a prompt tailored to source type and program type (4 variants: lib C2Rust, lib C, exe C2Rust, exe C)
- Build: compiles with `cargo build --release`
- Test: runs the test vectors (cando2 for libraries, stdin/stdout for executables); individual tests time out after 30 s, library test harnesses after 120 s
- Retry: on failure, formats build errors or test diffs as feedback and retries
- Score: runs clippy analysis and computes an idiomaticity score (see Idiomaticity Scoring)
- Report: generates a per-run report with results for every program
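The build/test/retry portion of the pipeline has the shape of a generic feedback loop. The sketch below is a simplified illustration with stub closures, not the tool's real code; the actual pipeline also handles timeouts, scoring, reporting, and asynchronous LLM calls:

```rust
/// Ask the LLM for a candidate translation, build and test it, and on
/// failure feed the error text back into the next prompt.
/// `ask_llm` maps optional feedback to candidate source code;
/// `build_and_test` returns Ok(()) on success or error text on failure.
fn translate_with_retries<F, B>(
    max_retries: u32,
    mut ask_llm: F,
    mut build_and_test: B,
) -> Result<String, String>
where
    F: FnMut(Option<&str>) -> String,
    B: FnMut(&str) -> Result<(), String>,
{
    let mut feedback: Option<String> = None;
    // Initial attempt plus up to `max_retries` retries.
    for _ in 0..=max_retries {
        let candidate = ask_llm(feedback.as_deref());
        match build_and_test(&candidate) {
            Ok(()) => return Ok(candidate),
            // Build errors / test diffs become feedback for the next attempt.
            Err(errors) => feedback = Some(errors),
        }
    }
    Err("exceeded max retries".to_string())
}
```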
After a successful translation, the pipeline runs `cargo clippy` on the output and performs static analysis to produce a score from 0 (C-like) to 100 (idiomatic Rust). The score starts at 100, and deductions are applied based on three metrics:
| Metric | How it's measured | Penalty |
|---|---|---|
| Unsafe blocks | Regex count of `unsafe {` in the source | First 5 are free (expected for FFI); each additional block deducts 2 points |
| Raw pointers | Regex count of `*mut`/`*const` type declarations and `as *mut`/`as *const` casts | First 10 are free (expected for FFI); each additional usage deducts 1 point |
| Clippy warnings | Number of warnings from `cargo clippy -W clippy::all` | Each warning deducts 3 points |
The final score is clamped to the 0–100 range. A score of 100 means the translated code has ≤5 `unsafe` blocks, ≤10 raw-pointer usages, and zero clippy warnings. The thresholds are intentionally lenient for the first few occurrences because FFI-boundary code (`#[no_mangle] pub unsafe extern "C" fn`) inherently requires some `unsafe` and raw pointers.
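The deduction rule described above can be expressed compactly. This is a sketch of the documented formula, not the tool's actual scoring code; the function name is invented here:

```rust
/// Start at 100 and deduct past the free thresholds: 2 points per unsafe
/// block beyond 5, 1 point per raw-pointer usage beyond 10, and 3 points
/// per clippy warning. Saturating arithmetic clamps the result at 0.
fn idiomaticity_score(unsafe_blocks: u32, raw_pointers: u32, clippy_warnings: u32) -> u32 {
    let penalty = 2 * unsafe_blocks.saturating_sub(5)
        + raw_pointers.saturating_sub(10)
        + 3 * clippy_warnings;
    100u32.saturating_sub(penalty)
}
```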
The per-program metrics (`unsafe_blocks`, `raw_pointers`, `clippy_warnings`) are included in both the markdown and JSON reports alongside the composite score.
llm_translation can also be used as a Rust library dependency:
```toml
[dependencies]
llm_translation = { path = "../llm_translation" }
```

```rust
use llm_translation::{TranslationAgent, TranslationConfig};

let config = TranslationConfig {
    provider: "anthropic".to_string(),
    api_key: "sk-ant-...".to_string(),
    model: "claude-sonnet-4-20250514".to_string(),
    max_retries: 3,
    ..Default::default()
};

let agent = TranslationAgent::new(config);
let report = agent.translate_all(&[path]).await?;
```

- `TranslationAgent` - Main orchestrator
- `TranslationConfig` - Pipeline configuration
- `TranslationReport`, `TranslationResult`, `ProgramStatus` - Result types
- `ProgramInfo`, `ProgramType`, `SourceType` - Program metadata
- `LlmClient`, `LlmRequest`, `LlmResponse`, `create_client` - LLM abstraction
- `OpenAIClient`, `AnthropicClient` - Provider implementations
```sh
# Run all tests
cargo test

# Build
cargo build

# Build for release
cargo build --release
```

```
src/
  lib.rs               # Public API
  main.rs              # Standalone CLI binary
  cli.rs               # CLI argument parsing
  llm/
    mod.rs             # LLM client factory
    types.rs           # LlmClient trait, LlmRequest, LlmResponse
    openai.rs          # OpenAI implementation
    anthropic.rs       # Anthropic implementation
  translation/
    mod.rs             # Orchestrator (discover, translate, test loop)
    translator.rs      # LLM prompt construction
    test_runner.rs     # Build + test harness (cando2 and executable)
    report.rs          # Report types and markdown generation
    clippy.rs          # Idiomaticity scoring
    feedback.rs        # Error formatting for LLM retry
tools/
  cando2/              # Test harness for library programs (dlopen)
```