
LLM Translation

A Rust CLI tool that translates C2Rust output or raw C source code into idiomatic Rust using LLM-powered analysis, with automated build verification and test harness integration.

What It Does

  • Translates mechanically generated C2Rust code or raw C source into idiomatic, safe Rust
  • Iterates with build and test feedback loops, retrying on failure
  • Supports both library programs (tested via the cando2 dlopen harness) and executable programs (tested via stdin/stdout comparison)
  • Scores idiomaticity with Clippy-based analysis
  • Organizes output into timestamped run directories

Prerequisites

  • Rust toolchain (1.70+)
  • An API key for one of the supported LLM providers (OpenAI or Anthropic)
  • Test programs with test_vectors/ (see Test Program Layout)

Installation

git clone <repo-url>
cd llm_translation
cargo build --release

Usage

# Using OpenAI (default provider)
export OPENAI_API_KEY=sk-...
cargo run --release -- Public-Tests/

# Using Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
cargo run --release -- Public-Tests/ --provider anthropic --model claude-sonnet-4-20250514

# Translate a single program
cargo run --release -- Public-Tests/B01_organic/bin2hex_lib

# Multiple paths at once
cargo run --release -- Public-Tests/B01_organic Public-Tests/B02_organic

# With options
cargo run --release -- Public-Tests/ \
  --model gpt-4 \
  --max-retries 3 \
  --max-lines 500 \
  --report summary.md

# Build-only mode (skip running test vectors)
cargo run --release -- Public-Tests/ --skip-tests

# Translate from C source instead of C2Rust output
cargo run --release -- Public-Tests/ --from-c

# Also check dst/ (c2rust output) when translated_rust/ is not found
cargo run --release -- Public-Tests/ --from-c2rust

# Resume a previous run, skipping already-successful programs
cargo run --release -- Public-Tests/ --resume runs/gpt-4_20260305_120000

# JSON output format
cargo run --release -- Public-Tests/ --format json

CLI Options

| Flag | Default | Description |
| --- | --- | --- |
| --provider <NAME> | openai | LLM provider: openai or anthropic |
| --max-retries | 5 | Max LLM retry attempts per program |
| --max-lines | 2000 | Skip source files exceeding this line count |
| --skip-tests | false | Only verify that the build succeeds; skip test vectors |
| --from-c | false | Force translation from C source (test_case/) |
| --from-c2rust | false | Also check dst/ (c2rust output) when translated_rust/ is not found |
| --resume <RUN_DIR> | none | Resume a previous run, skipping already-successful programs |
| --report <PATH> | none | Write an extra copy of the report to this path |
| --api-key <KEY> | $OPENAI_API_KEY / $ANTHROPIC_API_KEY | API key (env var depends on provider) |
| --model <MODEL> | gpt-5.2 | LLM model name |
| --temperature <FLOAT> | 0.2 | LLM sampling temperature (0.0 = deterministic, 1.0 = creative) |
| --format <FMT> | markdown | Output format: markdown or json |

Test Program Layout

Each program directory must contain:

program_name/
  test_vectors/          # Required: JSON test inputs/outputs
    1.json
    2.json
  runner/                # Library programs only: cando2 test harness
  translated_rust/       # C2Rust or CRAT output (preferred source)
    src/lib.rs
    Cargo.toml
  dst/<name>/            # Alternative: raw c2rust output
  test_case/             # Alternative: raw C source
    src/lib.c            # or src/main.c for executables
    include/lib.h

Source resolution order (unless --from-c is set):

  1. translated_rust/ (CRAT output)
  2. dst/<name>/ (raw C2Rust output, only checked if --from-c2rust is set)
  3. test_case/ (raw C source, automatic fallback)
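The resolution order above can be sketched as a small function. This is an illustrative sketch, not the tool's actual code: the name resolve_source is hypothetical, and the real discovery logic lives in src/translation/mod.rs.

```rust
use std::path::{Path, PathBuf};

// Hypothetical sketch of the source-resolution order described above.
fn resolve_source(program_dir: &Path, from_c: bool, from_c2rust: bool, name: &str) -> Option<PathBuf> {
    if from_c {
        // --from-c forces translation from the raw C source only
        let c_src = program_dir.join("test_case");
        return c_src.is_dir().then_some(c_src);
    }
    // 1. translated_rust/ (CRAT output) is preferred
    let translated = program_dir.join("translated_rust");
    if translated.is_dir() {
        return Some(translated);
    }
    // 2. dst/<name>/ is only consulted when --from-c2rust is set
    if from_c2rust {
        let dst = program_dir.join("dst").join(name);
        if dst.is_dir() {
            return Some(dst);
        }
    }
    // 3. test_case/ (raw C source) is the automatic fallback
    let c_src = program_dir.join("test_case");
    c_src.is_dir().then_some(c_src)
}
```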

Program Types

  • Library (has runner/): Compiled as cdylib, tested via the cando2 dlopen harness. Outputs lib.rs.
  • Executable (no runner/): Compiled as a binary, tested by running with each test vector's argv/stdin and comparing stdout/stderr/exit code. Outputs main.rs.
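The executable path can be sketched with std::process::Command: spawn the built binary with a vector's argv, feed it the vector's stdin, and capture stdout/stderr/exit code for comparison. This is a simplified illustration (run_one is a hypothetical name); the real runner in src/translation/test_runner.rs also handles regex patterns and timeouts.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Run one test vector against a built executable and collect its outputs.
fn run_one(binary: &str, argv: &[&str], stdin_data: &str) -> std::io::Result<(String, String, i32)> {
    let mut child = Command::new(binary)
        .args(argv)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    // Write the vector's stdin, then drop the handle so the child sees EOF.
    child.stdin.take().unwrap().write_all(stdin_data.as_bytes())?;
    let out = child.wait_with_output()?;
    Ok((
        String::from_utf8_lossy(&out.stdout).into_owned(),
        String::from_utf8_lossy(&out.stderr).into_owned(),
        out.status.code().unwrap_or(-1),
    ))
}
```

The returned triple is then compared against the vector's stdout, stderr, and rc fields.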

Test Vector Format

{
  "argv": ["arg1", "arg2"],
  "stdin": "input text",
  "stdout": { "pattern": "expected output", "is_regex": false },
  "stderr": { "pattern": "", "is_regex": false },
  "rc": 0
}

Output Structure

Results are saved to runs/<model>_<YYYYMMDD>_<HHMMSS>/:

runs/
  gpt-4_20260305_120000/
    bin2hex_lib/
      translated_rust_llm/
        lib.rs              # Translated code
        Cargo.toml
        results.json        # Per-program result
    report.md               # Summary report
    usage.csv               # Token usage per program

How It Works

  1. Discover: Walks directories to find programs with test_vectors/ and source code
  2. Collect: Gathers source from the resolved source directory
  3. Translate: Sends source to LLM with a prompt tailored to source type and program type (4 variants: lib C2Rust, lib C, exe C2Rust, exe C)
  4. Build: Compiles with cargo build --release
  5. Test: Runs test vectors (cando2 for libraries, stdin/stdout for executables). Individual tests time out after 30s; library test harnesses time out after 120s
  6. Retry: On failure, formats build errors or test diffs as feedback and retries
  7. Score: Runs clippy analysis and computes an idiomaticity score (see Idiomaticity Scoring)
  8. Report: Generates per-run report with results for every program
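Steps 3 through 6 form a feedback loop that can be sketched as below. The names with_retries and attempt are placeholders for illustration, not part of the crate's API; attempt stands in for one translate + build + test round, returning formatted errors on failure.

```rust
// Hedged sketch of the translate -> build -> test -> retry loop (steps 3-6).
fn with_retries<F>(max_retries: u32, mut attempt: F) -> Result<String, String>
where
    F: FnMut(Option<&str>) -> Result<String, String>,
{
    let mut feedback: Option<String> = None;
    // One initial attempt plus up to max_retries retries.
    for _ in 0..=max_retries {
        let result = attempt(feedback.as_deref());
        match result {
            Ok(code) => return Ok(code),      // build and tests passed
            Err(errors) => feedback = Some(errors), // feed errors back to the LLM
        }
    }
    Err(feedback.unwrap_or_default())
}
```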

Idiomaticity Scoring

After a successful translation, the pipeline runs cargo clippy on the output and performs static analysis to produce a score from 0 (C-like) to 100 (idiomatic Rust). The score starts at 100 and deductions are applied based on three metrics:

| Metric | How it's measured | Penalty |
| --- | --- | --- |
| Unsafe blocks | Regex count of unsafe { in the source | First 5 are free (expected for FFI); each additional block deducts 2 points |
| Raw pointers | Regex count of *mut/*const type declarations and as *mut/as *const casts | First 10 are free (expected for FFI); each additional usage deducts 1 point |
| Clippy warnings | Number of warnings from cargo clippy -W clippy::all | Each warning deducts 3 points |

The final score is clamped to the 0–100 range. A score of 100 means the translated code has ≤5 unsafe blocks, ≤10 raw pointer usages, and zero clippy warnings. The thresholds are intentionally lenient for the first few occurrences because FFI-boundary code (#[no_mangle] pub unsafe extern "C" fn) inherently requires some unsafe and raw pointers.
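The deduction rule described above can be written out as a small function. This is a sketch reconstructed from the prose (idiomaticity_score is a hypothetical name); the actual implementation lives in src/translation/clippy.rs.

```rust
// Score from 0 (C-like) to 100 (idiomatic), per the rules described above.
fn idiomaticity_score(unsafe_blocks: u32, raw_pointers: u32, clippy_warnings: u32) -> u32 {
    let mut deductions: i64 = 0;
    deductions += 2 * unsafe_blocks.saturating_sub(5) as i64; // first 5 unsafe blocks are free
    deductions += raw_pointers.saturating_sub(10) as i64;     // first 10 raw-pointer uses are free
    deductions += 3 * clippy_warnings as i64;                 // each clippy warning costs 3 points
    (100i64 - deductions).clamp(0, 100) as u32
}
```

For example, 7 unsafe blocks, 12 raw-pointer uses, and 1 clippy warning would score 100 - 4 - 2 - 3 = 91.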

The per-program metrics (unsafe_blocks, raw_pointers, clippy_warnings) are included in both the markdown and JSON reports alongside the composite score.

Library Usage

llm_translation can also be used as a Rust library dependency:

[dependencies]
llm_translation = { path = "../llm_translation" }

use llm_translation::{TranslationAgent, TranslationConfig};

let config = TranslationConfig {
    provider: "anthropic".to_string(),
    api_key: "sk-ant-...".to_string(),
    model: "claude-sonnet-4-20250514".to_string(),
    max_retries: 3,
    ..Default::default()
};

let agent = TranslationAgent::new(config);
let report = agent.translate_all(&[path]).await?;

Public API

  • TranslationAgent - Main orchestrator
  • TranslationConfig - Pipeline configuration
  • TranslationReport, TranslationResult, ProgramStatus - Result types
  • ProgramInfo, ProgramType, SourceType - Program metadata
  • LlmClient, LlmRequest, LlmResponse, create_client - LLM abstraction
  • OpenAIClient, AnthropicClient - Provider implementations

Development

# Run all tests
cargo test

# Build
cargo build

# Build for release
cargo build --release

Architecture

src/
  lib.rs                     # Public API
  main.rs                    # Standalone CLI binary
  cli.rs                     # CLI argument parsing
  llm/
    mod.rs                   # LLM client factory
    types.rs                 # LlmClient trait, LlmRequest, LlmResponse
    openai.rs                # OpenAI implementation
    anthropic.rs             # Anthropic implementation
  translation/
    mod.rs                   # Orchestrator (discover, translate, test loop)
    translator.rs            # LLM prompt construction
    test_runner.rs           # Build + test harness (cando2 and executable)
    report.rs                # Report types and markdown generation
    clippy.rs                # Idiomaticity scoring
    feedback.rs              # Error formatting for LLM retry
tools/
  cando2/                    # Test harness for library programs (dlopen)

License

MIT
