
LLM Translation

A Rust CLI tool that translates C2Rust output or raw C source code into idiomatic Rust using LLM-powered analysis, with automated build verification and test harness integration.

What It Does

  • Translates mechanically generated C2Rust code or raw C source into idiomatic, safe Rust
  • Iterates with build and test feedback loops, retrying on failure
  • Supports both library programs (tested via the cando2 dlopen harness) and executable programs (tested via stdin/stdout comparison)
  • Scores idiomaticity with Clippy-based analysis
  • Organizes output into timestamped run directories

Prerequisites

  • Rust toolchain (1.70+)
  • An API key for one of the supported LLM providers (OpenAI or Anthropic)
  • Test programs with test_vectors/ (see Test Program Layout)

Installation

git clone <repo-url>
cd llm_translation
cargo build --release

Usage

# Using OpenAI (default provider)
export OPENAI_API_KEY=sk-...
cargo run --release -- Public-Tests/

# Using Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
cargo run --release -- Public-Tests/ --provider anthropic --model claude-sonnet-4-20250514

# Translate a single program
cargo run --release -- Public-Tests/B01_organic/bin2hex_lib

# Multiple paths at once
cargo run --release -- Public-Tests/B01_organic Public-Tests/B02_organic

# With options
cargo run --release -- Public-Tests/ \
  --model gpt-4 \
  --max-retries 3 \
  --max-lines 500 \
  --report summary.md

# Build-only mode (skip running test vectors)
cargo run --release -- Public-Tests/ --skip-tests

# Translate from C source instead of C2Rust output
cargo run --release -- Public-Tests/ --from-c

# Also check dst/ (c2rust output) when translated_rust/ is not found
cargo run --release -- Public-Tests/ --from-c2rust

# Resume a previous run, skipping already-successful programs
cargo run --release -- Public-Tests/ --resume runs/gpt-4_20260305_120000

# JSON output format
cargo run --release -- Public-Tests/ --format json

CLI Options

| Flag | Default | Description |
| --- | --- | --- |
| --provider <NAME> | openai | LLM provider: openai or anthropic |
| --max-retries | 5 | Max LLM retry attempts per program |
| --max-lines | 2000 | Skip source files exceeding this line count |
| --skip-tests | false | Only verify that the build succeeds; skip test vectors |
| --from-c | false | Force translation from C source (test_case/) |
| --from-c2rust | false | Also check dst/ (c2rust output) when translated_rust/ is not found |
| --resume <RUN_DIR> | none | Resume a previous run, skipping already-successful programs |
| --report <PATH> | none | Write an extra copy of the report to this path |
| --api-key <KEY> | $OPENAI_API_KEY / $ANTHROPIC_API_KEY | API key (env var depends on provider) |
| --model <MODEL> | gpt-5.2 | LLM model name |
| --temperature <FLOAT> | 0.2 | LLM sampling temperature (0.0 = deterministic, 1.0 = creative) |
| --format <FMT> | markdown | Output format: markdown or json |

Test Program Layout

Each program directory must contain:

program_name/
  test_vectors/          # Required: JSON test inputs/outputs
    1.json
    2.json
  runner/                # Library programs only: cando2 test harness
  translated_rust/       # C2Rust or CRAT output (preferred source)
    src/lib.rs
    Cargo.toml
  dst/<name>/            # Alternative: raw c2rust output
  test_case/             # Alternative: raw C source
    src/lib.c            # or src/main.c for executables
    include/lib.h

Source resolution order (unless --from-c is set):

  1. translated_rust/ (CRAT output)
  2. dst/<name>/ (raw C2Rust output, only checked if --from-c2rust is set)
  3. test_case/ (raw C source, automatic fallback)
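The resolution order above can be sketched as a small function. This is an illustrative sketch, not the tool's actual code: the name resolve_source is hypothetical, and the real discovery logic lives in src/translation/mod.rs.

```rust
use std::path::{Path, PathBuf};

// Hypothetical sketch of the source-resolution order described above.
fn resolve_source(program_dir: &Path, from_c: bool, from_c2rust: bool, name: &str) -> Option<PathBuf> {
    if from_c {
        // --from-c forces translation from the raw C source only
        let c_src = program_dir.join("test_case");
        return c_src.is_dir().then_some(c_src);
    }
    // 1. translated_rust/ (CRAT output) is preferred
    let translated = program_dir.join("translated_rust");
    if translated.is_dir() {
        return Some(translated);
    }
    // 2. dst/<name>/ is only consulted when --from-c2rust is set
    if from_c2rust {
        let dst = program_dir.join("dst").join(name);
        if dst.is_dir() {
            return Some(dst);
        }
    }
    // 3. test_case/ (raw C source) is the automatic fallback
    let c_src = program_dir.join("test_case");
    c_src.is_dir().then_some(c_src)
}
```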

Program Types

  • Library (has runner/): Compiled as cdylib, tested via the cando2 dlopen harness. Outputs lib.rs.
  • Executable (no runner/): Compiled as a binary, tested by running with each test vector's argv/stdin and comparing stdout/stderr/exit code. Outputs main.rs.
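The executable path can be sketched with std::process::Command: spawn the built binary with a vector's argv, feed it the vector's stdin, and capture stdout/stderr/exit code for comparison. This is a simplified illustration (run_one is a hypothetical name); the real runner in src/translation/test_runner.rs also handles regex patterns and timeouts.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Run one test vector against a built executable and collect its outputs.
fn run_one(binary: &str, argv: &[&str], stdin_data: &str) -> std::io::Result<(String, String, i32)> {
    let mut child = Command::new(binary)
        .args(argv)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    // Write the vector's stdin, then drop the handle so the child sees EOF.
    child.stdin.take().unwrap().write_all(stdin_data.as_bytes())?;
    let out = child.wait_with_output()?;
    Ok((
        String::from_utf8_lossy(&out.stdout).into_owned(),
        String::from_utf8_lossy(&out.stderr).into_owned(),
        out.status.code().unwrap_or(-1),
    ))
}
```

The returned triple is then compared against the vector's stdout, stderr, and rc fields.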

Test Vector Format

{
  "argv": ["arg1", "arg2"],
  "stdin": "input text",
  "stdout": { "pattern": "expected output", "is_regex": false },
  "stderr": { "pattern": "", "is_regex": false },
  "rc": 0
}

Output Structure

Results are saved to runs/<model>_<YYYYMMDD>_<HHMMSS>/:

runs/
  gpt-4_20260305_120000/
    bin2hex_lib/
      translated_rust_llm/
        lib.rs              # Translated code
        Cargo.toml
        results.json        # Per-program result
    report.md               # Summary report
    usage.csv               # Token usage per program

How It Works

  1. Discover: Walks directories to find programs with test_vectors/ and source code
  2. Collect: Gathers source from the resolved source directory
  3. Translate: Sends source to LLM with a prompt tailored to source type and program type (4 variants: lib C2Rust, lib C, exe C2Rust, exe C)
  4. Build: Compiles with cargo build --release
  5. Test: Runs test vectors (cando2 for libraries, stdin/stdout for executables). Individual tests time out after 30s; library test harnesses time out after 120s
  6. Retry: On failure, formats build errors or test diffs as feedback and retries
  7. Score: Runs clippy analysis and computes an idiomaticity score (see Idiomaticity Scoring)
  8. Report: Generates per-run report with results for every program
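Steps 3 through 6 form a feedback loop that can be sketched as below. The names with_retries and attempt are placeholders for illustration, not part of the crate's API; attempt stands in for one translate + build + test round, returning formatted errors on failure.

```rust
// Hedged sketch of the translate -> build -> test -> retry loop (steps 3-6).
fn with_retries<F>(max_retries: u32, mut attempt: F) -> Result<String, String>
where
    F: FnMut(Option<&str>) -> Result<String, String>,
{
    let mut feedback: Option<String> = None;
    // One initial attempt plus up to max_retries retries.
    for _ in 0..=max_retries {
        let result = attempt(feedback.as_deref());
        match result {
            Ok(code) => return Ok(code),      // build and tests passed
            Err(errors) => feedback = Some(errors), // feed errors back to the LLM
        }
    }
    Err(feedback.unwrap_or_default())
}
```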

Idiomaticity Scoring

After a successful translation, the pipeline runs cargo clippy on the output and performs static analysis to produce a score from 0 (C-like) to 100 (idiomatic Rust). The score starts at 100 and deductions are applied based on three metrics:

| Metric | How it's measured | Penalty |
| --- | --- | --- |
| Unsafe blocks | Regex count of unsafe { in the source | First 5 are free (expected for FFI); each additional block deducts 2 points |
| Raw pointers | Regex count of *mut/*const type declarations and as *mut/as *const casts | First 10 are free (expected for FFI); each additional usage deducts 1 point |
| Clippy warnings | Number of warnings from cargo clippy -W clippy::all | Each warning deducts 3 points |

The final score is clamped to the 0–100 range. A score of 100 means the translated code has ≤5 unsafe blocks, ≤10 raw pointer usages, and zero clippy warnings. The thresholds are intentionally lenient for the first few occurrences because FFI-boundary code (#[no_mangle] pub unsafe extern "C" fn) inherently requires some unsafe and raw pointers.
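The deduction rule described above can be written out as a small function. This is a sketch reconstructed from the prose (idiomaticity_score is a hypothetical name); the actual implementation lives in src/translation/clippy.rs.

```rust
// Score from 0 (C-like) to 100 (idiomatic), per the rules described above.
fn idiomaticity_score(unsafe_blocks: u32, raw_pointers: u32, clippy_warnings: u32) -> u32 {
    let mut deductions: i64 = 0;
    deductions += 2 * unsafe_blocks.saturating_sub(5) as i64; // first 5 unsafe blocks are free
    deductions += raw_pointers.saturating_sub(10) as i64;     // first 10 raw-pointer uses are free
    deductions += 3 * clippy_warnings as i64;                 // each clippy warning costs 3 points
    (100i64 - deductions).clamp(0, 100) as u32
}
```

For example, 7 unsafe blocks, 12 raw-pointer uses, and 1 clippy warning would score 100 - 4 - 2 - 3 = 91.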

The per-program metrics (unsafe_blocks, raw_pointers, clippy_warnings) are included in both the markdown and JSON reports alongside the composite score.

Library Usage

llm_translation can also be used as a Rust library dependency:

[dependencies]
llm_translation = { path = "../llm_translation" }

use llm_translation::{TranslationAgent, TranslationConfig};

let config = TranslationConfig {
    provider: "anthropic".to_string(),
    api_key: "sk-ant-...".to_string(),
    model: "claude-sonnet-4-20250514".to_string(),
    max_retries: 3,
    ..Default::default()
};

let agent = TranslationAgent::new(config);
let report = agent.translate_all(&[path]).await?;

Public API

  • TranslationAgent - Main orchestrator
  • TranslationConfig - Pipeline configuration
  • TranslationReport, TranslationResult, ProgramStatus - Result types
  • ProgramInfo, ProgramType, SourceType - Program metadata
  • LlmClient, LlmRequest, LlmResponse, create_client - LLM abstraction
  • OpenAIClient, AnthropicClient - Provider implementations

Development

# Run all tests
cargo test

# Build
cargo build

# Build for release
cargo build --release

Architecture

src/
  lib.rs                     # Public API
  main.rs                    # Standalone CLI binary
  cli.rs                     # CLI argument parsing
  llm/
    mod.rs                   # LLM client factory
    types.rs                 # LlmClient trait, LlmRequest, LlmResponse
    openai.rs                # OpenAI implementation
    anthropic.rs             # Anthropic implementation
  translation/
    mod.rs                   # Orchestrator (discover, translate, test loop)
    translator.rs            # LLM prompt construction
    test_runner.rs           # Build + test harness (cando2 and executable)
    report.rs                # Report types and markdown generation
    clippy.rs                # Idiomaticity scoring
    feedback.rs              # Error formatting for LLM retry
tools/
  cando2/                    # Test harness for library programs (dlopen)

License

MIT
