StepForge

An independent, from-scratch reimplementation of STEP-LLM (Chen et al., 2026), developed as an undergraduate research project at Purdue University with Prof. Gomez.

This is not a fork of the official STEP-LLM code. Every component — data pipeline, retrieval system, training infrastructure, reward functions, evaluation, and inference — was written independently.

What this project is

The goal: fine-tune open-source LLMs to generate raw STEP files (ISO 10303) from natural language descriptions of 3D parts, using supervised fine-tuning followed by GRPO reinforcement learning with a geometric reward signal.
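
As a rough illustration of the GRPO step, the sketch below computes group-relative advantages: each completion sampled for the same caption is scored, and its reward is normalized against the mean and standard deviation of its group. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; the training script relies on TRL's GRPOTrainer, which may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: eps and the population std are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # spread of rewards inside this group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions for one caption, already scored by a reward function.
rewards = [0.0, 0.2, 0.9, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and are reinforced; a group in which every completion scores the same yields all-zero advantages and therefore no learning signal.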

The approach follows the STEP-LLM paper. The dataset comes from Text2CAD. Everything else — the code, the infrastructure, the engineering — is original work.

My contributions

The following components were built from scratch for this project:

Multi-GPU SFT training (training/sft_multigpu.py) — HuggingFace Trainer + PEFT/LoRA with DDP across 4× H100 80GB on Purdue's Gautschi HPC cluster, including checkpoint resumption, gradient checkpointing, memory diagnostics, and automatic SLURM resubmission chains
GRPO RL training (training/rl_train.py) — TRL GRPOTrainer with live RAG retrieval, three reward functions (format, parse, geometry), and distributed training across 4× H100
Data pipeline — STEP file parsing, DFS reserializer with chain-of-thought annotations, caption pairing, dataset filtering and splitting
RAG system — FAISS caption index, SentenceTransformer embeddings, pre-computed retrieval for SFT and live retrieval for RL
Reward functions — STEP → 3D point cloud conversion, FPFH+RANSAC+ICP geometric alignment, Scaled Chamfer Distance reward implementation
Refined variant — an experimental data variant with alternative field structure, run in parallel with the main pipeline for comparison
HPC infrastructure — Gautschi cluster setup, SLURM job scripts, memory fragmentation debugging, OOM diagnosis and resolution across multiple failure modes (eval batch size, expandable segments, torch.load compatibility)
Evaluation and inference — CR, RR, MSCD, AEC metric implementation, Gradio demo, single-file inference script
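
Of the three reward functions listed above, the format reward is the simplest to picture. The sketch below is a hypothetical version, not the code in training/rl_train.py: it checks for the section markers of an ISO 10303-21 file and returns the fraction of DATA lines that look like `#id=TYPE(...);`. The scoring scheme, the regular expression, and the one-entity-per-line assumption are all illustrative.

```python
import re

# Assumption: one entity per line, of the form "#12=TYPE_NAME(...);".
ENTITY_LINE = re.compile(r"^#\d+\s*=\s*[A-Z0-9_]+\s*\(.*\)\s*;$")


def format_reward(step_text: str) -> float:
    """Hypothetical format reward in [0, 1] for a generated STEP file."""
    lines = [ln.strip() for ln in step_text.strip().splitlines() if ln.strip()]
    # The file must open and close with the ISO 10303-21 markers.
    if not lines or lines[0] != "ISO-10303-21;" or lines[-1] != "END-ISO-10303-21;":
        return 0.0
    if "HEADER;" not in lines or "DATA;" not in lines:
        return 0.0
    data_start = lines.index("DATA;")
    try:
        data_end = lines.index("ENDSEC;", data_start)
    except ValueError:
        return 0.0  # DATA section never closed
    body = lines[data_start + 1 : data_end]
    if not body:
        return 0.0
    well_formed = sum(1 for ln in body if ENTITY_LINE.match(ln))
    return well_formed / len(body)


sample = """ISO-10303-21;
HEADER;
FILE_DESCRIPTION((''),'2;1');
ENDSEC;
DATA;
#1=CARTESIAN_POINT('',(0.,0.,0.));
#2=DIRECTION('',(0.,0.,1.));
ENDSEC;
END-ISO-10303-21;
"""
print(format_reward(sample))
```

A cheap textual check like this can run on every completion, leaving the more expensive parse and geometry rewards for outputs that pass it.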

Attribution to prior work

STEP-LLM (Chen et al., 2026) introduced the approach this project reimplements: using LLMs to generate STEP files via SFT + GRPO with a Scaled Chamfer Distance reward. The paper's hyperparameters, model selection (Llama-3.2-3B-Instruct), and training methodology are followed as closely as possible. The paper's reported results are the benchmark target below.
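
To make the geometric reward concrete, here is a minimal sketch of a Chamfer-distance reward on two small point clouds. It rests on several assumptions: "scaled" is interpreted as normalizing both clouds to a unit bounding-box extent before comparison, squared distances are used, and the exponential mapping with constant `k` is purely illustrative. The paper's exact definition and the code in reward/scd_reward.py may differ, and the alignment step (FPFH+RANSAC+ICP) is omitted here.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (mean squared nearest-neighbor distance, both ways)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def normalize(points):
    """Center on the centroid and divide by the largest bounding-box extent (assumed scaling)."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    extent = max(
        max(p[i] for p in centered) - min(p[i] for p in centered) for i in range(3)
    )
    scale = extent if extent > 0 else 1.0
    return [[c / scale for c in p] for p in centered]


def scd_reward(pred, target, k=10.0):
    """Map the normalized distance into (0, 1]; the form and k are illustrative assumptions."""
    return math.exp(-k * chamfer_distance(normalize(pred), normalize(target)))


cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
shifted_big_cube = [(2 * x + 5, 2 * y, 2 * z) for x, y, z in cube]
flat_square = [(x, y, 0) for x in (0, 1) for y in (0, 1)]

print(scd_reward(cube, shifted_big_cube))  # same shape after normalization
print(scd_reward(cube, flat_square))       # different shape, lower reward
```

The brute-force nearest-neighbor search is quadratic in the number of points, which is fine for a toy example; point clouds sampled from real parts would need a spatial index such as a k-d tree.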

Text2CAD (Khan et al., NeurIPS 2024) provides the dataset: ~170K CAD models paired with natural language captions. The dataset and one export script (data/export_steps.py, adapted from their repository under Apache 2.0) are used here. See ATTRIBUTION.md for a file-by-file breakdown.

Status

Phase	Status
Data pipeline (export, parse, reserialize, pair, split)	Done
RAG index build + pre-computation	Done
SFT training — main variant (10 epochs, Llama-3.2-3B, 4× H100)	Done
SFT training — refined variant (10 epochs, 4× H100)	Done
RL training — GRPO, 80 steps	In progress
Evaluation	Pending

Results (target)

The STEP-LLM paper reports the following against the Text2CAD baseline:

Method	CR (%)	RR (%)	MSCD	AEC
Text2CAD	—	98.38	3.99	390.41
STEP-LLM (SFT)	97.00	95.18	0.53	240.99
STEP-LLM (GRPO)	99.00	92.00	0.098	—

These are the paper's reported numbers. Independent reproduction is in progress.

Setup

Gautschi HPC (4× H100 80GB — primary)

bash gautschi_setup.sh
conda activate stepforge
export HUGGINGFACE_TOKEN=your_token_here
sbatch slurm_sft_4gpu_gautschi.sh
sbatch slurm_rl_gautschi.sh

Local / other

conda env create -f environment.yml
conda activate stepforge
export HUGGINGFACE_TOKEN=your_token_here

Running

# Step 1: Build dataset
python data/build_dataset.py --config configs/config_gautschi.yaml

# Step 2: Build FAISS retrieval index
python retrieval/build_index.py --config configs/config_gautschi.yaml

# Step 3: Pre-compute RAG for SFT
python data/precompute_rag.py --config configs/config_gautschi.yaml

# Step 4: SFT (10 epochs)
sbatch slurm_sft_4gpu_gautschi.sh

# Step 5: RL with GRPO (80 steps)
sbatch slurm_rl_gautschi.sh

# Step 6: Evaluate
python evaluation/evaluate.py --checkpoint checkpoints/rl/final \
    --config configs/config_gautschi.yaml
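
Steps 2 and 3 pair each training caption with its nearest neighbor in the caption index. The toy sketch below shows the idea only: a bag-of-words vector stands in for the SentenceTransformer embedding, and a brute-force cosine search stands in for FAISS. All function names are hypothetical, and excluding a caption from its own results is an assumption about how the pre-computation avoids retrieving the training target itself.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def precompute_top1(captions):
    """Map each caption index to the index of its most similar other caption.

    Assumes at least two captions; brute-force search stands in for FAISS.
    """
    vecs = [embed(c) for c in captions]
    neighbors = {}
    for i, query in enumerate(vecs):
        scores = [(cosine(query, v), j) for j, v in enumerate(vecs) if j != i]
        neighbors[i] = max(scores)[1]
    return neighbors


captions = [
    "a hollow cylinder",
    "a hollow cylinder with a flange",
    "a rectangular plate with four holes",
    "a rectangular plate",
]
neighbors = precompute_top1(captions)
print(neighbors)
```

Pre-computing these pairs once keeps retrieval out of the SFT training loop, whereas the RL stage retrieves live for each prompt.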

Quick inference

python inference/generate.py \
    --caption "a hollow cylinder" \
    --output /tmp/cylinder.step \
    --checkpoint checkpoints/rl/final

Project structure

StepForge/
├── configs/
│   ├── config_gautschi.yaml         # Main variant — Gautschi H100 cluster
│   └── config_gautschi_refined.yaml # Refined variant — alternative data format
├── data/
│   ├── export_steps.py              # [adapted from Text2CAD] .pth → STEP files
│   ├── step_parser.py               # Parse STEP entity DAG
│   ├── dfs_reserializer.py          # DFS traversal + CoT annotations
│   ├── pair_captions.py             # Pair STEP with abstract captions
│   ├── filter_dataset.py            # Filter + split
│   ├── precompute_rag.py            # Pre-retrieve top-1 STEP per training example
│   └── build_dataset.py             # Orchestrator
├── retrieval/
│   ├── build_index.py               # Build FAISS caption index
│   └── retriever.py                 # Live RAG retrieval
├── training/
│   ├── sft_multigpu.py              # Multi-GPU SFT (DDP, 4× H100)
│   ├── rl_train.py                  # GRPO reinforcement learning
│   └── preflight_check.py           # Environment validation
├── reward/
│   ├── step_to_pointcloud.py        # STEP → 3D point cloud
│   ├── alignment.py                 # FPFH+RANSAC+ICP alignment
│   └── scd_reward.py                # Scaled Chamfer Distance reward
├── inference/generate.py            # Generate STEP from caption
├── evaluation/evaluate.py           # CR, RR, MSCD, AEC metrics
├── app.py                           # Gradio demo
├── ATTRIBUTION.md                   # File-by-file code origin breakdown
└── LICENSE                          # Apache 2.0
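
The parser and reserializer pair in data/ can be pictured with the sketch below. It is a hypothetical reduction, not the project's code: it parses `#id=...;` entity lines, treats entities that nothing references as roots, walks the reference graph depth-first with parents emitted before children, and renumbers entities in visit order. The traversal order, the root rule, and the assumption that `#` never appears inside string literals are all illustrative; chain-of-thought annotations are left out.

```python
import re

ENTITY = re.compile(r"#(\d+)\s*=\s*(.+?);\s*$")
REF = re.compile(r"#(\d+)")


def dfs_reserialize(data_lines):
    """Reorder and renumber STEP entities depth-first from the root entities."""
    bodies = {}
    for line in data_lines:
        m = ENTITY.match(line.strip())
        if m:
            bodies[int(m.group(1))] = m.group(2)

    # Outgoing references of each entity, in the order they appear.
    refs = {eid: [int(r) for r in REF.findall(body)] for eid, body in bodies.items()}
    referenced = {r for targets in refs.values() for r in targets}
    roots = [eid for eid in bodies if eid not in referenced]

    order, seen = [], set()

    def visit(eid):
        if eid in seen or eid not in bodies:
            return
        seen.add(eid)
        order.append(eid)  # pre-order: parent before its children
        for child in refs[eid]:
            visit(child)

    for root in roots:
        visit(root)

    new_id = {old: i + 1 for i, old in enumerate(order)}

    def renumber(match):
        old = int(match.group(1))
        # Dangling references are left unchanged.
        return f"#{new_id[old]}" if old in new_id else match.group(0)

    return [f"#{new_id[old]}=" + REF.sub(renumber, bodies[old]) + ";" for old in order]


data = [
    "#1=CARTESIAN_POINT('',(0.,0.,0.));",
    "#2=DIRECTION('',(0.,0.,1.));",
    "#3=DIRECTION('',(1.,0.,0.));",
    "#4=AXIS2_PLACEMENT_3D('',#1,#2,#3);",
    "#5=PLANE('',#4);",
]
for line in dfs_reserialize(data):
    print(line)
```

After this pass, each entity sits next to the entities it references, so related definitions stay close together in the token sequence.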

References

STEP-LLM: Chen et al., 2026. STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models. arXiv:2601.12641.
Text2CAD: Khan et al., NeurIPS 2024 Spotlight. Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts.
Unsloth: The original single-GPU SFT script (training/llama3_SFT_response.py) was structured around the Unsloth fine-tuning template (Apache 2.0).
