Chinese README: README.zh-CN.md
ROCm/gfx1150 quick start: README_ROCM_gfx1150.md
Documentation map: docs/README.md
Validation index: validation/README.md
Benchmark index: docs/benchmarks/README.md
cuPDLP-C-ROCm is a ROCm/HIP port and validation fork of upstream cuPDLP-C for AMD GPUs/APUs. The project keeps the CPU path and upstream-compatible CUDA path, and adds a ROCm/HIP backend for AMD Radeon-class hardware.
| Item | Current value |
|---|---|
| Primary ROCm target | AMD Radeon 890M |
| ROCm architecture | gfx1150 |
| ROCm version used in local validation | 7.2.1 |
| Planned larger AMD target | AMD Radeon PRO W7900 / gfx1100 |
| CUDA baseline devices | RTX 3090, RTX 4090D, H100 |
Status: experimental but buildable. The ROCm/HIP backend has passed smoke validation, Netlib validation, cross-device benchmark checks, and large-MPS baseline testing on AMD Radeon 890M / gfx1150. It is not yet a production-ready or fully tuned ROCm solver release.
| Need | English | Chinese |
|---|---|---|
| ROCm/gfx1150 quick start | README_ROCM_gfx1150.md | README_ROCM_gfx1150.zh-CN.md |
| Full documentation map | docs/README.md | docs/README.md |
| Validation data index | validation/README.md | validation/README.zh-CN.md |
| Benchmark index | docs/benchmarks/README.md | docs/benchmarks/README.md |
README_UPSTREAM.md is intentionally kept as an upstream reference snapshot and is not translated or rewritten as project documentation.
The raw .mps benchmark files are not committed. Curated result CSVs and explanation documents are committed instead.
| Topic | English | Chinese | Raw CSV |
|---|---|---|---|
| Large MPS CUDA/ROCm baseline | summary | Chinese version | platform summary, per-case timing |
| cuPDLPx vs cuPDLP-C short13 | comparison | Chinese version | comparison CSV |
- CPU-only cuPDLP-C build path.
- Upstream-compatible CUDA build path for NVIDIA baselines.
- ROCm/HIP backend built from migrated CUDA backend code.
- plc executable linked against the ROCm/HIP backend.
- CPU-vs-ROCm smoke validation scripts.
- Extended Netlib validation cases.
- Cross-device benchmark workflows and summaries for RTX 3090, RTX 4090D, H100, and Radeon 890M.
- Large MPS benchmark documents and curated CSV summaries.
- rocprofv3 profiling workflow and ROCm tuning notes.
- Migration documentation for CUDA-to-ROCm/HIP scientific-computing projects.
| Mode | CMake options | Role |
|---|---|---|
| CPU | BUILD_CUDA=OFF, BUILD_ROCM=OFF | Correctness and portability baseline |
| CUDA | BUILD_CUDA=ON, BUILD_ROCM=OFF | Upstream-compatible NVIDIA backend and benchmark baseline |
| ROCm/HIP | BUILD_CUDA=OFF, BUILD_ROCM=ON | AMD Radeon ROCm/HIP target backend |
BUILD_CUDA and BUILD_ROCM must not be enabled at the same time. Use separate build directories such as build-cpu, build-cuda, and build-rocm-plc.
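
The mode table can be turned into a small helper so the two backend switches never end up enabled together. This is a minimal sketch, not part of the repository: the function name `backend_flags` and the mode keywords `cpu`, `cuda`, and `rocm` are assumptions made for illustration. Only the `BUILD_CUDA` / `BUILD_ROCM` values come from the table.

```shell
# Hypothetical helper: print the backend switches for one build mode.
# The option values mirror the build-mode table; the function itself is
# an illustration and does not exist in the repository.
backend_flags() {
  case "$1" in
    cpu)  echo "-DBUILD_CUDA=OFF -DBUILD_ROCM=OFF" ;;
    cuda) echo "-DBUILD_CUDA=ON -DBUILD_ROCM=OFF" ;;
    rocm) echo "-DBUILD_CUDA=OFF -DBUILD_ROCM=ON" ;;
    *)    echo "unknown build mode: $1" >&2; return 1 ;;
  esac
}

# Example use (not executed here); the expansion is left unquoted on
# purpose so the two flags are passed as separate arguments:
#   cmake -S . -B build-cpu $(backend_flags cpu)
```

Because each mode maps to exactly one pair of values, a configuration with both backends set to `ON` cannot be produced through this helper.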
Large MPS baseline status:
| Platform | Backend | Result |
|---|---|---|
| RTX 3090 | CUDA upstream | 25/26 OPTIMAL, 1/26 TIMELIMIT |
| Radeon 890M | ROCm/HIP baseline | 24/26 OPTIMAL, 2/26 TIMELIMIT |
| RTX 4090D | CUDA upstream | 26/26 OPTIMAL |
| H100 | CUDA upstream | 26/26 OPTIMAL |
cuPDLPx short13 comparison status:
| Solver | Platform | Result |
|---|---|---|
| cuPDLP-C upstream | RTX 4090D CUDA | 13/13 OPTIMAL on selected short/medium cases |
| cuPDLPx v0.2.9 | RTX 4090D CUDA | 13/13 OPTIMAL on the same selected cases |
cmake -S . -B build-rocm-plc -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_CUDA=OFF \
-DBUILD_ROCM=ON \
-DBUILD_APPS=OFF \
-DBUILD_PYTHON=OFF \
-DBUILD_TESTING=ON \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_HIP_ARCHITECTURES=gfx1150
cmake --build build-rocm-plc --target plc -j"$(nproc)"

Run a smoke example:
./build-rocm-plc/bin/plc \
-fname ./example/afiro.mps \
-out /tmp/afiro_rocm_sum.json \
-nIterLim 200

./scripts/check_rocm_port.sh
ctest --test-dir build-rocm-plc --output-on-failure

Extended validation:
RESULT_ROOT=validation/results/extended_netlib \
./scripts/run_validation.sh validation/cases_extended_netlib.txt

See validation/README.md for curated validation summaries and CSV files.
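
Before comparing results, it can help to confirm that a run actually produced a summary file, such as the path passed to `-out` in the smoke example. The sketch below only checks that the file is non-empty and parses as JSON. It assumes the summary is JSON (suggested by the `.json` extension) and makes no assumption about its fields. The function name `check_summary` is hypothetical, and `python3` is assumed to be available.

```shell
# Hypothetical helper: sanity-check a solver summary file.
# Assumes the file is JSON; no particular keys are expected.
check_summary() {
  summary_path="$1"
  # -s is true only if the file exists and has a size greater than zero.
  if [ ! -s "$summary_path" ]; then
    echo "missing or empty: $summary_path" >&2
    return 1
  fi
  # json.tool exits non-zero when the input is not well-formed JSON.
  if ! python3 -m json.tool "$summary_path" > /dev/null 2>&1; then
    echo "not valid JSON: $summary_path" >&2
    return 1
  fi
  echo "ok: $summary_path"
}

# Example use (not executed here):
#   check_summary /tmp/afiro_rocm_sum.json
```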
RESULT_ROOT=profiling/results/current ./scripts/profile_rocm_smoke.sh
python3 scripts/summarize_rocm_profile.py \
--input profiling/results/current \
--output profiling/results/current/profile_summary.md

See docs/ROCM_PROFILING_NOTES.md, docs/ROCM_TUNING_HISTORY.md, and docs/TUNING_GUIDE_ROCM.md.
Identify the GPU architecture:
rocminfo | grep -E "Name:|Marketing Name|gfx"
rocm_agent_enumerator

Then set the matching architecture, for example:
-DCMAKE_HIP_ARCHITECTURES=gfx1100
for AMD Radeon PRO W7900, depending on ROCm support.
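
The architecture lookup can also be scripted. The following is a minimal sketch under stated assumptions: the helper name `first_gfx_target` is hypothetical, it reads tool output on standard input, and it assumes that output contains architecture tokens of the form `gfxNNNN`. It skips `gfx000`, which `rocm_agent_enumerator` typically reports for the CPU agent, and on systems with several different GPUs it returns only the first target in sorted order.

```shell
# Hypothetical helper: extract the first GPU architecture token from
# rocminfo or rocm_agent_enumerator output supplied on stdin.
first_gfx_target() {
  grep -oE 'gfx[0-9a-f]+' \
    | grep -v '^gfx000$' \
    | sort -u \
    | head -n 1
}

# Example use (not executed here):
#   arch="$(rocminfo | first_gfx_target)"
#   cmake -S . -B build-rocm-plc -DBUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES="$arch"
```

Check the detected value by eye before building; an empty result means no matching token was found in the input.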