41 changes: 29 additions & 12 deletions README.md
@@ -43,39 +43,56 @@ Install the toolkit on Ubuntu (for RHEL/CentOS and full details, see the [Quick

3. Configure Runtime and run a GPU container:

**Option A — AMD container runtime:**
**Option A — CDI (runtime-agnostic, no runtime configure needed):**
```bash
sudo amd-ctk runtime configure
sudo systemctl restart docker
sudo amd-ctk cdi generate --output=/etc/cdi/amd.json
sudo amd-ctk cdi validate --path=/etc/cdi/amd.json
```
Verify by running a container with all AMD GPUs:
```bash
docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/rocm-terminal rocm-smi
docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
```

**Option B — CDI (runtime-agnostic, no runtime configure needed):**
> **Note:** CDI is supported by many container runtimes including Docker, Podman, and containerd.

**Option B — AMD container runtime (Docker only):**
```bash
sudo amd-ctk cdi generate --output=/etc/cdi/amd.json
sudo amd-ctk cdi validate --path=/etc/cdi/amd.json
sudo amd-ctk runtime configure
sudo systemctl restart docker
```
Verify by running a container with all AMD GPUs:
```bash
docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/rocm-terminal rocm-smi
```
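
The two options differ only in the flags passed to `docker run`. As a minimal sketch of that difference, the hypothetical helper below (`gpu_flags` is an illustrative name, not part of `amd-ctk`) prints the flags for each mode and echoes the resulting commands instead of executing them:

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration only; not shipped with the toolkit.
# Prints the docker flags for the chosen GPU injection mode.
gpu_flags() {
  local mode="$1" devices="$2"
  case "$mode" in
    cdi)     echo "--device amd.com/gpu=${devices}" ;;
    runtime) echo "--runtime=amd -e AMD_VISIBLE_DEVICES=${devices}" ;;
    *)       echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Dry run: print the full commands rather than running them.
echo "docker run --rm $(gpu_flags cdi all) rocm/rocm-terminal rocm-smi"
echo "docker run --rm $(gpu_flags runtime all) rocm/rocm-terminal rocm-smi"
```

Printing the commands first makes it easy to compare both modes on a host where neither has been configured yet.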

> **Note:** CDI is supported by many container runtimes including Docker, Podman, and containerd.

## Usage

Select specific GPUs by index, range, or UUID:
### Using CDI

List available CDI device entries:

```bash
amd-ctk cdi list
```

Select GPUs by index using `--device amd.com/gpu=<entry>`:

```bash
docker run --rm --device amd.com/gpu=0 rocm/rocm-terminal rocm-smi
docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
```

### Using amd-container-runtime (Docker only)

Select GPUs by index, range, or UUID with `AMD_VISIBLE_DEVICES`:

```bash
docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0,1,2 rocm/rocm-terminal rocm-smi
docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0-3,5,8 rocm/rocm-terminal rocm-smi
docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0xEF2C1799A1F3E2ED rocm/rocm-terminal rocm-smi
```
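
To make the index and range notation concrete, here is a small sketch that expands a value such as `0-3,5,8` into individual indices. It assumes ranges are inclusive on both ends and handles only numeric entries (not UUIDs); `expand_gpu_list` is a hypothetical helper for illustration and is not how the runtime itself parses `AMD_VISIBLE_DEVICES`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of amd-ctk): expand an index/range list
# such as "0-3,5,8" into space-separated GPU indices.
# Assumption: ranges are inclusive on both ends.
expand_gpu_list() {
  local spec="$1" part start end i
  local -a parts out=()
  IFS=',' read -r -a parts <<< "$spec"
  for part in "${parts[@]}"; do
    if [[ "$part" =~ ^([0-9]+)-([0-9]+)$ ]]; then
      # A range: emit every index from start to end.
      start="${BASH_REMATCH[1]}"
      end="${BASH_REMATCH[2]}"
      for ((i = start; i <= end; i++)); do
        out+=("$i")
      done
    elif [[ "$part" =~ ^[0-9]+$ ]]; then
      # A single index.
      out+=("$part")
    else
      echo "unsupported entry: $part" >&2
      return 1
    fi
  done
  echo "${out[*]}"
}

expand_gpu_list "0-3,5,8"   # prints: 0 1 2 3 5 8
```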

List available GPUs and their UUIDs for use with `AMD_VISIBLE_DEVICES`:
List available GPUs and their UUIDs:

```bash
amd-ctk gpu list
6 changes: 3 additions & 3 deletions docs/container-runtime/cdi-guide.rst
@@ -1,6 +1,6 @@
==========================================
Support for Container Device Interface
==========================================
==========================
Container Device Interface
==========================

Overview
========
36 changes: 22 additions & 14 deletions docs/container-runtime/framework-integration.rst
@@ -9,36 +9,44 @@ The AMD Container Toolkit is framework-agnostic but works seamlessly with popula
- OpenMPI + ROCm
- Custom AI/ML workflows

Typical Example:
----------------
The examples below use :doc:`CDI <cdi-guide>` device notation (``--device amd.com/gpu=<entry>``). Ensure a CDI specification has been generated before running these commands.

- Enabling easy container-based development and deployment across AMD GPU systems.
TensorFlow
----------

1. TensorFlow
--------------
Run ROCm-enabled TensorFlow with a single GPU:

.. code-block:: bash

docker run --rm --device amd.com/gpu=0 tensorflow/tensorflow:rocm-latest

Run ROCm-enabled TensorFlow:
Or with all available GPUs:

.. code-block:: bash

sudo docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0 tensorflow/tensorflow:rocm-latest
docker run --rm --device amd.com/gpu=all tensorflow/tensorflow:rocm-latest

2. PyTorch
-----------
PyTorch
-------

Use ROCm-enabled PyTorch containers:

.. code-block:: bash

sudo docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/pytorch:latest
docker run --rm --device amd.com/gpu=all rocm/pytorch:latest

3.Triton Inference Server
-------------------------
Triton Inference Server
-----------------------

Serving models with Triton using AMD GPUs is supported by adapting container images for ROCm:

.. code-block:: bash

Serving models with Triton using AMD GPUs is supported by adapting container images for ROCm.
docker run --rm --device amd.com/gpu=all <triton-rocm-image>

Best Practices
--------------

- Always use container images tested against the matching ROCm version.
- Use environment variables or CDI device selection carefully in multi-GPU setups.
- Prefer CDI device notation (``--device amd.com/gpu=<entry>``) for portability across container runtimes.
- Use ``amd-ctk cdi list`` to discover available device entries for multi-GPU setups.
19 changes: 13 additions & 6 deletions docs/container-runtime/quick-start-guide.rst
@@ -16,7 +16,7 @@ Before installing the AMD Container Toolkit, ensure the following dependencies a
- Docker version 25.0 or newer is required for all features.

.. note::
Docker Desktop on Linux is not supported for GPU workloads; see [troubleshooting](docs/container-runtime/troubleshooting.rst) to know more.
Docker Desktop on Linux is not supported for GPU workloads; see :doc:`Troubleshooting <troubleshooting>` for details.

.. note::
- The Container Device Interface (CDI) format, used by modern container runtimes to abstract and expose GPUs, is not supported in older Docker versions.
@@ -145,10 +145,17 @@ RHEL 9.5:
dnf clean all
dnf install -y amd-container-toolkit

You have successfully installed the AMD Container Toolkit. The next steps cover configuring Docker and running GPU workloads. If you prefer to use CDI for GPU injection, see the :doc:`CDI guide <cdi-guide>`.
You have successfully installed the AMD Container Toolkit.

Step 5: Configure Docker Runtime for AMD GPUs
---------------------------------------------
Configuring Docker Runtime
==========================

.. seealso::

For a runtime-agnostic approach to GPU injection that works with all CDI-compatible runtimes, see the :doc:`Container Device Interface <cdi-guide>` guide.

Step 1: Configure Docker Runtime for AMD GPUs
----------------------------------------------

- Register the AMD container runtime and restart the Docker daemon:

@@ -159,8 +166,8 @@ Step 5: Configure Docker Runtime for AMD GPUs

This configuration ensures that Docker is aware of the AMD container runtime and is able to support GPU-accelerated workloads using AMD Instinct devices.

Step 6: Verify Container Runtime Installation
---------------------------------------------
Step 2: Verify Container Runtime Installation
----------------------------------------------

To run Docker containers with access to AMD GPUs, you need to specify the AMD runtime and visible GPUs. Here are some examples you can use to verify the installation:

2 changes: 1 addition & 1 deletion docs/index.md
@@ -12,6 +12,7 @@ This documentation site provides information about the AMD Container Toolkit, wh
- [Overview](container-runtime/overview.rst)
- [Requirements](container-runtime/requirements.rst)
- [Quick Start Guide](container-runtime/quick-start-guide.rst)
- [Container Device Interface](container-runtime/cdi-guide.rst)
- [Running Workloads](container-runtime/running-workloads.rst)
- [Framework Integration](container-runtime/framework-integration.rst)
- [Troubleshooting](container-runtime/troubleshooting.rst)
@@ -20,6 +21,5 @@ This documentation site provides information about the AMD Container Toolkit, wh
- [Docker Compose](container-runtime/docker-compose.rst)
- [Enroot Pyxis Installation](container-runtime/enroot-pyxis-installation.md)
- [Support for Docker Swarm](container-runtime/docker-swarm.md)
- [Support for Container Device Interface](container-runtime/cdi-guide.rst)
- [GPU Tracker](container-runtime/gpu-tracker.md)
- [Release Notes](container-runtime/release-notes.rst)
4 changes: 2 additions & 2 deletions docs/sphinx/_toc.yml.in
@@ -8,6 +8,8 @@ subtrees:
title: Requirements
- file: container-runtime/quick-start-guide.rst
title: Quick Start Guide
- file: container-runtime/cdi-guide.rst
title: Container Device Interface
- file: container-runtime/running-workloads.rst
title: Running Workloads
- file: container-runtime/framework-integration.rst
@@ -24,8 +26,6 @@ subtrees:
title: Enroot Pyxis Installation
- file: container-runtime/docker-swarm.md
title: Support for Docker Swarm
- file: container-runtime/cdi-guide.rst
title: Support for Container Device Interface
- file: container-runtime/gpu-tracker
title: GPU Tracker
- file: container-runtime/release-notes.rst