diff --git a/README.md b/README.md
index a0ee67e..8075b60 100644
--- a/README.md
+++ b/README.md
@@ -43,31 +43,48 @@ Install the toolkit on Ubuntu (for RHEL/CentOS and full details, see the [Quick
 
 3. Configure Runtime and run a GPU container:
 
-   **Option A — AMD container runtime:**
+   **Option A — CDI (runtime-agnostic, no runtime configure needed):**
    ```bash
-   sudo amd-ctk runtime configure
-   sudo systemctl restart docker
+   sudo amd-ctk cdi generate --output=/etc/cdi/amd.json
+   sudo amd-ctk cdi validate --path=/etc/cdi/amd.json
    ```
    Verify by running a container with all AMD GPUs:
    ```bash
-   docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/rocm-terminal rocm-smi
+   docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
    ```
 
-   **Option B — CDI (runtime-agnostic, no runtime configure needed):**
+   > **Note:** CDI is supported by many container runtimes including Docker, Podman, and containerd.
+
+   **Option B — AMD container runtime (Docker only):**
    ```bash
-   sudo amd-ctk cdi generate --output=/etc/cdi/amd.json
-   sudo amd-ctk cdi validate --path=/etc/cdi/amd.json
+   sudo amd-ctk runtime configure
+   sudo systemctl restart docker
    ```
    Verify by running a container with all AMD GPUs:
    ```bash
-   docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
+   docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/rocm-terminal rocm-smi
    ```
 
-   > **Note:** CDI is supported by many container runtimes including Docker, Podman, and containerd.
-
 ## Usage
 
-Select specific GPUs by index, range, or UUID:
+### Using CDI
+
+List available CDI device entries:
+
+```bash
+amd-ctk cdi list
+```
+
+Select GPUs by index using `--device amd.com/gpu=`:
+
+```bash
+docker run --rm --device amd.com/gpu=0 rocm/rocm-terminal rocm-smi
+docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
+```
+
+### Using amd-container-runtime (Docker only)
+
+Select GPUs by index, range, or UUID with `AMD_VISIBLE_DEVICES`:
 
 ```bash
 docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0,1,2 rocm/rocm-terminal rocm-smi
@@ -75,7 +92,7 @@ docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0-3,5,8 rocm/rocm-terminal
 docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0xEF2C1799A1F3E2ED rocm/rocm-terminal rocm-smi
 ```
 
-List available GPUs and their UUIDs for use with `AMD_VISIBLE_DEVICES`:
+List available GPUs and their UUIDs:
 
 ```bash
 amd-ctk gpu list
diff --git a/docs/container-runtime/cdi-guide.rst b/docs/container-runtime/cdi-guide.rst
index cecdee9..897c0a6 100644
--- a/docs/container-runtime/cdi-guide.rst
+++ b/docs/container-runtime/cdi-guide.rst
@@ -1,6 +1,6 @@
-==========================================
-Support for Container Device Interface
-==========================================
+==========================
+Container Device Interface
+==========================
 
 Overview
 ========
diff --git a/docs/container-runtime/framework-integration.rst b/docs/container-runtime/framework-integration.rst
index 5879135..203c3fc 100644
--- a/docs/container-runtime/framework-integration.rst
+++ b/docs/container-runtime/framework-integration.rst
@@ -9,36 +9,44 @@ The AMD Container Toolkit is framework-agnostic but works seamlessly with popula
 - OpenMPI + ROCm
 - Custom AI/ML workflows
 
-Typical Example:
-----------------
+The examples below use :doc:`CDI ` device notation (``--device amd.com/gpu=``). Ensure a CDI specification has been generated before running these commands.
 
-- Enabling easy container-based development and deployment across AMD GPU systems.
+TensorFlow
+----------
 
-1. TensorFlow
---------------
+Run ROCm-enabled TensorFlow with a single GPU:
+
+.. code-block:: bash
+
+   docker run --rm --device amd.com/gpu=0 tensorflow/tensorflow:rocm-latest
 
-Run ROCm-enabled TensorFlow:
+Or with all available GPUs:
 
 .. code-block:: bash
 
-   sudo docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0 tensorflow/tensorflow:rocm-latest
+   docker run --rm --device amd.com/gpu=all tensorflow/tensorflow:rocm-latest
 
-2. PyTorch
------------
+PyTorch
+-------
 
 Use ROCm-enabled PyTorch containers:
 
 .. code-block:: bash
 
-   sudo docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/pytorch:latest
+   docker run --rm --device amd.com/gpu=all rocm/pytorch:latest
 
-3.Triton Inference Server
--------------------------
+Triton Inference Server
+-----------------------
+
+Serving models with Triton using AMD GPUs is supported by adapting container images for ROCm:
+
+.. code-block:: bash
 
-Serving models with Triton using AMD GPUs is supported by adapting container images for ROCm.
+   docker run --rm --device amd.com/gpu=all
 
 Best Practices
 --------------
 
 - Always use container images tested against the matching ROCm version.
-- Use environment variables or CDI device selection carefully in multi-GPU setups.
+- Prefer CDI device notation (``--device amd.com/gpu=``) for portability across container runtimes.
+- Use ``amd-ctk cdi list`` to discover available device entries for multi-GPU setups.
diff --git a/docs/container-runtime/quick-start-guide.rst b/docs/container-runtime/quick-start-guide.rst
index e27776c..58df389 100644
--- a/docs/container-runtime/quick-start-guide.rst
+++ b/docs/container-runtime/quick-start-guide.rst
@@ -16,7 +16,7 @@ Before installing the AMD Container Toolkit, ensure the following dependencies a
 - Docker version 25.0 or newer is required for all features.
 
 .. note::
-   Docker Desktop on Linux is not supported for GPU workloads; see [troubleshooting](docs/container-runtime/troubleshooting.rst) to know more.
+   Docker Desktop on Linux is not supported for GPU workloads; see :doc:`Troubleshooting ` to know more.
 
 .. note::
    - The Container Device Interface (CDI) format, used by modern container runtimes to abstract and expose GPUs, is not supported in older Docker versions.
@@ -145,10 +145,17 @@ RHEL 9.5:
    dnf clean all
    dnf install -y amd-container-toolkit
 
-You have successfully installed the AMD Container Toolkit. The next steps cover configuring Docker and running GPU workloads. If you prefer to use CDI for GPU injection, see the :doc:`CDI guide `.
+You have successfully installed the AMD Container Toolkit.
 
-Step 5: Configure Docker Runtime for AMD GPUs
----------------------------------------------
+Configuring Docker Runtime
+==========================
+
+.. seealso::
+
+   For a runtime-agnostic approach to GPU injection that works with all CDI-compatible runtimes, see the :doc:`Container Device Interface ` guide.
+
+Step 1: Configure Docker Runtime for AMD GPUs
+----------------------------------------------
 
 - Register the AMD container runtime and restart the Docker daemon:
 
@@ -159,8 +166,8 @@ Step 5: Configure Docker Runtime for AMD GPUs
 
 This configuration ensures that Docker is aware of the AMD container runtime and is able to support GPU-accelerated workloads using AMD Instinct devices.
 
-Step 6: Verify Container Runtime Installation
----------------------------------------------
+Step 2: Verify Container Runtime Installation
+----------------------------------------------
 
 To run Docker containers with access to AMD GPUs, you need to specify the AMD runtime and visible GPUs.
 Here are some examples you can use to verify the installation:
diff --git a/docs/index.md b/docs/index.md
index 30cd037..3a1fec7 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,6 +12,7 @@ This documentation site provides information about the AMD Container Toolkit, wh
 - [Overview](container-runtime/overview.rst)
 - [Requirements](container-runtime/requirements.rst)
 - [Quick Start Guide](container-runtime/quick-start-guide.rst)
+- [Container Device Interface](container-runtime/cdi-guide.rst)
 - [Running Workloads](container-runtime/running-workloads.rst)
 - [Framework Integration](container-runtime/framework-integration.rst)
 - [Troubleshooting](container-runtime/troubleshooting.rst)
@@ -20,6 +21,5 @@ This documentation site provides information about the AMD Container Toolkit, wh
 - [Docker Compose](container-runtime/docker-compose.rst)
 - [Enroot Pyxis Installation](container-runtime/enroot-pyxis-installation.md)
 - [Support for Docker Swarm](container-runtime/docker-swarm.md)
-- [Support for Container Device Interface](container-runtime/cdi-guide.rst)
 - [GPU Tracker](container-runtime/gpu-tracker.md)
 - [Release Notes](container-runtime/release-notes.rst)
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 0a5d2d7..c826d21 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -8,6 +8,8 @@ subtrees:
       title: Requirements
     - file: container-runtime/quick-start-guide.rst
       title: Quick Start Guide
+    - file: container-runtime/cdi-guide.rst
+      title: Container Device Interface
     - file: container-runtime/running-workloads.rst
       title: Running Workloads
     - file: container-runtime/framework-integration.rst
@@ -24,8 +26,6 @@ subtrees:
       title: Enroot Pyxis Installation
     - file: container-runtime/docker-swarm.md
       title: Support for Docker Swarm
-    - file: container-runtime/cdi-guide.rst
-      title: Support for Container Device Interface
     - file: container-runtime/gpu-tracker
       title: GPU Tracker
     - file: container-runtime/release-notes.rst
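
The CDI device selection that this change documents (`--device amd.com/gpu=0`, `--device amd.com/gpu=all`) can be sketched as a small dry-run helper. This is only an illustration: the function name `cdi_run_cmd` is hypothetical and not part of `amd-ctk`, and it assumes Docker accepts one `--device` flag per CDI device name. It prints the command line instead of running it, so it needs neither a GPU nor a generated CDI specification.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by amd-ctk): print the docker command
# that would run an image with one CDI --device flag per GPU selector.
# Usage: cdi_run_cmd "<selectors>" <image> [command...]
cdi_run_cmd() {
  local selectors="$1"; shift
  local cmd="docker run --rm"
  local sel
  # Selectors are space-separated, e.g. "0 1" or "all".
  for sel in $selectors; do
    cmd+=" --device amd.com/gpu=${sel}"
  done
  # "$*" joins the image and its command with single spaces.
  printf '%s %s\n' "$cmd" "$*"
}

cdi_run_cmd "all" rocm/rocm-terminal rocm-smi
cdi_run_cmd "0 1" rocm/rocm-terminal rocm-smi
```

Printing rather than executing keeps the sketch safe to try on a machine without AMD GPUs; piping the output to `bash` would run it, provided the CDI specification has already been generated.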