feat(agent): update Dockerfile for NVIDIA agent (#2002) by DQ-Kwon · Pull Request #2003 · henrygd/beszel

DQ-Kwon · 2026-05-14T13:48:29Z

📃 Description

Feature #2002: Optimizes the agent-nvidia image for size and multi-arch support. By switching to a Distroless base and mounting nvidia-smi from the host, this change reduces the image size by 75%.

🪵 Changelog

➕ Added

- Multi-arch support: amd64, arm64, arm/v7.
- Dynamic library tracking for smartctl on Distroless.

✏️ Changed

- Base image: ubuntu-cuda → distroless/base-debian12.
- Reduced uncompressed size by ~270MB.

🗑️ Removed

- Bundled CUDA layers and the shell (Distroless images ship without one).

svenvg93 · 2026-05-14T18:30:39Z

This will be a major breaking change, right? Users would need the SMI tool on the host, and it would have to be added to their Docker Compose files for this to work?

DQ-Kwon · 2026-05-14T22:48:39Z

Hi @svenvg93, thanks for the feedback. I agree this is a major change, and I see your point.

In the NVIDIA ecosystem, having nvidia-smi on the host is practically a standard requirement. Much like mounting docker.sock, I believe mounting the SMI tool is a more lightweight and practical approach for monitoring.

As for the base image, I chose Debian over Alpine because NVIDIA's official binaries are built for glibc. In my experience, Debian is far more stable than Alpine’s musl environment for these GPU tasks.

Regarding the changes to the compose file, I’m attaching the updated compose.yml below for reference:

```yaml
services:
  beszel-agent:
    image: henrygd/beszel-agent-nvidia:slim
    container_name: beszel-agent
    restart: unless-stopped
    network_mode: host
    gpus: all

    volumes:
      - ./beszel_agent_data:/var/lib/beszel-agent
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # If using WSL, the path might be: /usr/lib/wsl/lib/nvidia-smi
      - /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro

    environment:
      LISTEN: 45876
      KEY: "<public key>"
      HUB_URL: "<hub url>"
      TOKEN: "<token>"

      GPU_COLLECTOR: nvidia-smi
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
```

Note: Technically, the `gpus: all` option should let the NVIDIA Container Toolkit inject the required libraries automatically. However, I've included the explicit nvidia-smi mount to ensure availability across different environments. In my tests it worked even with the mount commented out, but I've kept it as a safeguard for broader compatibility.
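Docker Compose also documents a longer device-reservation syntax for GPU access. As a sketch, assuming a Compose release where the service-level `gpus: all` attribute is not available, the same request from the compose file above could be written like this (only the GPU-related keys are shown; everything else stays as in the full example):

```yaml
services:
  beszel-agent:
    image: henrygd/beszel-agent-nvidia:slim
    # Long-form equivalent of `gpus: all`: reserve every NVIDIA GPU
    # for this service via the Compose device-reservation syntax.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```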

svenvg93 · 2026-05-15T11:10:00Z

Hi @DQ-Kwon,

I love the work you did on this. I think that as long as it is on a different tag, there won't be any impact on users. Let's see what Henry thinks about this :)

DQ-Kwon · 2026-05-15T13:53:49Z

I agree with your point. I've reverted the changes to the beszel-agent-nvidia image and moved the new build to a separate tag, beszel-agent-nvidia:slim. This should ensure that existing users won't face any issues.
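To make the opt-in concrete, here is a minimal sketch of what the split means for a compose file, based only on the two tags named in this thread: existing users keep their current `image:` line unchanged, and only users who deliberately switch to `:slim` take on the host nvidia-smi mount from the example earlier in this PR.

```yaml
services:
  beszel-agent:
    # Existing tag: unchanged, so current deployments keep working as-is.
    image: henrygd/beszel-agent-nvidia
    # Opt-in Distroless variant from this PR; it expects nvidia-smi
    # to be provided from the host (see the volumes in the full example).
    # image: henrygd/beszel-agent-nvidia:slim
```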

svenvg93 · 2026-05-17T19:58:50Z

Hi! #2016 got me thinking about whether we could have one image with all the monitoring in one place. Technically, with your image, Intel GPU monitoring would also work if intel_gpu_top is mounted into the container.

I'm wondering what @henrygd thinks about this, as it would allow one image instead of four different ones.

DQ-Kwon · 2026-05-18T05:14:45Z

Hi @svenvg93,

Thank you for the suggestion! I've spent some time reviewing the feasibility of an All-in-one image, and it's a very interesting concept.

However, due to architectural differences between GPU vendors, implementing this may be more challenging than it initially appears.

The current implementation relies heavily on the NVIDIA Container Toolkit. When a container is started with NVIDIA resources, the toolkit automatically exposes the /dev/nvidia* devices, the nvidia-smi binary, and the required shared libraries from the host into the container. This allows us to keep the image extremely lightweight (Distroless) while maintaining good driver compatibility.
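Which host files the toolkit injects is controlled by the driver capabilities requested. The NVIDIA Container Toolkit documents `utility` as the capability required for nvidia-smi and NVML, while `compute` covers CUDA and OpenCL. As an untested sketch, assuming the toolkit honors these environment variables in your Docker setup, a monitoring-only agent might request just `utility`:

```yaml
services:
  beszel-agent:
    image: henrygd/beszel-agent-nvidia:slim
    gpus: all
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      # `utility` asks the toolkit to inject nvidia-smi and the NVML library.
      # Dropping `compute` (CUDA/OpenCL) is an assumption for a
      # monitoring-only agent and has not been tested in this PR.
      NVIDIA_DRIVER_CAPABILITIES: utility
```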

For Intel and AMD GPUs, the situation is a bit different. While hardware access can generally be provided via --device /dev/dri, vendor-specific monitoring tools such as intel_gpu_top or radeontop — along with their dependent shared libraries — are not automatically available inside the container in the same way.

To support an All-in-one image under these constraints, we would likely face a significant trade-off:

- Fat Image Approach: Pre-installing vendor-specific tools for Intel, AMD, and NVIDIA inside the image. This would substantially increase the image size, which goes against the lightweight refactoring achieved in this PR.
- Manual Mounting Approach: Requiring users to manually mount host binaries and shared library paths into the container. This would make the docker-compose.yml considerably more complex and negatively impact the user experience.
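To illustrate the second trade-off, here is a rough sketch of what the manual mounting approach could look like for an Intel GPU on the slim image. The host paths are hypothetical: where intel_gpu_top lives and which shared libraries it needs vary by distribution and version, so `<required-lib>.so` is a placeholder each user would have to resolve themselves.

```yaml
services:
  beszel-agent:
    image: henrygd/beszel-agent-nvidia:slim
    devices:
      # Hardware access for Intel/AMD GPUs goes through /dev/dri.
      - /dev/dri:/dev/dri
    volumes:
      # Hypothetical host paths: the binary location and its shared-library
      # dependencies differ per distribution, and every extra dependency
      # needs its own mount line.
      - /usr/bin/intel_gpu_top:/usr/bin/intel_gpu_top:ro
      - /usr/lib/x86_64-linux-gnu/<required-lib>.so:/usr/lib/x86_64-linux-gnu/<required-lib>.so:ro
```

Supporting AMD would add a similar set of lines for radeontop and its libraries, which is where the compose file starts to get unwieldy.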

Supporting all vendors cleanly within a single image would also require broader testing and long-term maintenance across multiple GPU ecosystems and host distributions.

I'm not deeply familiar with every non-NVIDIA GPU/container ecosystem, so there may be gaps or outdated assumptions in my review. I would definitely appreciate input from contributors with more experience in Intel or AMD GPU environments.

For the short term, I think keeping device-specific images is the more practical approach to preserve simplicity and optimization. That said, I agree the idea is valuable, and it may be worth revisiting later as a broader long-term improvement.

What are your thoughts on this?

svenvg93 · 2026-05-18T10:32:33Z

Hi @DQ-Kwon,

Thanks for checking!
I was already afraid of that 😅. I think one image might be easier from an end-user perspective, even if it is big.
The smaller image you are proposing also has its benefits. Neither option is clearly better; both have pros and cons. I will check whether there are other GPU monitoring options that could make it leaner.

I think the main question is what @henrygd thinks about it in terms of long-term support.

feat(agent): update Dockerfile for NVIDIA agent

264dac8

DQ-Kwon requested a review from henrygd as a code owner May 14, 2026 13:48

DQ-Kwon force-pushed the feature/lightweight-nvidia-agent branch from 235bd80 to 264dac8 May 14, 2026 14:57

DQ-Kwon added 2 commits May 15, 2026 22:47

refactor(agent): rollback Dockerfile for NVIDIA agent and introduce slim variant

1d411ae

feat(agent): add slim variant for NVIDIA agent Docker images in GitHub Actions workflow

fbe5bc7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(agent): update Dockerfile for NVIDIA agent (#2002)#2003

DQ-Kwon wants to merge 3 commits into henrygd:main from DQ-Kwon:feature/lightweight-nvidia-agent
