feat(agent): update Dockerfile for NVIDIA agent (#2002) #2003
|
This will be a major breaking change, right? The user would need the SMI tool on the host, and it would have to be added to the Docker Compose file, for this to work? |
|
Hi @svenvg93, thanks for the feedback. I agree this is a major change, and I see your point. In the NVIDIA ecosystem, having […]

As for the base image, I chose Debian over Alpine because NVIDIA's official binaries are built for […]

Regarding the changes to the compose file, I'm attaching the updated […]

services:
  beszel-agent:
    image: henrygd/beszel-agent-nvidia:slim
    container_name: beszel-agent
    restart: unless-stopped
    network_mode: host
    gpus: all
    volumes:
      - ./beszel_agent_data:/var/lib/beszel-agent
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # If using WSL, the path might be: /usr/lib/wsl/lib/nvidia-smi
      - /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro
    environment:
      LISTEN: 45876
      KEY: "<public key>"
      HUB_URL: "<hub url>"
      TOKEN: "<token>"
      GPU_COLLECTOR: nvidia-smi
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility

Note: Technically, the […] |
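One way to sanity-check the bind mount from the host is to run the mounted binary inside the container and parse what it prints. The following is a minimal sketch, not part of the PR: the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, the choice of query fields is an assumption, and it assumes the container is named `beszel-agent` as in the compose file above and that the `docker` CLI is available on the host.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; this particular selection is an
# assumption made for the sketch, not something the PR specifies.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = [value.strip() for value in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus(container="beszel-agent", binary="/usr/bin/nvidia-smi"):
    """Run the mounted binary via `docker exec`.

    Raises CalledProcessError if the binary is missing in the container
    or exits with a non-zero status.
    """
    cmd = [
        "docker", "exec", container, binary,
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

If `query_gpus()` raises, the likely causes are a wrong host path for `nvidia-smi` (for example under WSL) or the container not having been started with GPU access. All values are returned as strings; converting them to numbers is left to the caller.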
|
Hi @DQ-Kwon, I love the work you did on this. I think that as long as it is published under a different tag, there won't be any impact for existing users. Let's see what Henry thinks about this :) |
|
I agree with your point. I’ve reverted the […] |
|
Hi! #2016 prompted me to wonder whether we could have a single image with all the monitoring in one place. Technically, with your image the Intel GPU would also work if intel_gpu_top is mounted into the container. I'm wondering what @henrygd thinks about this, as it would technically allow one image instead of 4 different ones. |
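To make the single-image idea concrete, an agent could probe at startup for whichever vendor tool has been mounted. The sketch below is hypothetical and does not describe how the Beszel agent actually selects a collector: `detect_gpu_tools` and `CANDIDATE_TOOLS` are invented names, `nvidia-smi` and `intel_gpu_top` come from this thread, and `rocm-smi` is assumed here as the AMD counterpart.

```python
import shutil

# Tool names are assumptions for illustration: nvidia-smi and intel_gpu_top
# are mentioned in this thread; rocm-smi is assumed as the AMD equivalent.
CANDIDATE_TOOLS = {
    "nvidia": "nvidia-smi",
    "intel": "intel_gpu_top",
    "amd": "rocm-smi",
}


def detect_gpu_tools(candidates=CANDIDATE_TOOLS, which=shutil.which):
    """Return {vendor: resolved path} for each candidate tool found on PATH."""
    found = {}
    for vendor, tool in candidates.items():
        path = which(tool)
        if path is not None:
            found[vendor] = path
    return found
```

Finding a binary on `PATH` only shows that it was mounted; it does not show that the tool can reach the GPU, which still depends on how the container was started.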
|
Hi @svenvg93, thank you for the suggestion! I've spent some time reviewing the feasibility of an all-in-one image, and it's a very interesting concept. However, due to architectural differences between GPU vendors, implementing this may be more challenging than it initially appears.

The current implementation relies heavily on the NVIDIA Container Toolkit. When a container is started with NVIDIA resources, the toolkit automatically exposes the […]

For Intel and AMD GPUs, the situation is a bit different. While hardware access can generally be provided via […]

To support an all-in-one image under these constraints, we would likely face a significant trade-off:

Supporting all vendors cleanly within a single image would also require broader testing and long-term maintenance across multiple GPU ecosystems and host distributions. I'm not deeply familiar with every non-NVIDIA GPU/container ecosystem, so there may be gaps or outdated assumptions in my review. I would definitely appreciate input from contributors with more experience in Intel or AMD GPU environments.

For the short term, I think keeping device-specific images is the more practical approach to preserve simplicity and optimization. That said, I agree the idea is valuable, and it may be worth revisiting later as a broader long-term improvement. What are your thoughts on this? |
|
Hi @DQ-Kwon, thanks for checking! I think the main question is what @henrygd thinks of it in the long term, in terms of support. |
📃 Description
Feature #2002: Optimized the agent-nvidia image for size and multi-arch support. By switching to a Distroless base and mounting nvidia-smi from the host, the image size is reduced by 75%.
🪵 Changelog
➕ Added
✏️ Changed
🗑️ Removed