Hi! 👋
First of all, thank you for the amazing work on ComfyUI-GGUF – it's a fantastic
project and the community really appreciates it!
I'm running ComfyUI on an NVIDIA Jetson Orin NX 16GB (ARM/aarch64) and have been
trying to get GGUF models working. Unfortunately, I'm hitting a consistent crash
that I couldn't resolve despite several attempts. I'm reporting it here in the
hope that it helps improve compatibility with Jetson devices, which are becoming
increasingly popular for local AI inference.
Thanks in advance for any insight! 🙏
## Environment
- Device: NVIDIA Jetson Orin NX 16GB (Engineering Reference Developer Kit Super)
- SoC: tegra234
- CUDA Arch: 8.7
- OS: Ubuntu 22.04 (aarch64)
- L4T: 36.4.7
- CUDA: 12.6.85
- cuDNN: 9.19.0.56
- TensorRT: 10.7.0.23
- Python: 3.10.12
- PyTorch: 2.5.0a0+872d972e41.nv24.08 (NVIDIA custom build)
- Also tested with: PyTorch 2.8.0 (from pypi.jetson-ai-lab.io/jp6/cu126)
- ComfyUI-GGUF: 6ea2651 (latest main)
- gguf package: 0.18.0
## Model
`z_image_turbo-Q4_K_M.gguf`, loaded via the Unet Loader (GGUF) node
## Error

```
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at
"/opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp":838
```

With PyTorch 2.8.0 the same crash occurs at line 1131.
## Traceback
The crash occurs in `ops.py` lines 45–58, specifically when calling `.to(device)`
on a `GGMLTensor`:

```
File "ComfyUI-GGUF/ops.py", line 58, in to
  new = super().to(*args, **kwargs)
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at
"c10/cuda/CUDACachingAllocator.cpp":838
```
## What I tried
- Updated the `gguf` package from 0.17.1 to 0.18.0 – same error
- Modified `get_torch_compiler_disable_decorator()` in `ops.py` to always
  return `dummy_decorator` (bypassing torch.compile) – same error
- Upgraded PyTorch to 2.8.0 (from the Jetson AI Lab repo) – same error at a
  different line (1131)
- Fresh ComfyUI install with PyTorch 2.8.0 – same error
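For clarity, the torch.compile bypass in the second bullet amounted to this. It is an illustrative stand-in (the names mirror `ops.py`, but this is not the project's actual code): the decorator factory is patched to always hand back a no-op decorator, so the torch.compile machinery is never invoked.

```python
def dummy_decorator(func):
    """No-op decorator: return the wrapped function unchanged."""
    return func

def get_torch_compiler_disable_decorator():
    # Patched to always return the no-op decorator, so decorated
    # functions run eagerly and torch.compile is never touched.
    return dummy_decorator

@get_torch_compiler_disable_decorator()
def forward(x):
    return x * 2

print(forward(21))  # → 42
```

With this patch in place the crash was unchanged, which suggests torch.compile is not involved.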
## Root cause hypothesis
The `CUDACachingAllocator` on Jetson crashes when trying to move a custom
`torch.Tensor` subclass (`GGMLTensor`) to the CUDA device. This appears to be
a known issue with PyTorch custom tensor subclasses on Jetson's unified memory
architecture (CPU and GPU share the same memory pool).
Standard safetensors models (e.g. Juggernaut XL fp16) work perfectly on the
same setup.
## Question
Is there a workaround to load GGUF models without triggering `.to(cuda)`
on the `GGMLTensor` subclass? Or is there a way to force dequantization on CPU
before moving to GPU?
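To make the second idea concrete, here is a pure-Python mock of the control flow I have in mind (the classes and the `dequantize()` method are hypothetical stand-ins, not ComfyUI-GGUF's real API): dequantize into a plain tensor while still on CPU, so that only the plain result, never the subclass, is moved to the device.

```python
class PlainTensor:
    """Stand-in for a plain torch.Tensor."""
    def __init__(self, data, device="cpu"):
        self.data, self.device = data, device

    def to(self, device):
        # Plain tensors move between devices without issue.
        return PlainTensor(self.data, device)

class GGMLTensor(PlainTensor):
    """Stand-in for the quantized tensor subclass."""
    def to(self, device):
        if device != "cpu":
            # Simulates the assert hit when the subclass itself
            # is moved to CUDA on Jetson.
            raise RuntimeError("NVML_SUCCESS == r INTERNAL ASSERT FAILED")
        return GGMLTensor(self.data, device)

    def dequantize(self):
        # Hypothetical: dequantize on CPU, returning a plain tensor
        # (the * 0.5 is a placeholder for real dequantization math).
        return PlainTensor([x * 0.5 for x in self.data], self.device)

quantized = GGMLTensor([2, 4, 6])
# Dequantize first, then move: the subclass .to("cuda") is never called.
weights = quantized.dequantize().to("cuda")
print(weights.device, weights.data)  # → cuda [1.0, 2.0, 3.0]
```

This would trade away the VRAM savings of keeping weights quantized, but might at least confirm whether the allocator crash is specific to moving the subclass.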