Conversation
Vibe code. To be reviewed.
If in dynamic mode, load GGUF as a QuantizedTensor (QT).
Refactor this to support the new reconstructability protocol in the comfy core. This is needed for DynamicVRAM (to support legacy demotion for fallbacks). Add the logic for dynamic_vram construction. This is also needed for worksplit multi-gpu branch where the model is deep-cloned via reconstruction to put the model on two parallel GPUs.
Factor this out to a helper and implement the new core reconstruction protocol. Consider the mmap_released flag 1:1 with the underlying model such that it moves with the base model in model_override.
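The reconstruction idea above can be sketched as follows. This is a hedged, minimal illustration in plain Python: the class name, `reconstruct_args`, and `reconstruct` are hypothetical placeholders, not the actual comfy core protocol. It shows the intent of deep-cloning via re-construction (e.g. to place a copy on a second GPU) and of keeping the `mmap_released` flag 1:1 with the underlying model so it moves with the base model.

```python
class ReconstructableModel:
    """Illustrative stand-in for a model that can be rebuilt from its args."""

    def __init__(self, gguf_path, qtype="Q6_K"):
        self.gguf_path = gguf_path
        self.qtype = qtype
        # Kept 1:1 with the underlying model so it travels with the base
        # model when model_override swaps things around.
        self.mmap_released = False

    def reconstruct_args(self):
        # Everything needed to rebuild an equivalent instance from scratch,
        # e.g. for a worksplit clone on a second GPU.
        return {"gguf_path": self.gguf_path, "qtype": self.qtype}

    def reconstruct(self):
        clone = type(self)(**self.reconstruct_args())
        clone.mmap_released = self.mmap_released  # flag moves with the model
        return clone


model = ReconstructableModel("wan2.2-q6k.gguf")
model.mmap_released = True
clone = model.reconstruct()
```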
|
https://github.com/rattus128/ComfyUI-GGUF/tree/dynamic-vram Is this the same thing? |
|
This version definitely has a speed boost. However, if you're getting errors with the GGUF text encoder like me, try modifying the code as follows. Only the text encoder is still operating the old way; this should serve as a good temporary workaround until the update: nodes.py, around line 206 (False -> True). |
That means it's already working. How much (in %) did it save you? |
|
without --disable-dynamic-vram
with --disable-dynamic-vram
|
Are there other CLI flags needed to enable it? I'm on v16.4; my startup logs
have:
DynamicVRAM support detected and enabled
but when the model is loaded, I don't get the same as yours:
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0
patches attached.
I'm using GGUF Wan2.2
…On Tue, Mar 17, 2026 at 8:20 AM, *m8rr* left a comment (city96/ComfyUI-GGUF#427):
without --disable-dynamic-vram
Requested to load LTXAVTEModel_
loaded partially; 8523.00 MB usable, 556.58 MB loaded, 13574.77 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
Attempting to release mmap (267)
loaded partially; 8457.88 MB usable, 491.46 MB loaded, 13639.97 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
gguf qtypes: F32 (2672), BF16 (28), Q6_K (1744)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load LTXAV
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:29<00:00, 5.93s/it]
0 models unloaded.
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.72s/it]
Requested to load AudioVAE
loaded completely; 1968.11 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 164.28 seconds
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:24<00:00, 4.81s/it]
0 models unloaded.
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.69s/it]
Requested to load AudioVAE
loaded completely; 1934.00 MB usable, 693.46 MB loaded, full load: True
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 90.59 seconds
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00, 4.63s/it]
0 models unloaded.
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.68s/it]
Requested to load AudioVAE
loaded completely; 1966.00 MB usable, 693.46 MB loaded, full load: True
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 77.65 seconds
with --disable-dynamic-vram
Requested to load LTXAVTEModel_
loaded partially; 8523.00 MB usable, 556.58 MB loaded, 13574.77 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
Attempting to release mmap (267)
loaded partially; 8457.88 MB usable, 491.46 MB loaded, 13639.97 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
gguf qtypes: F32 (2672), BF16 (28), Q6_K (1744)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load LTXAV
loaded partially; 9564.67 MB usable, 9525.25 MB loaded, 7689.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:30<00:00, 6.09s/it]
Unloaded partially: 620.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1287.84 MB freed, 7616.93 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:39<00:00, 13.33s/it]
Requested to load AudioVAE
loaded completely; 2233.75 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 193.22 seconds
Requested to load LTXAV
loaded partially; 9560.67 MB usable, 9521.25 MB loaded, 7693.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:32<00:00, 6.43s/it]
Unloaded partially: 616.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1301.00 MB freed, 7603.77 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:42<00:00, 14.15s/it]
Requested to load AudioVAE
loaded completely; 2244.90 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 97.68 seconds
Requested to load LTXAV
loaded partially; 9560.67 MB usable, 9521.25 MB loaded, 7693.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:31<00:00, 6.25s/it]
Unloaded partially: 616.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1301.00 MB freed, 7603.77 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:41<00:00, 13.87s/it]
Requested to load AudioVAE
loaded completely; 2244.90 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 95.82 seconds
|
|
Check if the quant_ops.py file exists inside the ComfyUI-GGUF folder. If it’s not there, the installation wasn't done correctly. I installed like this. |
|
That's weird, I did exactly that but it's still not activating. I have that file: But when loading the GGUF Q4KM Wan2.2, I still can't see the dynamic loading log: I know dynamic loading is enabled in ComfyUI because other models have that log: I tried this both in v0.16.4 and the latest Comfy v0.18.2. EDIT: Sorry to bother. It works now after nuking my whole ComfyUI installation. |
|
I ran a few tests with Wan2.2 Q4KM GGUF. It seems that GGUF Dynamic VRAM is slower than non-dynamic: Dynamic VRAM: Without Dynamic VRAM: |
|
I ran Wan2.2 three times. Unlike LTX2.3, there wasn't a huge difference, but it doesn't seem any slower either. |
|
I hope this doesn't get merged, or that it always keeps a proper way to keep this disabled. Even using I also don't really get if it's needed in the GGUF loader? It was already pretty good and smart about offloading.

Let's be clear (before I have to explain or edit this post): on my RTX Blackwell system, it's pretty neat. It's enabling workflows that I had to memory-juggle manually before, or that just gave OOM errors. It has also increased processing time for other workflows - most often the simpler workflows using smaller models - because it keeps unloading and loading parts that otherwise fitted nicely in memory.

So the dynamic-vram work currently isn't a silver bullet, and I still see lots of issues reported with it; the thought that the ComfyUI maintainers say they are going to remove the disable option (that works for 90% or less) makes me a bit scared.

With the dynamic-vram work activated, on my Intel system ComfyUI is now completely unusable: everything makes my system go OOM and the OS kills ComfyUI. Switching to the GGUF loader (even with Q8 and F16 models) and everything is fine. It ignores

Until dynamic-vram in ComfyUI is stable and people are happy with it, please leave the GGUF loader alone. It's the only fallback. |
The new dynamic VRAM system in the comfy core enhances both RAM and VRAM management. Models are no longer offloaded from VRAM to RAM (which has a habit of becoming swap) and are now loadable asynchronously on the sampler's first iteration. This gives a significant speedup to big multi-model workflows on low-resource systems. VRAM offloading is managed by demand offloading, so there is no longer any need for VRAM usage estimates.
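The asynchronous first-iteration load can be illustrated with a toy sketch. This is a pure-Python stand-in under stated assumptions: the real core presumably uses CUDA streams and pinned memory, and `stage_model` and its shape are made up here for illustration. The point is only that staged weights are streamed in by a background worker while the sampler is free to start.

```python
import threading
import queue

def stage_model(weights):
    """Stage weights and stream them 'in' from a background thread."""
    loaded = {}
    pending = queue.Queue()
    for name, w in weights.items():
        pending.put((name, w))

    def loader():
        # Drains the staged queue; stands in for async host->device copies.
        while True:
            try:
                name, w = pending.get_nowait()
            except queue.Empty:
                return
            loaded[name] = w

    worker = threading.Thread(target=loader)
    worker.start()
    # The sampler can begin its first iteration now; join() before the
    # weights are actually needed.
    return loaded, worker

staged = {"blk.0.weight": [0.1], "blk.1.weight": [0.2]}
loaded, worker = stage_model(staged)
worker.join()
```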
The core has already upstreamed several of the resource saving features of GGUF in various forms.
So this implements a QuantizedTensor backend and subclasses the new ModelPatcherDynamic to bring GGUF+dynamic without needing custom ops.
The patcher subclass is needed to hook the lora logic in on-the-fly. Otherwise it's just: load the state dict into the new QuantizedTensor and go.
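The "load the state dict into the new QuantizedTensor and go" step can be sketched like this. A hedged, torch-free illustration: `QuantizedTensor`, `load_gguf_state_dict`, and the toy dequant function are all hypothetical names, not the actual classes in this PR or the comfy core. It only shows the pattern of wrapping packed quantized data so it can be dequantized lazily by the caster.

```python
class QuantizedTensor:
    """Illustrative wrapper around packed GGUF data, dequantized on demand."""

    def __init__(self, packed, dequant_fn):
        self.packed = packed          # packed quantized values
        self.dequant_fn = dequant_fn  # qtype-specific decoder

    def dequantize(self):
        # Called at use time; under dynamic management the core decides
        # where the dequantized result lives (VRAM vs. RAM).
        return [self.dequant_fn(v) for v in self.packed]


def load_gguf_state_dict(packed_sd, dequant_fn):
    # Wrap every entry of a packed state dict in a QuantizedTensor.
    return {k: QuantizedTensor(v, dequant_fn) for k, v in packed_sd.items()}


# Toy "quantization": stored ints decode to floats via a fixed scale.
sd = load_gguf_state_dict({"blk.0.weight": [64, 128]}, lambda v: v / 128.0)
```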
This brings the full feature-set of the core comfy caster to GGUF, including async-offload (and async primary load), pinned memory, and now the dynamic management.
There's some boilerplate to implement a downgrade back to ModelPatcher. This is needed for things like the torch compiler and hooks, where Dynamic VRAM support is TBD.
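The downgrade boilerplate might look roughly like this. A minimal sketch, assuming placeholder class and function names (`ModelPatcher`, `ModelPatcherDynamic`, `demote_if_needed` are illustrative, not the real comfy core API): when a feature dynamic VRAM doesn't support yet is requested, the same underlying model is handed back to a legacy patcher.

```python
class ModelPatcher:
    """Stand-in for the legacy patcher."""
    def __init__(self, model):
        self.model = model

class ModelPatcherDynamic(ModelPatcher):
    """Stand-in for the dynamic-VRAM patcher subclass."""

def demote_if_needed(patcher, wants_hooks=False, wants_compile=False):
    # Legacy demotion: keep the same underlying model, but fall back to
    # the plain patcher for features dynamic VRAM can't handle yet.
    if isinstance(patcher, ModelPatcherDynamic) and (wants_hooks or wants_compile):
        return ModelPatcher(patcher.model)
    return patcher

dynamic = ModelPatcherDynamic(model={"name": "LTXAV"})
legacy = demote_if_needed(dynamic, wants_compile=True)
kept = demote_if_needed(dynamic)
```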
Still drafting; will post some more performance results. I am going to pull a RAM stick and go for some 16GB-RAM flows with GGUF.
Example Test conditions:
WAN2.2 14B Q8 GGUF, 640x640x81f, RTX5090, Linux, 96GB, 2x Runs (disk caches warm with model first runs)
Before
After