I have an RTX 3090, and I can run Qwen3 8B with no issues.
However, the way this model loads seems inefficient:
(mufun) PS D:\AITools\MuFun\demo\mufun_acestep> python .\gr_app.py
ModelManager initialized. Model will be loaded on first request.
* Running on local URL: http://127.0.0.1:7860
Received request for audio file: C:\Users\HomeStar\AppData\Local\Temp\gradio\e89fc7304bd55c990b35b7a1a7f204903815404605ee2afbf1b00c88a7425632\ace_output.wav
Loading model into VRAM...
The 8-bit optimizer is not available on your device, only available on CUDA for now.
Some weights of WhisperForConditionalGeneration were not initialized from the model checkpoint at openai/whisper-large-v3 and are newly initialized: ['proj_out.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading checkpoint shards: 0%| | 0/4 [03:38<?, ?it/s]
Error loading model: [enforce fail at alloc_cpu.cpp:121] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3221225472 bytes.
Scheduling model unload in 300 seconds.
I have 128 GB of system RAM (190 GB including the pagefile). When it tried to load the model, my system RAM usage went from 20 GB to 160 GB.
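For scale, here is a quick back-of-the-envelope check of the numbers above. The failed allocation and the RAM readings come from the log and my own observation. The 8B parameter count is only an assumption for comparison: it is the size of Qwen3 8B, which runs fine on this card. I don't know the actual parameter count of the MuFun checkpoint.

```python
# Back-of-the-envelope numbers for the failed model load.
GIB = 2**30  # bytes in one GiB


def bytes_to_gib(n_bytes: float) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / GIB


def weight_footprint_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the model weights alone (no activations or loader overhead)."""
    return bytes_to_gib(n_params * bytes_per_param)


# The single allocation that DefaultCPUAllocator refused (from the error message).
failed_alloc_bytes = 3_221_225_472
print(f"Failed allocation: {bytes_to_gib(failed_alloc_bytes):.1f} GiB")

# System RAM growth observed while loading (GB, as reported above).
ram_before_gb, ram_peak_gb = 20, 160
print(f"RAM growth during load: {ram_peak_gb - ram_before_gb} GB")

# ASSUMPTION: an 8B-parameter model, used only as a size comparison.
n_params = 8e9
for dtype_name, bytes_per_param in [("fp16/bf16", 2), ("fp32", 4)]:
    print(f"8B params in {dtype_name}: {weight_footprint_gib(n_params, bytes_per_param):.1f} GiB")
```

The refused allocation is exactly 3 GiB. Even at full fp32 precision, an 8B-parameter model's weights come to roughly 30 GiB. That is far below the ~140 GB of RAM growth seen during loading, which is why the load looks inefficient to me.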