I got the following error while trying to serve the Qwen/Qwen3-VL-30B-A3B-Instruct model:
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 508, in __init__
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] self.language_model = Qwen3MoeLLMForCausalLM(
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl_moe.py", line 414, in __init__
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] if self.config.tie_word_embeddings:
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 164, in __getattribute__
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] return super().__getattribute__(key)
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1639) ERROR 05-20 08:44:10 [multiproc_executor.py:749] AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'tie_word_embeddings'
(Worker_TP0 pid=1639) INFO 05-20 08:44:10 [multiproc_executor.py:707] Parent process exited, terminating worker
Commands used:
docker pull intel/llm-scaler-vllm:latest
sudo docker run -td \
--privileged \
--net=host \
--device=/dev/dri \
--name=qwen3vl_container \
-v llm:/llm/models/ \
-e no_proxy=localhost,127.0.0.1 \
-e http_proxy=$http_proxy \
-e https_proxy=$https_proxy \
--shm-size="32g" \
--entrypoint /bin/bash \
intel/llm-scaler-vllm:latest
docker exec -it qwen3vl_container bash
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_WORKER_MULTIPROC_METHOD=spawn vllm serve \
--model /llm/models/Qwen3-VL-30B-A3B-Instruct \
--served-model-name Qwen3-VL-30B-A3B-Instruct \
--dtype=float16 \
--enforce-eager \
--port 8000 \
--host 0.0.0.0 \
--trust-remote-code \
--disable-sliding-window \
--gpu-memory-util=0.9 \
--max-num-batched-tokens=8192 \
--disable-log-requests \
--max-model-len=8192 \
--block-size 64 \
-tp=4 2>&1 | tee vllm.log
The detailed log is attached: Qwen3-VL-30B-A3B-Instruct_error.txt
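The traceback shows vLLM's qwen3_vl_moe.py reading `tie_word_embeddings` from a `Qwen3VLMoeTextConfig` object created by transformers, and that config class not defining the attribute. One possible explanation (an assumption, not confirmed) is that the vLLM and transformers versions in the image are out of sync. The sketch below is a hypothetical helper, not part of vLLM or the llm-scaler image, that prints the installed package versions and checks whether the text sub-config of the local checkpoint exposes the attribute. It assumes it is run inside the container with the same Python environment as `vllm serve`, and that the model lives at the path used in the serve command. The names `installed_version` and `text_config_has_tie_word_embeddings` are made up for illustration.

```python
# Hypothetical diagnostic helper (not part of vLLM or llm-scaler): report package
# versions and whether the checkpoint's text sub-config exposes tie_word_embeddings.
import os
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumption: same model path as in the `vllm serve` command.
MODEL_DIR = "/llm/models/Qwen3-VL-30B-A3B-Instruct"


def installed_version(pkg: str) -> str:
    """Return the installed distribution version of pkg, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"


def text_config_has_tie_word_embeddings(model_dir: str):
    """Return True/False for the text sub-config, or None if the check cannot run.

    None means the model directory or the transformers package is missing.
    AutoConfig.from_pretrained may raise if this transformers version does not
    recognize the checkpoint's model_type, which is itself a useful signal.
    """
    if not os.path.isdir(model_dir):
        return None
    try:
        from transformers import AutoConfig
    except ImportError:
        return None
    cfg = AutoConfig.from_pretrained(model_dir)
    # Multimodal configs keep the language-model settings under text_config.
    text_cfg = getattr(cfg, "text_config", cfg)
    return hasattr(text_cfg, "tie_word_embeddings")


if __name__ == "__main__":
    print("python      :", sys.version.split()[0])
    for pkg in ("vllm", "transformers", "torch"):
        print(f"{pkg:<12}:", installed_version(pkg))
    print("text_config.tie_word_embeddings present:",
          text_config_has_tie_word_embeddings(MODEL_DIR))
```

Saved as a file (for example the hypothetical check_env.py) and run with `python3 check_env.py` inside qwen3vl_container, it prints the vllm, transformers and torch versions along with True, False or None for the attribute check. Those are the details maintainers usually ask for when triaging this kind of config error.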