Describe the bug
It does not seem possible to serve Qwen3.5-0.8B (a natively vision-language model) out of the box via the Python high-level API.
The call that exports the visual part of the model to ONNX in engine.py appears to be missing a parameter: `_export_visual()` is called without its required `model_config` argument.
I was able to fix this and serve Qwen3.5-0.8B via the high-level API with the following change:
diff --git a/experimental/server/engine.py b/experimental/server/engine.py
index e768751..df7eb07 100644
--- a/experimental/server/engine.py
+++ b/experimental/server/engine.py
@@ -522,12 +522,14 @@ class LLM:
import torch
_ensure_export_package()
+ from tensorrt_edgellm.config import ModelConfig
from tensorrt_edgellm.scripts.export import (_export_visual,
_load_all_weights,
_load_config)
config = _load_config(self._model_dir)
weights = _load_all_weights(self._model_dir)
+ model_config = ModelConfig.from_pretrained(self._model_dir)
_export_visual(
self._model_dir,
self._visual_onnx_dir,
@@ -535,6 +537,7 @@ class LLM:
config,
self._model_type,
torch.float16,
+ model_config=model_config,
)
logger.info(
"Visual ONNX export complete: %s",
Let me know if you agree this is a bug, or whether I am missing something.
I can also open a PR with the fix afterwards.
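To illustrate the failure mode without needing the export package installed, here is a minimal sketch. `export_visual_stub` is a hypothetical stand-in: its parameter list is an assumption based on the call site in the diff, not the real signature of `_export_visual`, and `missing_required_args` is a helper made up for this example. It only shows how a required parameter that the caller never passes can be detected with `inspect.signature`.

```python
import inspect


def export_visual_stub(model_dir, output_dir, weights, config, model_type,
                       dtype, model_config):
    """Hypothetical stand-in for _export_visual; the real signature may differ."""
    return {"model_dir": model_dir, "model_config": model_config}


def missing_required_args(func, *args, **kwargs):
    """Return the names of required parameters not covered by the given arguments."""
    sig = inspect.signature(func)
    # bind_partial tolerates missing arguments, so we can see what is left unbound.
    bound = sig.bind_partial(*args, **kwargs)
    bindable_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        name
        for name, param in sig.parameters.items()
        if param.default is inspect.Parameter.empty
        and param.kind in bindable_kinds
        and name not in bound.arguments
    ]


# Placeholder values mirroring the shape of the call before the fix (six arguments).
old_call_args = ("model_dir", "visual_dir", {}, {}, "qwen3_5", "float16")

# Before the fix: model_config is never passed.
print(missing_required_args(export_visual_stub, *old_call_args))
# After the fix: model_config is passed as a keyword argument.
print(missing_required_args(export_visual_stub, *old_call_args, model_config=object()))
```

A check like this could in principle run in a unit test against the real export function, so that a signature change in the export package is caught before the server hits it at startup.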
Steps/Code to reproduce bug
(venv) root@217da7dcd3f2:/workspace# python -m experimental.server --model Qwen/Qwen3.5-0.8B --port 8000
10:20:56 INFO edgellm.server: Resolving model: Qwen/Qwen3.5-0.8B
10:20:56 INFO edgellm.server: Downloading Qwen/Qwen3.5-0.8B from Hugging Face Hub ...
10:20:57 INFO httpx: HTTP Request: GET https://huggingface.co/api/models/Qwen/Qwen3.5-0.8B/revision/main "HTTP/1.1 200 OK"
Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 14540.25it/s]
Download complete: : 0.00B [00:00, ?B/s] 10:20:57 INFO edgellm.server: Detected VLM model (type=qwen3_5)/13 [00:00<?, ?it/s]
10:20:57 INFO edgellm.server: Using cached ONNX: /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/snapshots/2fc06364715b967f1860aea9cf38778875588b17/.edgellm/onnx/llm
10:20:57 INFO edgellm.server: Exporting visual ONNX to /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/snapshots/2fc06364715b967f1860aea9cf38778875588b17/.edgellm/onnx/visual ...
Download complete: : 0.00B [00:00, ?B/s]
10:20:58 INFO tensorrt_edgellm.scripts.export: Loading shard: model.safetensors-00001-of-00001.safetensors
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/experimental/server/__main__.py", line 19, in <module>
main()
File "/workspace/experimental/server/api_server.py", line 479, in main
llm = LLM(
^^^^
File "/workspace/experimental/server/engine.py", line 317, in __init__
self._init_from_model(
File "/workspace/experimental/server/engine.py", line 432, in _init_from_model
self._export_visual_onnx()
File "/workspace/experimental/server/engine.py", line 531, in _export_visual_onnx
_export_visual(
TypeError: _export_visual() missing 1 required positional argument: 'model_config'
(venv) root@217da7dcd3f2:/workspace#
Installation method:
Following the instructions here, installing into a venv via pip: https://nvidia.github.io/TensorRT-Edge-LLM/latest/user_guide/getting_started/installation.html#installation
Export command used:
python -m experimental.server --model Qwen/Qwen3.5-0.8B --port 8000
Expected behavior
Serving a VLM via the high-level API should work out of the box.
System information (x86 Host with GPU)
This is independent of the system.