Conversation
Pull request overview
This PR updates the code_local_assistant demo to use publicly available Hugging Face OpenVINO model artifacts (pulled via --source_model) and refreshes the documentation/examples accordingly.
Changes:
- Bump the OVMS Windows project version constant to 2026.1.0.
- Rewrite `demos/code_local_assistant/README.md` to reference OpenVINO-published HF models and update run commands for Windows bare metal and Linux Docker.
- Add a new screenshot (`vram.png`) referenced from the demo README.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| windows_set_ovms_version.py | Updates the Windows OVMS project version constant to 2026.1.0. |
| demos/code_local_assistant/README.md | Reworks demo instructions to use OpenVINO public HF models and updates Continue configuration snippets. |
| demos/code_local_assistant/vram.png | Adds a new image used by the demo documentation. |
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B
```
This NPU Docker example starts with `mkdir c:\models` (a Windows path). In the Linux Docker section this should be a POSIX path (e.g., `mkdir -p models`), consistent with the `$(pwd)/models` bind mount.
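Applied to the snippet above, the fix would be (only the first line changes; the rest of the command is reproduced verbatim from the diff, not independently verified):

```bash
mkdir -p models
docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B
```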
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
```
This Linux Docker command passes `--model_repository_path c:\models`, which is a Windows path and doesn't match the container mount (`/models`). This will break model downloads/loading inside the container. Please use `/models` (or `models`, if that's the intended in-container path) consistently.
Suggested change:

```diff
-mkdir c:\models
-docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-openvino/model_server:weekly \
---model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
+openvino/model_server:latest \
+--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
```
demos/code_local_assistant/README.md (Outdated)

```bash
docker run -d --rm --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) -e MOE_USE_MICRO_GEMM_PREFILL=0 \
-p 8000:8000 -v $(pwd)/:/workspace/ openvino/model_server:weekly --rest_port 8000 --config_path /workspace/models/config_all.json
mkdir models
python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU
curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja
```
The "Custom models" section uses `python export_model.py ...`, but this README no longer includes any step to obtain `export_model.py` or install its requirements. As written, users will hit a missing-file/module error. Please add the download/install steps (or link to the canonical export_models instructions) before this example.
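One way to address this (a sketch; the raw-file paths are assumed from the model_server repository layout and should be checked against the canonical export_models instructions before merging):

```bash
# Fetch the export script and its Python dependencies before the "Custom models" steps
curl -L -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/export_model.py
curl -L -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/requirements.txt
pip install -r requirements.txt
```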
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B
```
This Linux Docker example starts with `mkdir c:\models` (a Windows path), which will fail on Linux and doesn't match the `$(pwd)/models` bind mount. Please update it to create the local `models/` directory (e.g., `mkdir -p models`).
demos/code_local_assistant/README.md (Outdated)

```bash
python export_model.py text_generation --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --target_device GPU --tool_parser qwen3coder
curl -L -o models/Qwen/Qwen3-Coder-30B-A3B-Instruct/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_qwen3coder_instruct.jinja

docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \
openvino/model_server:weekly \
--add_to_config \
--config_path /models/config_all.json \
--model_name Qwen/Qwen3-Coder-30B-A3B-Instruct \
--model_path Qwen/Qwen3-Coder-30B-A3B-Instruct
mkdir c:\models
set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context
ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
```
This Windows section is labeled as a bash snippet but uses Windows CMD syntax/paths. Also, `set MOE_USE_MICRO_GEMM_PREFILL=0 # ...` is not valid in CMD (the `# comment` will be included in the value). Please switch the code fence to bat/PowerShell and format the `set` line/comment appropriately.
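A corrected Windows form of those last lines might look like the following (a sketch: the command content is taken from the snippet above, with the inline `#` comment moved to a `REM` line so the variable value is exactly `0`):

```bat
mkdir c:\models
REM Temporary workaround to improve accuracy with long context
set MOE_USE_MICRO_GEMM_PREFILL=0
ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
```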
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir -p models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
```
The Linux Docker command mixes `--user ...` and `-u ...` (duplicated user flags) and uses a CPU image while requesting `--target_device GPU` with `/dev/dri` mapped. For GPU instructions, other docs use the `:latest-gpu` image; please switch to the GPU image and remove the duplicate user option to avoid confusing or non-working copy/paste.
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
```
In the Linux section this command starts with `mkdir c:\models`, which is a Windows path and will fail on Linux. It also doesn't match the bind mount that uses `$(pwd)/models`. Please change this to creating the local `models/` directory (e.g., `mkdir -p models`).
rasapala
left a comment
Please verify the Copilot comments I liked.
demos/code_local_assistant/README.md (Outdated)

:sync: OpenVINO/gpt-oss-20B-int4

```bat
mkdir c:\models
ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B
```
Something is off here. In this Windows part, gpt-oss doesn't require the `MOE_USE_MICRO_GEMM_PREFILL` env var, but below, in the Linux Docker setup, `MOE_USE_MICRO_GEMM_PREFILL` is passed to the container. One of those is wrong.

It is correct; only the Qwen3-MoE models need this workaround.
🛠 Summary
https://openvino-doc.iotg.sclab.intel.com/hf/model-server/ovms_demos_code_completion_vsc.html
🧪 Checklist