
update continue demo to use public HF models #4024

Open · dtrawins wants to merge 7 commits into main from hf_models

Conversation

@dtrawins (Collaborator) commented Feb 27, 2026

🛠 Summary

https://openvino-doc.iotg.sclab.intel.com/hf/model-server/ovms_demos_code_completion_vsc.html

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Copilot AI (Contributor) left a comment
Pull request overview

This PR updates the code_local_assistant demo to use publicly available Hugging Face OpenVINO model artifacts (pulled via --source_model) and refreshes the documentation/examples accordingly.

Changes:

  • Bump OVMS Windows project version constant to 2026.1.0.
  • Rewrite demos/code_local_assistant/README.md to reference OpenVINO-published HF models and updated run commands for Windows bare metal and Linux Docker.
  • Add a new screenshot (vram.png) referenced from the demo README.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 11 comments.

| File | Description |
|---|---|
| windows_set_ovms_version.py | Updates the Windows OVMS project version constant to 2026.1.0. |
| demos/code_local_assistant/README.md | Reworks demo instructions to use OpenVINO public HF models and updates Continue configuration snippets. |
| demos/code_local_assistant/vram.png | Adds a new image used by the demo documentation. |

Comment on lines 131 to 135
```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B
```
Copilot AI · Feb 27, 2026

This NPU Docker example starts with mkdir c:\models (Windows path). In the Linux Docker section this should be a POSIX path (e.g., mkdir -p models) consistent with the $(pwd)/models bind mount.
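A minimal sketch of the suggested fix, assuming the rest of the snippet is unchanged:

```shell
# On Linux, create the host-side model repository with a POSIX path;
# "mkdir c:\models" is Windows syntax and would not create the directory
# that the $(pwd)/models bind mount expects.
mkdir -p models
```

The `docker run` line can then stay as-is, since the `-v $(pwd)/models:/models/` mount already resolves against the current directory.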

Comment on lines +121 to +124
```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
```

Copilot AI · Feb 27, 2026

This Linux Docker command passes --model_repository_path c:\models, which is a Windows path and doesn’t match the container mount (/models). This will break model downloads/loading inside the container. Please use /models (or models if that’s the intended in-container path) consistently.

Suggested change:

```diff
-mkdir c:\models
-docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-openvino/model_server:weekly \
---model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
+openvino/model_server:latest \
+--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
```

Comment on lines +145 to +148
```bash
docker run -d --rm --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) -e MOE_USE_MICRO_GEMM_PREFILL=0 \
-p 8000:8000 -v $(pwd)/:/workspace/ openvino/model_server:weekly --rest_port 8000 --config_path /workspace/models/config_all.json
mkdir models
python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU
curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja
```
Copilot AI · Feb 27, 2026

The "Custom models" section uses python export_model.py ... but this README no longer includes any step to obtain export_model.py or install its requirements. As written, users will hit a missing-file/module error. Please add the download/install steps (or link to the canonical export_models instructions) before this example.
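One possible prerequisite step, assuming the script still lives under demos/common/export_models/ in the model_server repository (the exact paths are an assumption and should be verified against the canonical export_models instructions):

```shell
# Fetch the export script and its dependency list from the model_server repo.
# NOTE: these raw.githubusercontent.com paths are assumed, not confirmed by
# this PR; check the export_models docs before copying them into the README.
curl -L -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/export_model.py
curl -L -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/requirements.txt
pip install -r requirements.txt
```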

Comment on lines 110 to 114
```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B
```
Copilot AI · Feb 27, 2026

This Linux Docker example starts with mkdir c:\models (Windows path), which will fail on Linux and doesn’t match the $(pwd)/models bind mount. Please update it to create the local models/ directory (e.g., mkdir -p models).

Comment on lines 17 to 21
```bash
python export_model.py text_generation --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --target_device GPU --tool_parser qwen3coder
curl -L -o models/Qwen/Qwen3-Coder-30B-A3B-Instruct/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_qwen3coder_instruct.jinja

docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \
openvino/model_server:weekly \
--add_to_config \
--config_path /models/config_all.json \
--model_name Qwen/Qwen3-Coder-30B-A3B-Instruct \
--model_path Qwen/Qwen3-Coder-30B-A3B-Instruct
mkdir c:\models
set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context
ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
```
Copilot AI · Feb 27, 2026

This Windows section is labeled as a bash snippet but uses Windows CMD syntax/paths. Also, set MOE_USE_MICRO_GEMM_PREFILL=0 # ... is not valid in CMD (the # comment will be included in the value). Please switch the code fence to bat/PowerShell and format the set line/comment appropriately.
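A sketch of how the Windows part of the snippet could be fenced and commented, assuming plain CMD (flags copied from the diff; in CMD an inline `# comment` after `set` would become part of the variable's value, so the comment moves to a `rem` line):

```bat
mkdir c:\models
rem Temporary workaround to improve accuracy with long context
set MOE_USE_MICRO_GEMM_PREFILL=0
ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
```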

Comment on lines +77 to +80
```bash
mkdir -p models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
```

Copilot AI · Feb 27, 2026

The Linux Docker command mixes --user ... and -u ... (duplicated user flags) and uses the generic openvino/model_server:weekly image while requesting --target_device GPU with /dev/dri mapped. For GPU instructions, other docs use the :latest-gpu image; please switch to the GPU image and remove the duplicate user option to avoid confusing or non-working copy/paste.
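One way the de-duplicated command could look, assuming the flags from the diff are otherwise kept (the `:latest-gpu` tag is an assumption taken from the reviewer's suggestion; verify the published image tags before adopting it):

```shell
mkdir -p models
# Single -u flag; GPU image tag assumed per the review suggestion.
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 \
  -u $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri \
  --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
  openvino/model_server:latest-gpu \
  --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 \
  --task text_generation --target_device GPU --tool_parser qwen3coder \
  --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
```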

Comment on lines +88 to +91
```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
```

Copilot AI · Feb 27, 2026

In the Linux section this command starts with mkdir c:\models, which is a Windows path and will fail on Linux. It also doesn’t match the bind mount that uses $(pwd)/models. Please change this to creating the local models/ directory (e.g., mkdir -p models).

@rasapala (Collaborator) left a comment

Please verify the Copilot comments I liked.

:sync: OpenVINO/gpt-oss-20B-int4
```bat
mkdir c:\models
ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B
```
Collaborator

Something is off here. In the Windows part, gpt-oss doesn't require the MOE_USE_MICRO_GEMM_PREFILL environment variable, but below, in the Linux Docker setup, MOE_USE_MICRO_GEMM_PREFILL is passed to the container. One of those is wrong.

Collaborator (Author)

It is correct; only the Qwen3 MoE models need this workaround.

dtrawins and others added 3 commits March 2, 2026 22:51
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>

5 participants