Conversation
Pull request overview
This PR updates the code_local_assistant demo to use publicly available Hugging Face OpenVINO model artifacts (pulled via --source_model) and refreshes the documentation/examples accordingly.
Changes:
- Bump the OVMS Windows project version constant to 2026.1.0.
- Rewrite `demos/code_local_assistant/README.md` to reference OpenVINO-published HF models and update run commands for Windows bare metal and Linux Docker.
- Add a new screenshot (`vram.png`) referenced from the demo README.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| windows_set_ovms_version.py | Updates the Windows OVMS project version constant to 2026.1.0. |
| demos/code_local_assistant/README.md | Reworks demo instructions to use OpenVINO public HF models and updates Continue configuration snippets. |
| demos/code_local_assistant/vram.png | Adds a new image used by the demo documentation. |
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B
```
This NPU Docker example starts with `mkdir c:\models` (a Windows path). In the Linux Docker section this should be a POSIX path (e.g., `mkdir -p models`), consistent with the `$(pwd)/models` bind mount.
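Applied to the snippet above, the fix would be (only the first line changes; the rest of the command is reproduced verbatim from the diff, not independently verified):

```bash
mkdir -p models
docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B
```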
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
```
This Linux Docker command passes `--model_repository_path c:\models`, which is a Windows path and doesn't match the container mount (`/models`). This will break model downloads/loading inside the container. Please use `/models` (or `models`, if that's the intended in-container path) consistently.
Suggested change:

```diff
-mkdir c:\models
-docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-openvino/model_server:weekly \
---model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
+openvino/model_server:latest \
+--model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
```
demos/code_local_assistant/README.md (Outdated)

```bash
docker run -d --rm --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) -e MOE_USE_MICRO_GEMM_PREFILL=0 \
-p 8000:8000 -v $(pwd)/:/workspace/ openvino/model_server:weekly --rest_port 8000 --config_path /workspace/models/config_all.json
mkdir models
python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU
curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja
```
The "Custom models" section uses `python export_model.py ...`, but this README no longer includes any step to obtain `export_model.py` or install its requirements. As written, users will hit a missing-file/module error. Please add the download/install steps (or link to the canonical export_models instructions) before this example.
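One way to address this (a sketch; the raw-file paths are assumed from the model_server repository layout and should be checked against the canonical export_models instructions before merging):

```bash
# Fetch the export script and its Python dependencies before the "Custom models" steps
curl -L -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/export_model.py
curl -L -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/requirements.txt
pip install -r requirements.txt
```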
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B
```
This Linux Docker example starts with `mkdir c:\models` (a Windows path), which will fail on Linux and doesn't match the `$(pwd)/models` bind mount. Please update it to create the local `models/` directory (e.g., `mkdir -p models`).
demos/code_local_assistant/README.md (Outdated)

```bash
python export_model.py text_generation --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --target_device GPU --tool_parser qwen3coder
curl -L -o models/Qwen/Qwen3-Coder-30B-A3B-Instruct/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_qwen3coder_instruct.jinja

docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \
openvino/model_server:weekly \
--add_to_config \
--config_path /models/config_all.json \
--model_name Qwen/Qwen3-Coder-30B-A3B-Instruct \
--model_path Qwen/Qwen3-Coder-30B-A3B-Instruct
mkdir c:\models
set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context
ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
```
This Windows section is labeled as a bash snippet but uses Windows CMD syntax/paths. Also, `set MOE_USE_MICRO_GEMM_PREFILL=0 # ...` is not valid in CMD (the `# comment` will be included in the value). Please switch the code fence to bat/PowerShell and format the `set` line/comment appropriately.
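A corrected Windows form of those last lines might look like the following (a sketch: the command content is taken from the snippet above, with the inline `#` comment moved to a `REM` line so the variable value is exactly `0`):

```bat
mkdir c:\models
REM Temporary workaround to improve accuracy with long context
set MOE_USE_MICRO_GEMM_PREFILL=0
ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
```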
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir -p models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
```
The Linux Docker command mixes `--user ...` and `-u ...` (duplicated user flags) and uses a CPU image while requesting `--target_device GPU` with `/dev/dri` mapped. For GPU instructions, other docs use the `:latest-gpu` image; please switch to the GPU image and remove the duplicate user option to avoid confusing or non-working copy/paste.
demos/code_local_assistant/README.md (Outdated)

```bash
mkdir c:\models
docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
openvino/model_server:weekly \
--model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
```
In the Linux section this command starts with `mkdir c:\models`, which is a Windows path and will fail on Linux. It also doesn't match the bind mount that uses `$(pwd)/models`. Please change this to creating the local `models/` directory (e.g., `mkdir -p models`).
rasapala
left a comment
Please verify the Copilot comments I liked.
demos/code_local_assistant/README.md (Outdated)

:sync: OpenVINO/gpt-oss-20B-int4

```bat
mkdir c:\models
ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B
```
Something is off here. In this Windows part, gpt-oss doesn't require the `MOE_USE_MICRO_GEMM_PREFILL` env var, but below, in the Linux Docker setup, `MOE_USE_MICRO_GEMM_PREFILL` is passed to the container. One of those is wrong.

It is correct; only the Qwen3-MoE models need this workaround.
🛠 Summary
https://openvino-doc.iotg.sclab.intel.com/hf/model-server/ovms_demos_code_completion_vsc.html
🧪 Checklist