From cc96c5c4456d47fc0ae9f19811d5a033310ee775 Mon Sep 17 00:00:00 2001
From: Dariusz Trawinski
Date: Fri, 27 Feb 2026 16:32:08 +0100
Subject: [PATCH 1/7] update continue demo to use public HF models

---
 demos/code_local_assistant/README.md | 400 ++++++++-------------------
 demos/code_local_assistant/vram.png  | Bin 0 -> 25648 bytes
 windows_set_ovms_version.py          |   2 +-
 3 files changed, 119 insertions(+), 283 deletions(-)
 create mode 100644 demos/code_local_assistant/vram.png

diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md
index 875cf6ac0e..1cfb37ca57 100644
--- a/demos/code_local_assistant/README.md
+++ b/demos/code_local_assistant/README.md
@@ -6,242 +6,152 @@ With the rise of AI PC capabilities, hosting own Visual Studio code assistant is
 # Requirements
 - Windows (for standalone app) or Linux (using Docker)
 - Python installed (for model preparation only)
-- Intel Meteor Lake, Lunar Lake, Arrow Lake or newer Intel CPU.
+- Intel Meteor Lake, Lunar Lake, Arrow Lake or Panther Lake.
+- Memory requirements depend on the model size

-## Prepare Code Chat/Edit Model
-We need to use medium size model to get reliable responses but also to fit it to the available memory on the host or discrete GPU.
-
-Download export script, install its dependencies and create directory for the models:
-```console
-curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
-mkdir models
-```
-> **Note:** The users in China need to set environment variable HF_ENDPOINT="https://hf-mirror.com" before running the export script to connect to the HF Hub.
-
-Pull and add the model on Linux:
+### Windows: deploying on bare metal

-> **Note:** To use CPU, please export model with option `--target_device CPU` instead of `GPU`. 
::::{tab-set}
-:::{tab-item} Qwen/Qwen3-Coder-30B-A3B-Instruct
-:sync: Qwen/Qwen3-Coder-30B-A3B-Instruct
+:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4
+:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4
 ```bash
-python export_model.py text_generation --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --target_device GPU --tool_parser qwen3coder
-curl -L -o models/Qwen/Qwen3-Coder-30B-A3B-Instruct/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_qwen3coder_instruct.jinja
-
-docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \
-  openvino/model_server:weekly \
-  --add_to_config \
-  --config_path /models/config_all.json \
-  --model_name Qwen/Qwen3-Coder-30B-A3B-Instruct \
-  --model_path Qwen/Qwen3-Coder-30B-A3B-Instruct
+mkdir c:\models
+REM temporary workaround to improve accuracy with long context
+set MOE_USE_MICRO_GEMM_PREFILL=0
+ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
 ```
-> **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 65GB of free RAM.
-
+> **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. 
:::
-:::{tab-item} mistralai/Codestral-22B-v0.1
-:sync: mistralai/Codestral-22B-v0.1
+
+:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8
+:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8
 ```bash
-python export_model.py text_generation --source_model mistralai/Codestral-22B-v0.1 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --target_device GPU
-curl -L -o models/mistralai/Codestral-22B-v0.1/chat_template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.10.1.1/examples/tool_chat_template_mistral_parallel.jinja
-
-docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \
-  openvino/model_server:weekly \
-  --add_to_config \
-  --config_path /models/config_all.json \
-  --model_name mistralai/Codestral-22B-v0.1 \
-  --model_path mistralai/Codestral-22B-v0.1
+mkdir c:\models
+REM temporary workaround to improve accuracy with long context
+set MOE_USE_MICRO_GEMM_PREFILL=0
+ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct
 ```
-> **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 50GB of free RAM.
-
+> **Note:** For deployment, the model requires ~16GB disk space and recommended 34GB+ of VRAM on the GPU. 
::: -:::{tab-item} openai/gpt-oss-20b -:sync: openai/gpt-oss-20b + +:::{tab-item} OpenVINO/gpt-oss-20B-int4 +:sync: OpenVINO/gpt-oss-20B-int4 ```bash -python export_model.py text_generation --source_model openai/gpt-oss-20b --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser gptoss --reasoning_parser gptoss --target_device GPU -curl -L -o models/openai/gpt-oss-20b/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_gpt_oss.jinja - -docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \ - openvino/model_server:weekly \ - --add_to_config \ - --config_path /models/config_all.json \ - --model_name openai/gpt-oss-20b \ - --model_path openai/gpt-oss-20b +mkdir c:\models +ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B ``` - -> **Note:** Continuous batching and paged attention are supported for GPT‑OSS. However, when deployed on GPU, the model may experience reduced accuracy under high‑concurrency workloads. This issue will be resolved in version 2026.1 and in the upcoming weekly release. CPU execution is not affected. -> **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 96GB of free RAM. - +> **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. 
::: -:::{tab-item} unsloth/Devstral-Small-2507 -:sync: unsloth/Devstral-Small-2507 -```bash -python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU -curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja -docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \ - openvino/model_server:weekly \ - --add_to_config \ - --config_path /models/config_all.json \ - --model_name unsloth/Devstral-Small-2507 \ - --model_path unsloth/Devstral-Small-2507 +:::{tab-item} OpenVINO/gpt-oss-20B-int8 +:sync: OpenVINO/gpt-oss-20B-int8 +```bash +mkdir c:\models +ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B ``` -> **Note:** For deployment, the model requires ~13GB disk space and recommended 16GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 50GB of free RAM. - +> **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU. 
::: -:::{tab-item} OpenVINO/Qwen3-4B-int4-ov -:sync: OpenVINO/Qwen3-4B-int4-ov + +:::{tab-item} OpenVINO/Qwen3-8B-int4-ov +:sync: OpenVINO/Qwen3-8B-int4-ov ```bash -docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \ - openvino/model_server:weekly \ - --pull \ - --source_model OpenVINO/Qwen3-4B-int4-ov \ - --model_repository_path /models \ - --model_name OpenVINO/Qwen3-4B-int4-ov \ - --task text_generation \ - --tool_parser hermes3 \ - --target_device GPU - -docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \ - openvino/model_server:weekly \ - --add_to_config --config_path /models/config_all.json \ - --model_name OpenVINO/Qwen3-4B-int4-ov \ - --model_path OpenVINO/Qwen3-4B-int4-ov +mkdir c:\models +ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-8B ``` -> **Note:** `Qwen3` models are available on [HuggingFace OpenVINO repository](https://huggingface.co/OpenVINO/models?search=qwen3) in different sizes and precisions. It is possible to choose it for any use and hardware. -RAM requirements depends on the model quantization. +> **Note:** For deployment, the model requires ~4GB disk space and recommended 6GB+ of VRAM on the GPU. 
:::
-:::{tab-item} OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov
-:sync: OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov
+:::{tab-item} OpenVINO/Qwen3-8B-int4-cw-ov
+:sync: OpenVINO/Qwen3-8B-int4-cw-ov
 ```bash
-docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \
-  openvino/model_server:weekly \
-  --pull \
-  --source_model OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov \
-  --model_repository_path /models \
-  --model_name OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov \
-  --task text_generation \
-  --target_device GPU
-
-docker run -d --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw \
-  openvino/model_server:weekly \
-  --add_to_config \
-  --config_path /models/config_all.json \
-  --model_name OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov \
-  --model_path OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov
+mkdir c:\models
+ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config "{\"NPUW_LLM_PREFILL_ATTENTION_HINT\":\"PYRAMID\"}" --cache_dir .ovcache --model_name Qwen3-8B
 ```
-
-> **Note:** `Qwen2.5-Coder` models are available on [HuggingFace OpenVINO repository](https://huggingface.co/OpenVINO/models?search=qwen2.5-coder) in different sizes and precisions. It is possible to choose it for any use and hardware.
-RAM requirements depends on the model quantization.
-
+> **Note:** The first model initialization might take a long time. With the compilation cache, subsequent model loads will be fast. 
:::
::::

-Pull and add the model on Windows:
-::::{tab-set}
-:::{tab-item} Qwen/Qwen3-Coder-30B-A3B-Instruct
-:sync: Qwen/Qwen3-Coder-30B-A3B-Instruct
-```bat
-python export_model.py text_generation --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --weight-format int8 --config_file_path models/config_all.json --model_repository_path models --target_device GPU --tool_parser qwen3coder
-curl -L -o models/Qwen/Qwen3-Coder-30B-A3B-Instruct/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_qwen3coder_instruct.jinja
+### Linux: via Docker

-ovms.exe --add_to_config --config_path models/config_all.json --model_name Qwen/Qwen3-Coder-30B-A3B-Instruct --model_path Qwen/Qwen3-Coder-30B-A3B-Instruct
+::::{tab-set}
+:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4
+:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4
+```bash
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
+  openvino/model_server:latest \
+  --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
 ```
-> **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 65GB of free RAM.
-
+> **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. 
:::
-:::{tab-item} mistralai/Codestral-22B-v0.1
-:sync: mistralai/Codestral-22B-v0.1
-```bat
-python export_model.py text_generation --source_model mistralai/Codestral-22B-v0.1 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --target_device GPU
-curl -L -o models/mistralai/Codestral-22B-v0.1/chat_template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.10.1.1/examples/tool_chat_template_mistral_parallel.jinja
-
-ovms.exe --add_to_config --config_path models/config_all.json --model_name mistralai/Codestral-22B-v0.1 --model_path mistralai/Codestral-22B-v0.1
+:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8
+:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8
+```bash
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
+  openvino/model_server:latest \
+  --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct
 ```
-> **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 50GB of free RAM.
-
-
+> **Note:** For deployment, the model requires ~16GB disk space and recommended 34GB+ of VRAM on the GPU. 
:::
-:::{tab-item} openai/gpt-oss-20b
-:sync: openai/gpt-oss-20b
-```bat
-python export_model.py text_generation --source_model openai/gpt-oss-20b --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --target_device GPU --tool_parser gptoss --reasoning_parser gptoss
-curl -L -o models/openai/gpt-oss-20b/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_gpt_oss.jinja
-ovms.exe --add_to_config --config_path models/config_all.json --model_name openai/gpt-oss-20b --model_path openai/gpt-oss-20b
+:::{tab-item} OpenVINO/gpt-oss-20B-int4
+:sync: OpenVINO/gpt-oss-20B-int4
+```bash
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
+  openvino/model_server:latest \
+  --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B
 ```
-> **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 96GB of free RAM.
-> **Note:** Continuous batching and paged attention are supported for GPT‑OSS. However, when deployed on GPU, the model may experience reduced accuracy under high‑concurrency workloads. This issue will be resolved in version 2026.1 and in the upcoming weekly release. CPU execution is not affected.
-
+> **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. 
:::
-:::{tab-item} unsloth/Devstral-Small-2507
-:sync: unsloth/Devstral-Small-2507
-```bat
-python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU
-curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja
-ovms.exe --add_to_config --config_path models/config_all.json --model_name unsloth/Devstral-Small-2507 --model_path unsloth/Devstral-Small-2507
+:::{tab-item} OpenVINO/gpt-oss-20B-int8
+:sync: OpenVINO/gpt-oss-20B-int8
+```bash
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
+  openvino/model_server:latest \
+  --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B
 ```
-> **Note:** For deployment, the model requires ~13GB disk space and recommended 16GB+ of VRAM on the GPU. For conversion, the original model will be pulled and quantized, which requires 50GB of free RAM.
-
+> **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU. 
:::
-:::{tab-item} OpenVINO/Qwen3-4B-int4-ov
-:sync: OpenVINO/Qwen3-4B-int4-ov
-```bat
-ovms.exe --pull --source_model OpenVINO/Qwen3-4B-int4-ov --model_repository_path models --model_name OpenVINO/Qwen3-4B-int4-ov --target_device GPU --task text_generation --tool_parser hermes3
-
-ovms.exe --add_to_config --config_path models/config_all.json --model_name OpenVINO/Qwen3-4B-int4-ov --model_path OpenVINO/Qwen3-4B-int4-ov
-```
-> **Note:** `Qwen3` models are available on [HuggingFace OpenVINO repository](https://huggingface.co/OpenVINO/models?search=qwen3) in different sizes and precisions. It is possible to choose it for any use and hardware.
+:::{tab-item} OpenVINO/Qwen3-8B-int4-ov
+:sync: OpenVINO/Qwen3-8B-int4-ov
+```bash
+mkdir -p models
+docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
+  openvino/model_server:latest \
+  --model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B
+```
+> **Note:** For deployment, the model requires ~4GB disk space and recommended 6GB+ of VRAM on the GPU. 
:::
-:::{tab-item} OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov
-:sync: OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov
-```bat
-ovms.exe --pull --source_model OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov --model_repository_path models --model_name OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov --target_device GPU --task text_generation
-
-ovms.exe --add_to_config --config_path models/config_all.json --model_name OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov --model_path OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov
+:::{tab-item} OpenVINO/Qwen3-8B-int4-cw-ov
+:sync: OpenVINO/Qwen3-8B-int4-cw-ov
+```bash
+mkdir -p models
+docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
+  openvino/model_server:latest-gpu \
+  --model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B
 ```
-
-> **Note:** `Qwen2.5-Coder` models are available on [HuggingFace OpenVINO repository](https://huggingface.co/OpenVINO/models?search=qwen2.5-coder) in different sizes and precisions. It is possible to choose it for any use and hardware.
-
+> **Note:** The first model initialization might take a long time. With the compilation cache, subsequent model loads will be fast.
:::
::::

-## Set Up Server
-Run OpenVINO Model Server with all downloaded models loaded at the same time:
-::::{tab-set}
-:::{tab-item} Windows
-:sync: Windows
-### Windows: deploying on bare metal
-Please refer to OpenVINO Model Server installation first: [link](../../docs/deploying_server_baremetal.md)
+
+## Custom models
+
+Models that are not published in the OpenVINO format can be exported and quantized with custom parameters. Below is an example of how to export and deploy the Devstral-Small-2507 model. 
-```bat
-set MOE_USE_MICRO_GEMM_PREFILL=0
-ovms --rest_port 8000 --config_path ./models/config_all.json
-```
-:::
-:::{tab-item} Linux CPU
-:sync: Linux CPU
-### Linux: via Docker with CPU
-```text
-docker run -d --rm -u $(id -u):$(id -g) -e MOE_USE_MICRO_GEMM_PREFILL=0 \
-  -p 8000:8000 -v $(pwd)/:/workspace/ openvino/model_server:weekly --rest_port 8000 --config_path /workspace/models/config_all.json
-```
-:::
-:::{tab-item} Linux GPU
-:sync: Linux GPU
-### Linux: via Docker with GPU
 ```bash
-docker run -d --rm --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) -e MOE_USE_MICRO_GEMM_PREFILL=0 \
-  -p 8000:8000 -v $(pwd)/:/workspace/ openvino/model_server:weekly --rest_port 8000 --config_path /workspace/models/config_all.json
+curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+mkdir models
+python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU
+curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja
+
+ovms --model_repository_path models --source_model unsloth/Devstral-Small-2507 --task text_generation --target_device GPU --tool_parser devstral --rest_port 8000 --cache_dir .ovcache
 ```
-:::
-::::
+> **Note:** Exporting a model is a one-time operation, but it may consume RAM at least equal to the model size and can take a long time, depending on the model.
+

-> **Note:** `MOE_USE_MICRO_GEMM_PREFILL=0` is a workaround for *Qwen3-Coder-30B-A3B-Instruct* and it will be fixed in release 2026.1 or next weekly. 
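Once the server is running, the OpenAI-compatible endpoint can be sanity-checked outside of the IDE. The sketch below builds a chat-completions request using only the Python standard library; the host and port (`localhost:8000`), the `/v3` path, and the model name `Qwen3-Coder-30B-A3B-Instruct` are assumptions taken from the deployment commands above, so substitute whatever `--rest_port` and `--model_name` you actually used:

```python
import json
from urllib.request import Request, urlopen

# Assumption: OVMS was started with --rest_port 8000 and
# --model_name Qwen3-Coder-30B-A3B-Instruct (see the commands above).
OVMS_URL = "http://localhost:8000/v3/chat/completions"
MODEL_NAME = "Qwen3-Coder-30B-A3B-Instruct"

def build_request(prompt: str) -> Request:
    """Build an OpenAI-compatible chat completions request for OVMS."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return Request(
        OVMS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Write a Python function that reverses a string.")
print(req.full_url)

# Uncomment to send the request once the server is up:
# with urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same request can be issued with `curl` against the `/v3/chat/completions` path; Continue talks to this endpoint through the `apiBase` setting configured in the next section.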
## Set Up Visual Studio Code @@ -261,16 +171,16 @@ Open configuration file: Prepare a config: ::::{tab-set} -:::{tab-item} Qwen/Qwen3-Coder-30B-A3B-Instruct -:sync: Qwen/Qwen3-Coder-30B-A3B-Instruct +:::{tab-item} Qwen3-Coder-30B-A3B-Instruct +:sync: Qwen3-Coder-30B-A3B-Instruct ``` name: Local Assistant version: 1.0.0 schema: v1 models: - - name: OVMS Qwen/Qwen3-Coder-30B-A3B + - name: OVMS Qwen3-Coder-30B-A3B-Instruct provider: openai - model: Qwen/Qwen3-Coder-30B-A3B-Instruct + model: Qwen3-Coder-30B-A3B-Instruct apiKey: unused apiBase: http://localhost:8000/v3 roles: @@ -296,51 +206,17 @@ context: - provider: codebase ``` ::: -:::{tab-item} mistralai/Codestral-22B-v0.1 -:sync: mistralai/Codestral-22B-v0.1 -``` -name: Local Assistant -version: 1.0.0 -schema: v1 -models: - - name: OVMS mistralai/Codestral-22B-v0.1 - provider: openai - model: mistralai/Codestral-22B-v0.1 - apiKey: unused - apiBase: http://localhost:8000/v3 - roles: - - chat - - edit - - apply - - autocomplete - capabilities: - - tool_use - autocompleteOptions: - maxPromptTokens: 500 - debounceDelay: 124 - useCache: true - onlyMyCode: true - modelTimeout: 400 -context: - - provider: code - - provider: docs - - provider: diff - - provider: terminal - - provider: problems - - provider: folder - - provider: codebase -``` -::: -:::{tab-item} openai/gpt-oss-20b -:sync: openai/gpt-oss-20b + +:::{tab-item} gpt-oss-20b +:sync: gpt-oss-20b ``` name: Local Assistant version: 1.0.0 schema: v1 models: - - name: OVMS openai/gpt-oss-20b + - name: OVMS gpt-oss-20b provider: openai - model: openai/gpt-oss-20b + model: gpt-oss-20b apiKey: unused apiBase: http://localhost:8000/v3 roles: @@ -349,9 +225,9 @@ models: - apply capabilities: - tool_use - - name: OVMS openai/gpt-oss-20b autocomplete + - name: OVMS gpt-oss-20b autocomplete provider: openai - model: openai/gpt-oss-20b + model: gpt-oss-20b apiKey: unused apiBase: http://localhost:8000/v3 roles: @@ -413,35 +289,28 @@ context: - provider: codebase ``` ::: 
-:::{tab-item} OpenVINO/Qwen3-4B-int4-ov -:sync: OpenVINO/Qwen3-4B-int4-ov +:::{tab-item} Qwen3-8B +:sync: Qwen3-8B ``` name: Local Assistant version: 1.0.0 schema: v1 models: - - name: OVMS OpenVINO/Qwen3-4B + - name: OVMS Qwen3-8B provider: openai - model: OpenVINO/Qwen3-4B-int4-ov + model: Qwen3-8B apiKey: unused apiBase: http://localhost:8000/v3 roles: - chat - edit - apply - - autocomplete capabilities: - tool_use requestOptions: extraBodyProperties: chat_template_kwargs: enable_thinking: false - autocompleteOptions: - maxPromptTokens: 500 - debounceDelay: 124 - useCache: true - onlyMyCode: true - modelTimeout: 400 context: - provider: code - provider: docs @@ -451,42 +320,6 @@ context: - provider: folder - provider: codebase ``` -::: -:::{tab-item} OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov -:sync: OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov -``` -name: Local Assistant -version: 1.0.0 -schema: v1 -models: - - name: OVMS OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov - provider: openai - model: OpenVINO/Qwen2.5-Coder-3B-Instruct-int4-ov - apiKey: unused - apiBase: http://localhost:8000/v3 - roles: - - chat - - edit - - apply - - autocomplete - capabilities: - - tool_use - autocompleteOptions: - maxPromptTokens: 500 - debounceDelay: 124 - useCache: true - onlyMyCode: true - modelTimeout: 400 -context: - - provider: code - - provider: docs - - provider: diff - - provider: terminal - - provider: problems - - provider: folder - - provider: codebase -``` -::: :::: > **Note:** For more information about this config, see [configuration reference](https://docs.continue.dev/reference#models). 
@@ -526,3 +359,6 @@ Example use cases for tools: ![glob](./glob.png) +* Extending VRAM allocation to iGPU + +![xram](./vram.png) diff --git a/demos/code_local_assistant/vram.png b/demos/code_local_assistant/vram.png new file mode 100644 index 0000000000000000000000000000000000000000..b146fd75e1529433f17f5f4829e1906f6a07fba0 GIT binary patch literal 25648 zcmd43cT`hbyFQ92qCvqzaf66%8%4wlN(+J=F<=NCB3NkBdr1&AUz7S`OfEg-goAn?Zq?e zLFynG8JYFx&Yrv^BeO~@BeTMF&1&GuiLH<2f&W&7T{>eaQ{KLH61Z9AeZuC1j7(*! zg4k0IxL+H5);UZ@MzLk---_0tqN_48S3aCOdBQ&0gE=h!(!N*naaPXyLS=5{$rA@Z zmiOmX?uqPqpK=T2w87-^iprZHuN5axovi5CC^rq_l~>Y(Y3kGZV<_=>niBOCMc}wnJZ|&fi$I zeA{8%dq-pWQl{Et6>|B~LT(3p>s~fg$$Sx z#9b{8AnxJi5BSWt&(v-vhl#YJy}aNDH}9p5acZL{A_D2Oh_Jm`p`N)mKA&&P*Wnz^ zrl>>QdzcuG2IV@{B=GZ(Gj&>JZ{2ePpw4wIf#pmtnSirmSx?XFhw$DP4PPR1&rkd7 zina#V6XOMfi3ZJsQ`e*HQ>%tb{AV8fIO-w_l)!VPBHChOs`-y#@bBhXK>P7uWumwo z`tZwF25RAF$q$Cf4%9;-H^Pj=KH1~nImi>W3`j9v=+cJH#+hmBLgKH4#sDw)j%#bh zep&9y&VH@i?jd}YS53|eHYZ$|*88X=?i=(m;1e5&d#NwAuM(#^hiZCF&A}k?bj)Cg z!RdhJAL-3qq#-@yLd(uzG{h54)p@gy|9lYQIQ{c_dzMB68r#TwL5$_RnOHcZg&3D| zS=eD6Z@1US3Y1xm{G1G9Ng)0s)#Mk>xk&@JWI{4B+G!h~W`$y~RT-|=32mE1Clg;$ zODm?Zn?>nQWg8Su5C1Y85&g@a`u@W7IrUU=mbo&$U6? 
zR&_3+rQLlFH%QmSEl<04+OcnX=6PZ7-|TgD8{O;s&V!1(^n5oP>a*)B_HE7J4a~}1 zT0+gNN9X)7&y=be`+lMHZ*%b^->;p37+7rHMfnR1@hUwPBh6!f`e|J{{1 z`Ef!%rL^qGSLd9kuHTXRJ@c4#)1k-K}Cw$d3dcPfseP9L=-Lg0U4RM-RUdAvnv}@GNvh`6_78R1OrwlQcPh3PZ-&;>3sNn2gyC-57u~%|KTyn%o!#w1sgZ}%p*g+A-r|ieK5o0 z+-tb!wQwfp2Az5Vxw8^JSW32B?H+Ff^vh{IKYw5n@18eVCyX~2P0vsCOIIe-c^H?& z@e^Uv(U#PfwoZB(BHefz{iDG3`e~7hfNH3)QDvycHNYflxWBkF%*L$m#{#(Vh1!8B zV~$y8!>SsHQUE3DEKLJZ$T8+X1J+7^56q{+?8&-8`1zs5b|X%b{O}y+#?xcC<2Ae{ z$Uct7p~%47E#|&&o5+^0G(OVO6}*)<6DNZ z-!VJ=C!XU_1GPTt?f?tk0VDsCC47J-TCaP(%~4R+@wuj9{2cx&Wrv+W%|?Ek)&jje=gkw@r{0PYNJ@ zB&6FYgfPC@x86G|%$ajsv?Cj#B#!h=L~EB; zeK=v?Tln}QSVioN7N#f@(b>8xlJh5OC}9bm8&Das?sXoEPN>A#XNQh%`Q5kb8)ar} ze-&sf?A|ZDUGH;Lgy(4C{kL`7K(f~B#i6ih_vD{Aopn&#>&kzf3P{=JV3N;IjERv< zY=Mz$2z)IFZ+q7e?eeK_hb zOlPM~4qnpH?)GZ{eY#PHDf_O0t2t%7?@d|IXyFt@l%;LIHFluiEgyXgtZ+Zz3qDJ4 zd(T1P%v~D|ym@yC59nR8t{n4NUhBp)TpPFL&CVVfKaY0DU*T>igw#$?2Cclht+`qTZhZd~ZSj zxVsfCgM4*N@;LMiH79VnJMe;8{CR)8_%+4 zLPe&Wufq;US`({ioA)Wcpz1OCmK7=?Jq06G7C7e@-8u_fQ3B3{((NL0Y4}E{FiSwM zynG4v0N3;|RU}xkG!Gp@9kweXb9m^`*MJxOd%rHm4}QqG{1qg=FBkWLr_#MI5iMs` z7xfyY@35jMj_(U9!1Pcd%=h&FrEma*|b^C5`t=+(_GDW)KQ?_5J$3vTgF) z4(RY|S53DA9ngs=UVe98)1e{9vsJ5*AZO*KeybBtN-Cz!6NJV04dVKnH=u+U^B$Ao z8y%kqNjVhlBULxcEw-!aH&z}y;^x)#ELx6shI9_FTS+XvI&NNNisUq^mYEQepQ|d< zdDMv}EZ)%7e8xR(MyE*TFleooWDjU%3;%sl?!|W{mS+0VIebTSqUuGvyM;_?T&avh z;%iH17}vLQ9g|-;cs^%ox8P`YJaZ%EUWA6Kh;{AP4Sof7=jVgvI)q%2VhPBOChuB9 zp#s*IODPhq*LCaRL$D{XF#eu(gR;vbaUFmJ_P=(!n88l`g;c^62R1OJ^~{nHX#bl4 zTxiLyKWzr>Bhs3;F&!(7&GJa4Zx1YMcBg8+a#T@+N!%CR0(Ydm z1~z5gDQZewXU!|##d`H}hp7hXas2y%&3T3dod)^Wf4~%{2-Gurto`O^a-=PdoWmu` z995oENf=st8WS@?vpZXCXpJS+UJ9(y9NJXq#-d&jK$Yi;OKfw{B&Tx+V`&j!Qnc z4uGPvEsgbgIbY0e*5blkG?;uVOSG=K%9g2`M(Ia*F6D>8%S1+Ue4&g?@3YcI1 zQ9{%*rhk4HDBzM_l)&A~;&+iUWm<_^w)xHPmhbFW%Yw7dqi!#cR?)guNQ>tn`DJEc z`^Qr8DvC=yNJb|65YR`M?$&j`?<}ruXRj==Kep`fZ+@`=8eSZH;P>BTo+mCs+}4^VNPD)o8xN#G~k&T8`^Jlb=nQj-9NnZ%Js$z6Y|v- z|I}mUyb=tat_lE&)<|x?*!xdQwOvQkRC#|`g0p|E{ikbyS*R$8E#SEsyW 
[GIT binary patch (literal 25648 bytes) for demos/code_local_assistant/vram.png omitted: non-text image payload]

Date: Fri, 27 Feb 2026 16:38:13 +0100
Subject: [PATCH 2/7] image weekly

---
 demos/code_local_assistant/README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md
index 1cfb37ca57..ea8709b746 100644
--- a/demos/code_local_assistant/README.md
+++ b/demos/code_local_assistant/README.md
@@ -76,7 +76,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-cw-
zl}+HH6n7t?;=CO8u!N{j<@|(?mNuBM?j}unf_NK?h`wgO;5#B zEiy@5?RD3sN%H`nIONc9nGMHCC=CJG;{}l7?J!~T9VmjWjv#4#YWY{H63qvty%EA}n7h&o4Au#FLKLV5jZQgc(le1>-QDmds+^C=un_F4^YsHm{+ z&|z#0Q%wt^!u_6SZx&y%v;HX2R1OPJ58tv%?O-%1{KhsI7bKmJ4iRZn$BYz}2?i;@ z=JNUazry#X$CJn1xd&iqTk*rKj;EXnowttKk8-4U)j82kXpFg-cy%7Kw*fD&;NVNl z>AX1+RrP+jaWO3xel@~qWpfg%s|Z+npc+>0FurNMQ7a5@y9QFGrNl=jo8H%%LDj|u zYBk;YwQ9w#Sb%`q7B)KG%&4t?)AFJPV}9j_13B&uVeG=u)wKydc8QU7zjKp;Q8a_b z)d|jdxS&a2Zh67C%*HJVLtkwW^$}*UZoE0W)4a6GRfIJ__Djvj3Ux0^pCe8{76g`c zMb!FW%F5~2=PHzjH}6GXZ>I^TzrW2ZqVqPY=go{!RYfGa&%t=!`^cC#R&kdVveu^$ zdd|V~hHv9a@ks7`xMj#Wk^ZkS1u`v-^oZ9_CcKGs&B0#;qTum_N{i)|q~}H$89tVF z+p)3%6511Y?-zCGskqdsu$nUf8CZ62*V5)a_jnsPKyogjmB*ltdrMJ{vz# z4!|CeiI)z0GDFEl(%8Rbf!Y^*tJZOBg%60Z-fnHqBQ3~&U_S#_IlD?ur+ z$Fsa*t1dDYaE74Z_m`z0sNl_EQ90!cL1c0V`u0f1pH;b21KTK50lkD`!_*c~5mtJ~ z-YzT;D4wjEu~;d4`>!+BtVwhLRX$LvY@eb|i$_9$DE|M6E7GcJ Date: Fri, 27 Feb 2026 16:38:13 +0100 Subject: [PATCH 2/7] image weekly --- demos/code_local_assistant/README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md index 1cfb37ca57..ea8709b746 100644 --- a/demos/code_local_assistant/README.md +++ b/demos/code_local_assistant/README.md @@ -76,7 +76,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-cw- ```bash mkdir -p models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ - openvino/model_server:latest \ + openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. 
@@ -87,7 +87,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): ```bash mkdir c:\models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ - openvino/model_server:latest \ + openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 34GB+ of VRAM on the GPU. @@ -98,7 +98,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): ```bash mkdir c:\models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ - openvino/model_server:latest \ + openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B ``` > **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. 
@@ -109,7 +109,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): ```bash mkdir c:\models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ - openvino/model_server:latest \ + openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B ``` > **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU. @@ -120,7 +120,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): ```bash mkdir c:\models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ - openvino/model_server:latest \ + openvino/model_server:weekly \ --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B ``` > **Note:** For deployment, the model requires ~4GB disk space and recommended 6GB+ of VRAM on the GPU. 
@@ -130,7 +130,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): ```bash mkdir c:\models docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ - openvino/model_server:latest-gpu \ + openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B ``` > **Note:** First model initialization might be long. With the compilation cache, sequential model loading will be fast. From faccd505ae096ba0e299aa9a596a70b2efa5569b Mon Sep 17 00:00:00 2001 From: Dariusz Trawinski Date: Fri, 27 Feb 2026 16:41:44 +0100 Subject: [PATCH 3/7] review --- demos/code_local_assistant/README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md index ea8709b746..1a9cb793f4 100644 --- a/demos/code_local_assistant/README.md +++ b/demos/code_local_assistant/README.md @@ -14,7 +14,7 @@ With the rise of AI PC capabilities, hosting own Visual Studio code assistant is ::::{tab-set} :::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 :sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 -```bash +```bat mkdir c:\models set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct @@ -24,7 +24,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A :::{tab-item} 
OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 :sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 -```bash +```bat mkdir c:\models set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct @@ -34,7 +34,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A :::{tab-item} OpenVINO/gpt-oss-20B-int4 :sync: OpenVINO/gpt-oss-20B-int4 -```bash +```bat mkdir c:\models ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B ``` @@ -43,7 +43,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int4 :::{tab-item} OpenVINO/gpt-oss-20B-int8 :sync: OpenVINO/gpt-oss-20B-int8 -```bash +```bat mkdir c:\models ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B ``` @@ -52,7 +52,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int8 :::{tab-item} OpenVINO/Qwen3-8B-int4-ov :sync: OpenVINO/Qwen3-8B-int4-ov -```bash +```bat mkdir c:\models ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-8B ``` @@ -60,7 +60,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov ::: :::{tab-item} OpenVINO/Qwen3-8B-int4-cw-ov :sync: OpenVINO/Qwen3-8B-int4-cw-ov -```bash +```bat mkdir c:\models 
ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config "{\"NPUW_LLM_PREFILL_ATTENTION_HINT\":\"PYRAMID\"}" --cache_dir .ovcache --model_name Qwen3-8B ``` @@ -142,7 +142,7 @@ docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/model Models which are not published in OpenVINO format can be exported and quantized with custom parameters. Below is an example how to export and deploy model Devstral-Small-2507. -```bash +``` mkdir models python export_model.py text_generation --source_model unsloth/Devstral-Small-2507 --weight-format int4 --config_file_path models/config_all.json --model_repository_path models --tool_parser devstral --target_device GPU curl -L -o models/unsloth/Devstral-Small-2507/chat_template.jinja https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/extras/chat_template_examples/chat_template_devstral.jinja @@ -359,6 +359,6 @@ Example use cases for tools: ![glob](./glob.png) -* Extending VRAM allocation to iGPU +* Extending VRAM allocation to iGPU to enable loading bigger models ![xram](./vram.png) From e15aff129e3dd61df83601cb30ffdd1c6dfb4ce7 Mon Sep 17 00:00:00 2001 From: "Trawinski, Dariusz" Date: Mon, 2 Mar 2026 22:51:51 +0100 Subject: [PATCH 4/7] Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com> --- demos/code_local_assistant/README.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md index 1a9cb793f4..bcc060a3a0 100644 --- a/demos/code_local_assistant/README.md +++ b/demos/code_local_assistant/README.md @@ -6,7 +6,7 @@ With the rise of AI PC capabilities, hosting own Visual Studio code assistant is # Requirements 
- Windows (for standalone app) or Linux (using Docker) - Python installed (for model preparation only) -- Intel Meteor Lake, Lunar Lake, Arrow Lake or Panter Lake. +- Intel Meteor Lake, Lunar Lake, Arrow Lake or Panther Lake. - Memory requirements depend on the model size ### Windows: deploying on bare metal @@ -17,7 +17,7 @@ With the rise of AI PC capabilities, hosting own Visual Studio code assistant is ```bat mkdir c:\models set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context -ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct +ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. ::: @@ -27,7 +27,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A ```bat mkdir c:\models set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context -ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct +ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 34GB+ of VRAM on the GPU. 
::: @@ -75,20 +75,20 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-cw- :sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 ```bash mkdir -p models -docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ +docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \ openvino/model_server:weekly \ - --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct + --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. 
::: -:::{tab-item} OpenVINO/QwCoder-30B-A3B-Instruct-int8 +:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 :sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 ```bash -mkdir c:\models +mkdir -p models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ openvino/model_server:weekly \ - --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct + --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8-ov --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 34GB+ of VRAM on the GPU. 
::: @@ -96,7 +96,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): :::{tab-item} OpenVINO/gpt-oss-20B-int4 :sync: OpenVINO/gpt-oss-20B-int4 ```bash -mkdir c:\models +mkdir -p models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B @@ -107,7 +107,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): :::{tab-item} OpenVINO/gpt-oss-20B-int8 :sync: OpenVINO/gpt-oss-20B-int8 ```bash -mkdir c:\models +mkdir -p models docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B @@ -119,16 +119,16 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): :sync: OpenVINO/Qwen3-8B-int4-ov ```bash mkdir c:\models -docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ +docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \ openvino/model_server:weekly \ - --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-ov --task 
text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B + --model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --reasoning_parser qwen3 --rest_port 8000 --model_name Qwen3-8B ``` > **Note:** For deployment, the model requires ~4GB disk space and recommended 6GB+ of VRAM on the GPU. ::: :::{tab-item} OpenVINO/Qwen3-8B-int4-cw-ov :sync: OpenVINO/Qwen3-8B-int4-cw-ov ```bash -mkdir c:\models +mkdir -p models docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B From 4d4a599cde3f99d2536df569097d21388c602001 Mon Sep 17 00:00:00 2001 From: Dariusz Trawinski Date: Mon, 2 Mar 2026 23:01:03 +0100 Subject: [PATCH 5/7] fixes --- demos/code_local_assistant/README.md | 50 ++++++++++++++-------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md index bcc060a3a0..a34a652453 100644 --- a/demos/code_local_assistant/README.md +++ b/demos/code_local_assistant/README.md @@ -12,8 +12,8 @@ With the rise of AI PC capabilities, hosting own Visual Studio code assistant is ### Windows: deploying on bare metal ::::{tab-set} -:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 -:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4 +:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov +:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov ```bat mkdir c:\models set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to 
improve accuracy with long context @@ -22,30 +22,30 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A > **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU. ::: -:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 -:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 +:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8-ov +:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8-ov ```bat mkdir c:\models set MOE_USE_MICRO_GEMM_PREFILL=0 # temporary workaround to improve accuracy with long context -ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct +ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8-ov --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --cache_dir .ovcache --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 34GB+ of VRAM on the GPU. ::: -:::{tab-item} OpenVINO/gpt-oss-20B-int4 -:sync: OpenVINO/gpt-oss-20B-int4 +:::{tab-item} OpenVINO/gpt-oss-20b-int4-ov +:sync: OpenVINO/gpt-oss-20b-int4-ov ```bat mkdir c:\models -ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B +ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20b-int4-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20b ``` > **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. 
 :::
-:::{tab-item} OpenVINO/gpt-oss-20B-int8
-:sync: OpenVINO/gpt-oss-20B-int8
+:::{tab-item} OpenVINO/gpt-oss-20b-int8-ov
+:sync: OpenVINO/gpt-oss-20b-int8-ov
 ```bat
 mkdir c:\models
-ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20B
+ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20b-int8-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20b
 ```
 > **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU.
 :::
@@ -71,8 +71,8 @@ ovms --model_repository_path c:\models --source_model OpenVINO/Qwen3-8B-int4-cw-
 ### Linux: via Docker
 
 ::::{tab-set}
-:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4
-:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4
+:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov
+:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov
 ```bash
 mkdir -p models
 docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
@@ -82,35 +82,35 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):
 > **Note:** For deployment, the model requires ~16GB disk space and recommended 19GB+ of VRAM on the GPU.
::: -:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 -:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8 +:::{tab-item} OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8-ov +:sync: OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8-ov ```bash mkdir -p models -docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ +docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \ openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int8-ov --task text_generation --target_device GPU --tool_parser qwen3coder --rest_port 8000 --model_name Qwen3-Coder-30B-A3B-Instruct ``` > **Note:** For deployment, the model requires ~16GB disk space and recommended 34GB+ of VRAM on the GPU. 
:::
-:::{tab-item} OpenVINO/gpt-oss-20B-int4
-:sync: OpenVINO/gpt-oss-20B-int4
+:::{tab-item} OpenVINO/gpt-oss-20b-int4-ov
+:sync: OpenVINO/gpt-oss-20b-int4-ov
 ```bash
 mkdir -p models
-docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
+docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
     openvino/model_server:weekly \
-    --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int4 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B
+    --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int4-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B
 ```
 > **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. 
::: -:::{tab-item} OpenVINO/gpt-oss-20B-int8 -:sync: OpenVINO/gpt-oss-20B-int8 +:::{tab-item} OpenVINO/gpt-oss-20b-int8-ov +:sync: OpenVINO/gpt-oss-20B-int8-ov ```bash mkdir -p models -docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ +docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ openvino/model_server:weekly \ - --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B + --model_repository_path /models --source_model OpenVINO/gpt-oss-20b-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20b ``` > **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU. 
::: @@ -129,7 +129,7 @@ docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/model :sync: OpenVINO/Qwen3-8B-int4-cw-ov ```bash mkdir -p models -docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ +docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/accel --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \ openvino/model_server:weekly \ --model_repository_path /models --source_model OpenVINO/Qwen3-8B-int4-cw-ov --task text_generation --target_device NPU --tool_parser hermes3 --rest_port 8000 --max_prompt_len 16384 --plugin_config '{"NPUW_LLM_PREFILL_ATTENTION_HINT":"PYRAMID"}' --model_name Qwen3-8B ``` From 111b71b92f8f250638f0b858ddf260b1dd6f3d1d Mon Sep 17 00:00:00 2001 From: Dariusz Trawinski Date: Mon, 2 Mar 2026 23:04:43 +0100 Subject: [PATCH 6/7] fixes --- demos/code_local_assistant/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md index a34a652453..aa615881ff 100644 --- a/demos/code_local_assistant/README.md +++ b/demos/code_local_assistant/README.md @@ -45,7 +45,7 @@ ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20b-int4- :sync: OpenVINO/gpt-oss-20b-int8-ov ```bat mkdir c:\models -ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20b-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20b +ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20b-int8-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20b ``` > **Note:** For deployment, the model requires ~24GB disk space and 
recommended 28GB+ of VRAM on the GPU. ::: @@ -99,7 +99,7 @@ docker run -d -p 8000:8000 --rm -e MOE_USE_MICRO_GEMM_PREFILL=0 --user $(id -u): mkdir -p models docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \ openvino/model_server:weekly \ - --model_repository_path /models --source_model OpenVINO/gpt-oss-20B-int4-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20B + --model_repository_path /models --source_model OpenVINO/gpt-oss-20b-int4-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20b ``` > **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. ::: @@ -110,7 +110,7 @@ docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/model mkdir -p models docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ openvino/model_server:weekly \ - --model_repository_path /models --source_model OpenVINO/gpt-oss-20b-int8 --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20b + --model_repository_path /models --source_model OpenVINO/gpt-oss-20b-int8-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20b ``` > **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU. 
::: From ed1e9ba3fd0f42701f267b559ec933414d1aa62a Mon Sep 17 00:00:00 2001 From: Dariusz Trawinski Date: Tue, 3 Mar 2026 14:38:54 +0100 Subject: [PATCH 7/7] remove gpt-oss-int8 --- demos/code_local_assistant/README.md | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/demos/code_local_assistant/README.md b/demos/code_local_assistant/README.md index aa615881ff..d1f491375e 100644 --- a/demos/code_local_assistant/README.md +++ b/demos/code_local_assistant/README.md @@ -41,15 +41,6 @@ ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20b-int4- > **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. ::: -:::{tab-item} OpenVINO/gpt-oss-20b-int8-ov -:sync: OpenVINO/gpt-oss-20b-int8-ov -```bat -mkdir c:\models -ovms --model_repository_path c:\models --source_model OpenVINO/gpt-oss-20b-int8-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --cache_dir .ovcache --model_name gpt-oss-20b -``` -> **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU. -::: - :::{tab-item} OpenVINO/Qwen3-8B-int4-ov :sync: OpenVINO/Qwen3-8B-int4-ov ```bat @@ -104,17 +95,6 @@ docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/model > **Note:** For deployment, the model requires ~12GB disk space and recommended 16GB+ of VRAM on the GPU. 
::: -:::{tab-item} OpenVINO/gpt-oss-20b-int8-ov -:sync: OpenVINO/gpt-oss-20B-int8-ov -```bash -mkdir -p models -docker run -d -p 8000:8000 --rm --user $(id -u):$(id -g) -v $(pwd)/models:/models/:rw --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \ - openvino/model_server:weekly \ - --model_repository_path /models --source_model OpenVINO/gpt-oss-20b-int8-ov --task text_generation --target_device GPU --tool_parser gptoss --reasoning_parser gptoss --rest_port 8000 --model_name gpt-oss-20b -``` -> **Note:** For deployment, the model requires ~24GB disk space and recommended 28GB+ of VRAM on the GPU. -::: - :::{tab-item} OpenVINO/Qwen3-8B-int4-ov :sync: OpenVINO/Qwen3-8B-int4-ov ```bash