Using the `--pull` parameter, we can use OVMS to download the model, quantize it, and convert it to the OpenVINO format.

::::{tab-set}
:::{tab-item} With Docker
:sync: docker
**Required:** Docker Engine installed

```text
docker run -u $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest-py --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --weight-format int8 --task <task> [TASK_SPECIFIC_PARAMETERS]
```
:::

:::{tab-item} On Baremetal
:sync: baremetal
**Required:** OpenVINO Model Server package - see [deployment instructions](./deploying_server_baremetal.md) for details.

```text
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --weight-format int8 --task <task> [TASK_SPECIFIC_PARAMETERS]
```
:::
::::
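
After the pull completes, the converted model is placed under `<model_repository_path>/<external_model_name>`. A minimal sanity check, assuming the default layout in which each model gets its own subdirectory:

```bash
# Inspect the pulled model directory; expect OpenVINO IR and tokenizer files
# (exact file names vary by model and OVMS version).
ls -R <model_repository_path>/<external_model_name>
```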

Example for pulling `Qwen/Qwen3-4B`:

::::{tab-set}
:::{tab-item} With Docker
:sync: docker
**Required:** Docker Engine installed

```bash
mkdir -p models
docker run -u $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest-py --pull --source_model "Qwen/Qwen3-4B" --model_repository_path /models --model_name Qwen3-4B --task text_generation --weight-format int8
```
:::

:::{tab-item} On Baremetal
:sync: baremetal
**Required:** OpenVINO Model Server package - see [deployment instructions](./deploying_server_baremetal.md) for details.

```bat
ovms --pull --source_model "Qwen/Qwen3-8B" --model_repository_path /models --model_name Qwen3-8B --task text_generation --weight-format int8
ovms --pull --source_model "Qwen/Qwen3-4B" --model_repository_path /models --model_name Qwen3-4B --task text_generation --weight-format int8
```
:::
::::
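
Once pulled, the model can be started from the local repository without re-downloading. A minimal sketch, assuming the pulled directory can be served with the standard single-model flags `--model_name` and `--model_path`:

```bash
# Serve the previously pulled Qwen3-4B model over the REST API
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models openvino/model_server:latest-py \
  --rest_port 8000 --model_name Qwen3-4B --model_path /models/Qwen3-4B
```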
You can mount the HuggingFace cache to avoid downloading the original model again in case it is already stored in the local cache.
Below is an example pull command with optimum model cache directory sharing for model download:

```bash
docker run -v /etc/passwd:/etc/passwd -e HF_HOME=/hf_home/cache --user $(id -u):$(id -g) --group-add=$(id -g) -v ${HOME}/.cache/huggingface/:/hf_home/cache -v $(pwd)/models:/models:rw openvino/model_server:latest-py --pull --model_repository_path /models --source_model meta-llama/Llama-3.2-1B-Instruct --task text_generation --weight-format int8
```
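
When the cache is mounted this way, repeated pulls of the same model reuse the cached download. A quick check that the snapshot landed in the shared cache (the `hub/models--<org>--<name>` layout is the standard HuggingFace convention):

```bash
ls ${HOME}/.cache/huggingface/hub
# expect an entry like models--meta-llama--Llama-3.2-1B-Instruct
```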

Alternatively, deploy without caching the model files, passing `HF_TOKEN` for authorization:
```bash
docker run -p 8000:8000 -e HF_TOKEN=$HF_TOKEN openvino/model_server:latest-py --model_repository_path /tmp --source_model meta-llama/Llama-3.2-1B-Instruct --task text_generation --weight-format int8 --target_device CPU --rest_port 8000
```
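
With the server running, the deployment can be smoke-tested against the OpenAI-compatible chat completions endpoint; the `/v3` path follows OVMS conventions, and the served model name is assumed here to default to the source model name:

```bash
curl http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```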