diff --git a/docs/pull_optimum_cli.md b/docs/pull_optimum_cli.md
index 181841bdaa..0c046d3932 100644
--- a/docs/pull_optimum_cli.md
+++ b/docs/pull_optimum_cli.md
@@ -32,7 +32,7 @@ Using `--pull` parameter, we can use OVMS to download the model, quantize and co
 **Required:** Docker Engine installed
 
 ```text
-docker run $(id -u):$(id -g) --rm -v :/models:rw openvino/model_server:latest-py --pull --source_model --model_repository_path /models --model_name --target_device --weight-format int8 --task [TASK_SPECIFIC_PARAMETERS]
+docker run -u $(id -u):$(id -g) --rm -v :/models:rw openvino/model_server:latest-py --pull --source_model --model_repository_path /models --model_name --target_device --weight-format int8 --task [TASK_SPECIFIC_PARAMETERS]
 ```
 :::
@@ -44,18 +44,19 @@ ovms --pull --source_model --model_repository_path 
-```text
-docker run $(id -u):$(id -g) --rm -v :/models:rw openvino/model_server:latest-py --pull --source_model "Qwen/Qwen3-8B" --model_repository_path /models --model_name Qwen3-8B --task text_generation --weight-format int8
+```bash
+mkdir -p models
+docker run -u $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest-py --pull --source_model "Qwen/Qwen3-4B" --model_repository_path /models --model_name Qwen3-4B --task text_generation --weight-format int8
 ```
 :::
@@ -64,7 +65,7 @@ docker run $(id -u):$(id -g) --rm -v :/models:rw openvino
 **Required:** OpenVINO Model Server package - see [deployment instructions](./deploying_server_baremetal.md) for details.
 
 ```bat
-ovms --pull --source_model "Qwen/Qwen3-8B" --model_repository_path /models --model_name Qwen3-8B --task text_generation --weight-format int8
+ovms --pull --source_model "Qwen/Qwen3-4B" --model_repository_path /models --model_name Qwen3-4B --task text_generation --weight-format int8
 ```
 :::
 ::::
@@ -84,10 +85,10 @@ You can mount the HuggingFace cache to avoid downloading the original model in c
 Below is an example pull command with optimum model cache directory sharing for model download:
 
 ```bash
-docker run -v /etc/passwd:/etc/passwd -e HF_HOME=/hf_home/cache --user $(id -u):$(id -g) --group-add=$(id -g) -v ${HOME}/.cache/huggingface/:/hf_home/cache -v $(pwd)/models:/models:rw openvino/model_server:latest-py --pull --model_repository_path /models --source_model meta-llama/Meta-Llama-3-8B-Instruct --task text_generation --weight-format int8
+docker run -v /etc/passwd:/etc/passwd -e HF_HOME=/hf_home/cache --user $(id -u):$(id -g) --group-add=$(id -g) -v ${HOME}/.cache/huggingface/:/hf_home/cache -v $(pwd)/models:/models:rw openvino/model_server:latest-py --pull --model_repository_path /models --source_model meta-llama/Llama-3.2-1B-Instruct --task text_generation --weight-format int8
 ```
 
 or deploy without caching the model files with passed HF_TOKEN for authorization:
 
 ```bash
-docker run -p 8000:8000 -e HF_TOKEN=$HF_TOKEN openvino/model_server:latest-py --model_repository_path /tmp --source_model meta-llama/Meta-Llama-3-8B-Instruct --task text_generation --weight-format int8 --target_device CPU --rest_port 8000
+docker run -p 8000:8000 -e HF_TOKEN=$HF_TOKEN openvino/model_server:latest-py --model_repository_path /tmp --source_model meta-llama/Llama-3.2-1B-Instruct --task text_generation --weight-format int8 --target_device CPU --rest_port 8000
 ```
\ No newline at end of file
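The last `docker run` in the diff starts the server with `--rest_port 8000`. As a smoke test of that deployment, a request can be built like the sketch below (assumptions not stated in the diff: the server is reachable on `localhost:8000`, the OpenAI-compatible `chat/completions` endpoint is served under `/v3`, and the served model name equals the `--source_model` value):

```python
import json
import urllib.request

# Hypothetical request against the server started above; the endpoint path
# and model name are assumptions about the deployment, not part of the diff.
payload = {
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 16,
}
request = urllib.request.Request(
    "http://localhost:8000/v3/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

The actual network call is left commented out so the snippet stays inert until the container is up; adjust the host, port, and model name to match your deployment.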