From b082f1687ba5fc752fbf7f8c0eef79fd5b859142 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 27 Feb 2026 13:48:43 +0000
Subject: [PATCH 1/3] Initial plan

From 332789e5cafa91e9b3bbfd655e62100067dcb09b Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 27 Feb 2026 13:56:15 +0000
Subject: [PATCH 2/3] Fix spelling, grammar, and content errors in docs/
 markdown files

- docs/llm/reference.md: it's -> its (possessive), 'Seconds part' -> 'Second part', 'there is not need' -> 'there is no need'
- docs/security_considerations.md: 'be default' -> 'by default'
- docs/starting_server.md: 'two method for' -> 'two methods for'
- docs/speech_recognition/reference.md: wrong section heading 'Text to speech calculator limitations' -> 'Speech to text calculator limitations'
- docs/models_repository_graph.md: 'of behaves' -> 'it behaves', 'also require' -> 'also requires', 'In can the graph' -> 'In case the graph'
- docs/legacy.md: '[Learn model]' -> '[Learn more]', 'not support' -> 'not supported'
- docs/advanced_topics.md: 'not support' -> 'not supported'
- docs/deploying_server_kubernetes.md: fix double-parentheses in markdown link, add missing articles
- docs/mediapipe.md: fix table wrong input/output column values, fix mismatched backtick quote, 'Beside' -> 'Besides'
- docs/performance_tuning.md: 'effectively results with' -> 'effectively result in'
- docs/clients_genai.md: 'Beside' -> 'Besides'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/advanced_topics.md              | 2 +-
 docs/clients_genai.md                | 2 +-
 docs/deploying_server_kubernetes.md  | 2 +-
 docs/legacy.md                       | 4 ++--
 docs/llm/reference.md                | 6 +++---
 docs/mediapipe.md                    | 8 ++++----
 docs/models_repository_graph.md      | 6 +++---
 docs/performance_tuning.md           | 2 +-
 docs/security_considerations.md      | 2 +-
 docs/speech_recognition/reference.md | 2 +-
 docs/starting_server.md              | 2 +-
 11 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/docs/advanced_topics.md b/docs/advanced_topics.md
index 664d924caa..513d4277ed 100644
--- a/docs/advanced_topics.md
+++ b/docs/advanced_topics.md
@@ -13,7 +13,7 @@ ovms_extras_nginx-mtls-auth-readme
 ```
 
 ## CPU Extensions
-Implement any CPU layer, that is not support by OpenVINO yet, as a shared library.
+Implement any CPU layer, that is not supported by OpenVINO yet, as a shared library.
 
 [Learn more](../src/example/SampleCpuExtension/README.md)
diff --git a/docs/clients_genai.md b/docs/clients_genai.md
index 2d0799cfed..db990c4a6e 100644
--- a/docs/clients_genai.md
+++ b/docs/clients_genai.md
@@ -16,7 +16,7 @@ Speech to text API
 Text to speech API
 ```
 ## Introduction
-Beside Tensorflow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extendible using MediaPipe graphs.
+Besides Tensorflow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extendible using MediaPipe graphs.
 
 Currently supported endpoints are:
 OpenAI compatible endpoints:
diff --git a/docs/deploying_server_kubernetes.md b/docs/deploying_server_kubernetes.md
index 42417d2f16..264627bffc 100644
--- a/docs/deploying_server_kubernetes.md
+++ b/docs/deploying_server_kubernetes.md
@@ -61,7 +61,7 @@ Note that using s3 or minio bucket requires configuring credentials like describ
 
 ## Deprecation notice about OpenVINO operator
 
-The dedicated [operator for OpenVINO]((https://operatorhub.io/operator/ovms-operator)) is now deprecated. KServe operator can now support all OVMS use cases including generative models. It provides wider set of features and configuration options. Because KServe is commonly used for other serving runtimes, it gives easier transition and transparent migration.
+The dedicated [operator for OpenVINO](https://operatorhub.io/operator/ovms-operator) is now deprecated. KServe operator can now support all OVMS use cases including generative models. It provides a wider set of features and configuration options. Because KServe is commonly used for other serving runtimes, it gives an easier transition and transparent migration.
 
 ## Additional Resources
diff --git a/docs/legacy.md b/docs/legacy.md
index b6504c08bf..7ed1d9474b 100644
--- a/docs/legacy.md
+++ b/docs/legacy.md
@@ -10,12 +10,12 @@ ovms_docs_dag
 ```
 
 ## Stateful models
-Implement any CPU layer, that is not support by OpenVINO yet, as a shared library.
+Implement any CPU layer, that is not supported by OpenVINO yet, as a shared library.
 [Learn more](./stateful_models.md)
 **Note:** The use cases from this feature can be addressed in MediaPipe graphs including generative use cases.
 
 ## DAG pipelines
 The Directed Acyclic Graph (DAG) Scheduler for creating pipeline of models for execution in a single client request.
-[Learn model](./dag_scheduler.md)
+[Learn more](./dag_scheduler.md)
 **Note:** MediaPipe graphs can be a more flexible of pipelines scheduler which can employ various data formats and accelerators.
diff --git a/docs/llm/reference.md b/docs/llm/reference.md
index e66e41a1a8..91c5375e1a 100644
--- a/docs/llm/reference.md
+++ b/docs/llm/reference.md
@@ -2,7 +2,7 @@
 
 ## Overview
 
-With rapid development of generative AI, new techniques and algorithms for performance optimization and better resource utilization are introduced to make best use of the hardware and provide best generation performance. OpenVINO implements those state of the art methods in it's [GenAI Library](https://github.com/openvinotoolkit/openvino.genai) like:
+With rapid development of generative AI, new techniques and algorithms for performance optimization and better resource utilization are introduced to make best use of the hardware and provide best generation performance. OpenVINO implements those state of the art methods in its [GenAI Library](https://github.com/openvinotoolkit/openvino.genai) like:
 - Continuous Batching
 - Paged Attention
 - Dynamic Split Fuse
@@ -22,7 +22,7 @@ The servable types are:
 - Visual Language Model Stateful.
 
 First part - Language Model / Visual Language Model - determines whether servable accepts only text or both text and images on the input.
-Seconds part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default CPU and GPU devices work on Continuous Batching pipelines. NPU device works only on Stateful servable type.
+Second part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default CPU and GPU devices work on Continuous Batching pipelines. NPU device works only on Stateful servable type.
 
 User does not have to explicitly select servable type. It is inferred based on model directory contents and selected target device. Model directory contents determine if model can work only with text or visual input as well.
 As for target device, setting it to `NPU` will always pick Stateful servable, while any other device will result in deploying Continuous Batching servable.
@@ -354,7 +354,7 @@ Check [tested models](https://github.com/openvinotoolkit/openvino.genai/blob/mas
 
 ### Completions
 
-When sending a request to `/completions` endpoint, model server adds `bos_token_id` during tokenization, so **there is not need to add `bos_token` to the prompt**.
+When sending a request to `/completions` endpoint, model server adds `bos_token_id` during tokenization, so **there is no need to add `bos_token` to the prompt**.
 
 ### Chat Completions
 
diff --git a/docs/mediapipe.md b/docs/mediapipe.md
index 84e070e3d9..bae2dee1d3 100644
--- a/docs/mediapipe.md
+++ b/docs/mediapipe.md
@@ -65,15 +65,15 @@ Following table lists supported tag and packet types in pbtxt graph definition:
 |pbtxt line|input/output|tag|packet type|stream name|
 |:---|:---|:---|:---|:---|
 |input_stream: "a"|input|none|ov::Tensor|a|
-|output_stream: "b"|input|none|ov::Tensor|b|
+|output_stream: "b"|output|none|ov::Tensor|b|
 |input_stream: "IMAGE:a"|input|IMAGE|mediapipe::ImageFrame|a|
 |output_stream: "IMAGE:b"|output|IMAGE|mediapipe::ImageFrame|b|
-|input_stream: "OVTENSOR:a"|output|OVTENSOR|ov::Tensor|a|
+|input_stream: "OVTENSOR:a"|input|OVTENSOR|ov::Tensor|a|
 |output_stream: "OVTENSOR:b"|output|OVTENSOR|ov::Tensor|b|
 |input_stream: "REQUEST:req"|input|REQUEST|KServe inference::ModelInferRequest|req|
 |output_stream: "RESPONSE:res"|output|RESPONSE|KServe inference::ModelInferResponse|res|
 
-In case of missing tag OpenVINO Model Server assumes that the packet type is `ov::Tensor'. The stream name can be arbitrary but the convention is to use a lower case word.
+In case of missing tag OpenVINO Model Server assumes that the packet type is `ov::Tensor`. The stream name can be arbitrary but the convention is to use a lower case word.
 
 The required data layout for the MediaPipe `IMAGE` conversion is HWC and the supported precisions are:
 |Datatype|Allowed number of channels|
@@ -110,7 +110,7 @@ client.async_stream_infer(
 ```
 
 ### List of default calculators
-Beside OpenVINO inference calculators, model server public docker image also includes all the calculators used in the enabled demos.
+Besides OpenVINO inference calculators, model server public docker image also includes all the calculators used in the enabled demos.
 The list of all included calculators, subgraphs, input/output stream handler is reported in the model server is started with extra parameter `--log_level TRACE`.
 
 ### CPU and GPU execution
 
diff --git a/docs/models_repository_graph.md b/docs/models_repository_graph.md
index 39bae5be37..04d8a629ce 100644
--- a/docs/models_repository_graph.md
+++ b/docs/models_repository_graph.md
@@ -1,10 +1,10 @@
 # Graphs Repository {#ovms_docs_models_repository_graph}
 
 Model server can deploy a pipelines of models and nodes for any complex and custom transformations.
-From the client perspective of behaves almost like a single model but it more flexible and configurable.
+From the client perspective it behaves almost like a single model but it more flexible and configurable.
 
 The model repository employing graphs is similar in the structure to [classic models](./models_repository_classic.md).
-It needs to include the collection of models used in the pipeline. It also require a MediaPipe graph definition file in .pbtxt format.
+It needs to include the collection of models used in the pipeline. It also requires a MediaPipe graph definition file in .pbtxt format.
 
 ```
 graph_models
@@ -21,7 +21,7 @@
 └── config.json
 ```
 
-In can the graph includes python nodes, there should be included also a python file with the node implementation.
+In case the graph includes python nodes, a python file with the node implementation should also be included.
 
 For more information on how to use MediaPipe graphs, refer to the [article](./mediapipe.md).
diff --git a/docs/performance_tuning.md b/docs/performance_tuning.md
index 7ae7f38b93..ddf0de20d4 100644
--- a/docs/performance_tuning.md
+++ b/docs/performance_tuning.md
@@ -146,7 +146,7 @@ $ cpupower frequency-set --min 3.1GHz
 
 ## Network Configuration for Optimal Performance
 
-By default, OVMS endpoints are bound to all ipv4 addresses. On same systems, which route localhost name to ipv6 address, it might cause extra time on the client side to switch to ipv4. It can effectively results with extra 1-2s latency.
+By default, OVMS endpoints are bound to all ipv4 addresses. On some systems, which route localhost name to ipv6 address, it might cause extra time on the client side to switch to ipv4. It can effectively result in extra 1-2s latency.
 It can be overcome by switching the API URL to `http://127.0.0.1` on the client side.
 
 To optimize network connection performance:
diff --git a/docs/security_considerations.md b/docs/security_considerations.md
index 43693b0a56..56f33c2a87 100644
--- a/docs/security_considerations.md
+++ b/docs/security_considerations.md
@@ -33,7 +33,7 @@ OVMS supports multimodal models with image inputs provided as URL. However, to p
 OpenVINO Model Server has a set of mechanisms preventing denial of service attacks from the client applications. They include the following:
 - setting the number of inference execution streams which can limit the number of parallel inference calls in progress for each model. It can be tuned with `NUM_STREAMS` or `PERFORMANCE_HINT` plugin config.
 - setting the maximum number of gRPC threads which is, by default, configured to the number 8 * number_of_cores. It can be changed with the parameter `--grpc_max_threads`.
-- setting the maximum number of REST workers which is, be default, configured to the number 4 * number_of_cores. It can be changed with the parameter `--rest_workers`.
+- setting the maximum number of REST workers which is, by default, configured to the number 4 * number_of_cores. It can be changed with the parameter `--rest_workers`.
 - maximum size of REST and GRPC message which is 1GB - bigger messages will be rejected
 - setting max_concurrent_streams which defines how many concurrent threads can be initiated from a single client - the remaining will be queued. The default is equal to the number of CPU cores. It can be changed with the `--grpc_channel_arguments grpc.max_concurrent_streams=8`.
 - setting the gRPC memory quota for the requests buffer - the default is 2GB. It can be changed with `--grpc_memory_quota=2147483648`. Value `0` invalidates the quota.
diff --git a/docs/speech_recognition/reference.md b/docs/speech_recognition/reference.md
index 2bfe69476f..7e7465ebe1 100644
--- a/docs/speech_recognition/reference.md
+++ b/docs/speech_recognition/reference.md
@@ -59,7 +59,7 @@ The calculator supports the following `node_options` for tuning the pipeline con
 We recommend using [export script](../../demos/common/export_models/README.md) to prepare models directory structure for serving.
 Check [supported models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/#speech-recognition-models).
 
-### Text to speech calculator limitations
+### Speech to text calculator limitations
 - Streaming is not supported
 
 ## References
diff --git a/docs/starting_server.md b/docs/starting_server.md
index 0709fbb8c1..f9d35a96e6 100644
--- a/docs/starting_server.md
+++ b/docs/starting_server.md
@@ -1,6 +1,6 @@
 # Starting the Server {#ovms_docs_serving_model}
 
-There are two method for passing to the model server information about the models and their configuration:
+There are two methods for passing information about the models and their configuration to the model server:
 - via CLI parameters - for a single model or pipeline
 - via config file in json format - for any number of models and pipelines

From 4928b39d0e67992daa97e0b53f3b0c94d633b9d4 Mon Sep 17 00:00:00 2001
From: "Trawinski, Dariusz"
Date: Tue, 3 Mar 2026 00:30:27 +0100
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 docs/clients_genai.md           | 2 +-
 docs/llm/reference.md           | 2 +-
 docs/mediapipe.md               | 2 +-
 docs/models_repository_graph.md | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/clients_genai.md b/docs/clients_genai.md
index db990c4a6e..eeaa2aff74 100644
--- a/docs/clients_genai.md
+++ b/docs/clients_genai.md
@@ -16,7 +16,7 @@ Speech to text API
 Text to speech API
 ```
 ## Introduction
-Besides Tensorflow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extendible using MediaPipe graphs.
+Besides TensorFlow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extensible using MediaPipe graphs.
 
 Currently supported endpoints are:
 OpenAI compatible endpoints:
diff --git a/docs/llm/reference.md b/docs/llm/reference.md
index 91c5375e1a..654c9b6d90 100644
--- a/docs/llm/reference.md
+++ b/docs/llm/reference.md
@@ -22,7 +22,7 @@ The servable types are:
 - Visual Language Model Stateful.
 
 First part - Language Model / Visual Language Model - determines whether servable accepts only text or both text and images on the input.
-Second part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default CPU and GPU devices work on Continuous Batching pipelines. NPU device works only on Stateful servable type.
+Second part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default CPU and GPU devices work on Continuous Batching pipelines. NPU device works only with the Stateful servable type.
 
 User does not have to explicitly select servable type. It is inferred based on model directory contents and selected target device. Model directory contents determine if model can work only with text or visual input as well.
 As for target device, setting it to `NPU` will always pick Stateful servable, while any other device will result in deploying Continuous Batching servable.
diff --git a/docs/mediapipe.md b/docs/mediapipe.md
index bae2dee1d3..73f0eb1f15 100644
--- a/docs/mediapipe.md
+++ b/docs/mediapipe.md
@@ -73,7 +73,7 @@ Following table lists supported tag and packet types in pbtxt graph definition:
 |input_stream: "REQUEST:req"|input|REQUEST|KServe inference::ModelInferRequest|req|
 |output_stream: "RESPONSE:res"|output|RESPONSE|KServe inference::ModelInferResponse|res|
 
-In case of missing tag OpenVINO Model Server assumes that the packet type is `ov::Tensor`. The stream name can be arbitrary but the convention is to use a lower case word.
+In case of missing tag OpenVINO Model Server assumes that the packet type is `ov::Tensor`. The stream name can be arbitrary but the convention is to use a lowercase word.
 
 The required data layout for the MediaPipe `IMAGE` conversion is HWC and the supported precisions are:
 |Datatype|Allowed number of channels|
diff --git a/docs/models_repository_graph.md b/docs/models_repository_graph.md
index 04d8a629ce..a0701ece16 100644
--- a/docs/models_repository_graph.md
+++ b/docs/models_repository_graph.md
@@ -1,7 +1,7 @@
 # Graphs Repository {#ovms_docs_models_repository_graph}
 
 Model server can deploy a pipelines of models and nodes for any complex and custom transformations.
-From the client perspective it behaves almost like a single model but it more flexible and configurable.
+From the client perspective it behaves almost like a single model, but it is more flexible and configurable.
 
 The model repository employing graphs is similar in the structure to [classic models](./models_repository_classic.md).
 It needs to include the collection of models used in the pipeline. It also requires a MediaPipe graph definition file in .pbtxt format.