Merged
2 changes: 1 addition & 1 deletion docs/advanced_topics.md
@@ -13,7 +13,7 @@ ovms_extras_nginx-mtls-auth-readme
```

## CPU Extensions
- Implement any CPU layer, that is not support by OpenVINO yet, as a shared library.
+ Implement any CPU layer, that is not supported by OpenVINO yet, as a shared library.

[Learn more](../src/example/SampleCpuExtension/README.md)

2 changes: 1 addition & 1 deletion docs/clients_genai.md
@@ -16,7 +16,7 @@ Speech to text API <ovms_docs_rest_api_s2t>
Text to speech API <ovms_docs_rest_api_t2s>
```
## Introduction
- Beside Tensorflow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extendible using MediaPipe graphs.
+ Besides TensorFlow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extendible using MediaPipe graphs.
Currently supported endpoints are:

OpenAI compatible endpoints:
2 changes: 1 addition & 1 deletion docs/deploying_server_kubernetes.md
@@ -61,7 +61,7 @@ Note that using s3 or minio bucket requires configuring credentials like describ

## Deprecation notice about OpenVINO operator

- The dedicated [operator for OpenVINO]((https://operatorhub.io/operator/ovms-operator)) is now deprecated. KServe operator can now support all OVMS use cases including generative models. It provides wider set of features and configuration options. Because KServe is commonly used for other serving runtimes, it gives easier transition and transparent migration.
+ The dedicated [operator for OpenVINO](https://operatorhub.io/operator/ovms-operator) is now deprecated. KServe operator can now support all OVMS use cases including generative models. It provides wider set of features and configuration options. Because KServe is commonly used for other serving runtimes, it gives easier transition and transparent migration.

## Additional Resources

4 changes: 2 additions & 2 deletions docs/legacy.md
@@ -10,12 +10,12 @@ ovms_docs_dag
```

## Stateful models
- Implement any CPU layer, that is not support by OpenVINO yet, as a shared library.
+ Implement any CPU layer, that is not supported by OpenVINO yet, as a shared library.
[Learn more](./stateful_models.md)
**Note:** The use cases from this feature can be addressed in MediaPipe graphs including generative use cases.

## DAG pipelines
The Directed Acyclic Graph (DAG) Scheduler for creating pipeline of models for execution in a single client request.
- [Learn model](./dag_scheduler.md)
+ [Learn more](./dag_scheduler.md)
**Note:** MediaPipe graphs can be a more flexible of pipelines scheduler which can employ various data formats and accelerators.

6 changes: 3 additions & 3 deletions docs/llm/reference.md
@@ -2,7 +2,7 @@

## Overview

- With rapid development of generative AI, new techniques and algorithms for performance optimization and better resource utilization are introduced to make best use of the hardware and provide best generation performance. OpenVINO implements those state of the art methods in it's [GenAI Library](https://github.com/openvinotoolkit/openvino.genai) like:
+ With rapid development of generative AI, new techniques and algorithms for performance optimization and better resource utilization are introduced to make best use of the hardware and provide best generation performance. OpenVINO implements those state of the art methods in its [GenAI Library](https://github.com/openvinotoolkit/openvino.genai) like:
- Continuous Batching
- Paged Attention
- Dynamic Split Fuse
@@ -22,7 +22,7 @@ The servable types are:
- Visual Language Model Stateful.

First part - Language Model / Visual Language Model - determines whether servable accepts only text or both text and images on the input.
- Seconds part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default CPU and GPU devices work on Continuous Batching pipelines. NPU device works only on Stateful servable type.
+ Second part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default CPU and GPU devices work on Continuous Batching pipelines. NPU device works only with the Stateful servable type.

User does not have to explicitly select servable type. It is inferred based on model directory contents and selected target device.
Model directory contents determine if model can work only with text or visual input as well. As for target device, setting it to `NPU` will always pick Stateful servable, while any other device will result in deploying Continuous Batching servable.
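
The selection rule described above (NPU always picks the Stateful servable, any other device picks Continuous Batching) can be sketched as a small illustrative helper. The function and label names below are hypothetical, not actual OVMS code:

```python
def pick_servable(target_device: str, accepts_images: bool) -> str:
    """Illustrative sketch of servable-type inference (not actual OVMS code)."""
    # NPU works only with the Stateful pipeline; other devices use Continuous Batching.
    engine = "Stateful" if target_device.upper() == "NPU" else "Continuous Batching"
    # Model directory contents determine text-only vs. text-and-image input.
    family = "Visual Language Model" if accepts_images else "Language Model"
    return f"{family} {engine}"
```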
@@ -354,7 +354,7 @@ Check [tested models](https://github.com/openvinotoolkit/openvino.genai/blob/mas

### Completions

- When sending a request to `/completions` endpoint, model server adds `bos_token_id` during tokenization, so **there is not need to add `bos_token` to the prompt**.
+ When sending a request to `/completions` endpoint, model server adds `bos_token_id` during tokenization, so **there is no need to add `bos_token` to the prompt**.
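
A minimal request-payload sketch for the completions endpoint; the model name, port, and endpoint path below are illustrative assumptions, and the prompt is plain text with no `bos_token` prepended:

```python
import json

# Hypothetical model name; adjust to your deployment.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "OpenVINO Model Server is",  # plain text, no leading bos_token
    "max_tokens": 32,
}
body = json.dumps(payload)
# Example call (port assumed):
# requests.post("http://localhost:8000/v3/completions", data=body,
#               headers={"Content-Type": "application/json"})
```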

### Chat Completions

8 changes: 4 additions & 4 deletions docs/mediapipe.md
@@ -65,15 +65,15 @@ Following table lists supported tag and packet types in pbtxt graph definition:
|pbtxt line|input/output|tag|packet type|stream name|
|:---|:---|:---|:---|:---|
|input_stream: "a"|input|none|ov::Tensor|a|
- |output_stream: "b"|input|none|ov::Tensor|b|
+ |output_stream: "b"|output|none|ov::Tensor|b|
|input_stream: "IMAGE:a"|input|IMAGE|mediapipe::ImageFrame|a|
|output_stream: "IMAGE:b"|output|IMAGE|mediapipe::ImageFrame|b|
- |input_stream: "OVTENSOR:a"|output|OVTENSOR|ov::Tensor|a|
+ |input_stream: "OVTENSOR:a"|input|OVTENSOR|ov::Tensor|a|
|output_stream: "OVTENSOR:b"|output|OVTENSOR|ov::Tensor|b|
|input_stream: "REQUEST:req"|input|REQUEST|KServe inference::ModelInferRequest|req|
|output_stream: "RESPONSE:res"|output|RESPONSE|KServe inference::ModelInferResponse|res|

- In case of missing tag OpenVINO Model Server assumes that the packet type is `ov::Tensor'. The stream name can be arbitrary but the convention is to use a lower case word.
+ In case of missing tag OpenVINO Model Server assumes that the packet type is `ov::Tensor`. The stream name can be arbitrary but the convention is to use a lowercase word.
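
A minimal graph-definition fragment illustrating the tag conventions above; the stream and calculator names are hypothetical placeholders:

```
input_stream: "OVTENSOR:in"
output_stream: "OVTENSOR:out"
node {
  calculator: "SomeInferenceCalculator"
  input_stream: "OVTENSOR:in"
  output_stream: "OVTENSOR:out"
}
```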

The required data layout for the MediaPipe `IMAGE` conversion is HWC and the supported precisions are:
|Datatype|Allowed number of channels|
@@ -110,7 +110,7 @@ client.async_stream_infer(
```

### List of default calculators
- Beside OpenVINO inference calculators, model server public docker image also includes all the calculators used in the enabled demos.
+ Besides OpenVINO inference calculators, model server public docker image also includes all the calculators used in the enabled demos.
The list of all included calculators, subgraphs, input/output stream handler is reported in the model server is started with extra parameter `--log_level TRACE`.

### CPU and GPU execution
6 changes: 3 additions & 3 deletions docs/models_repository_graph.md
@@ -1,10 +1,10 @@
# Graphs Repository {#ovms_docs_models_repository_graph}

Model server can deploy a pipelines of models and nodes for any complex and custom transformations.
- From the client perspective of behaves almost like a single model but it more flexible and configurable.
+ From the client perspective it behaves almost like a single model, but it is more flexible and configurable.

The model repository employing graphs is similar in the structure to [classic models](./models_repository_classic.md).
- It needs to include the collection of models used in the pipeline. It also require a MediaPipe graph definition file in .pbtxt format.
+ It needs to include the collection of models used in the pipeline. It also requires a MediaPipe graph definition file in .pbtxt format.

```
graph_models
...
└── config.json
```

- In can the graph includes python nodes, there should be included also a python file with the node implementation.
+ In case the graph includes python nodes, there should be included also a python file with the node implementation.
Copilot AI commented on Feb 27, 2026:

> This line uses inconsistent capitalization and awkward phrasing (β€œpython nodes”, β€œincluded also a python file”). Consider using β€œPython nodes” / β€œPython file” and rephrasing to a more direct construction (e.g., β€œthe repository should also include a Python file implementing the node”).

For more information on how to use MediaPipe graphs, refer to the [article](./mediapipe.md).
2 changes: 1 addition & 1 deletion docs/performance_tuning.md
@@ -146,7 +146,7 @@ $ cpupower frequency-set --min 3.1GHz

## Network Configuration for Optimal Performance

- By default, OVMS endpoints are bound to all ipv4 addresses. On same systems, which route localhost name to ipv6 address, it might cause extra time on the client side to switch to ipv4. It can effectively results with extra 1-2s latency.
+ By default, OVMS endpoints are bound to all ipv4 addresses. On same systems, which route localhost name to ipv6 address, it might cause extra time on the client side to switch to ipv4. It can effectively result in extra 1-2s latency.
Copilot AI commented on Feb 27, 2026:

> This sentence has a couple of remaining issues: β€œipv4/ipv6” should be capitalized as β€œIPv4/IPv6”, and β€œOn same systems” should likely be β€œOn some systems” (current wording reads incorrect).
It can be overcome by switching the API URL to `http://127.0.0.1` on the client side.
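
The name-resolution behavior behind this advice can be demonstrated with the standard library; this is a sketch using an arbitrary port number, not OVMS-specific code:

```python
import socket

# "localhost" may resolve to both IPv6 (::1) and IPv4 (127.0.0.1); a client
# that tries IPv6 first and falls back adds latency when the server listens
# on IPv4 only.
localhost_families = {info[0] for info in socket.getaddrinfo("localhost", 9000)}

# A literal IPv4 address resolves to AF_INET only, so no fallback can occur.
ipv4_families = {info[0] for info in socket.getaddrinfo("127.0.0.1", 9000)}
```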

To optimize network connection performance:
2 changes: 1 addition & 1 deletion docs/security_considerations.md
@@ -33,7 +33,7 @@ OVMS supports multimodal models with image inputs provided as URL. However, to p
OpenVINO Model Server has a set of mechanisms preventing denial of service attacks from the client applications. They include the following:
- setting the number of inference execution streams which can limit the number of parallel inference calls in progress for each model. It can be tuned with `NUM_STREAMS` or `PERFORMANCE_HINT` plugin config.
- setting the maximum number of gRPC threads which is, by default, configured to the number 8 * number_of_cores. It can be changed with the parameter `--grpc_max_threads`.
- - setting the maximum number of REST workers which is, be default, configured to the number 4 * number_of_cores. It can be changed with the parameter `--rest_workers`.
+ - setting the maximum number of REST workers which is, by default, configured to the number 4 * number_of_cores. It can be changed with the parameter `--rest_workers`.
- maximum size of REST and GRPC message which is 1GB - bigger messages will be rejected
- setting max_concurrent_streams which defines how many concurrent threads can be initiated from a single client - the remaining will be queued. The default is equal to the number of CPU cores. It can be changed with the `--grpc_channel_arguments grpc.max_concurrent_streams=8`.
- setting the gRPC memory quota for the requests buffer - the default is 2GB. It can be changed with `--grpc_memory_quota=2147483648`. Value `0` invalidates the quota.
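
The default values listed above can be computed for a given machine as follows; this is an illustrative sketch of the documented formulas, not code that queries OVMS itself:

```python
import os

cores = os.cpu_count() or 1

# Defaults described in the list above:
grpc_max_threads = 8 * cores       # overridable with --grpc_max_threads
rest_workers = 4 * cores           # overridable with --rest_workers
max_concurrent_streams = cores     # grpc.max_concurrent_streams default
grpc_memory_quota = 2 * 1024**3    # 2GB request-buffer quota (--grpc_memory_quota)
```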
2 changes: 1 addition & 1 deletion docs/speech_recognition/reference.md
@@ -59,7 +59,7 @@ The calculator supports the following `node_options` for tuning the pipeline con
We recommend using [export script](../../demos/common/export_models/README.md) to prepare models directory structure for serving.
Check [supported models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/#speech-recognition-models).

- ### Text to speech calculator limitations
+ ### Speech to text calculator limitations
Collaborator commented:

> @michalkulakowski is it correct?

- Streaming is not supported

## References
2 changes: 1 addition & 1 deletion docs/starting_server.md
@@ -1,6 +1,6 @@
# Starting the Server {#ovms_docs_serving_model}

- There are two method for passing to the model server information about the models and their configuration:
+ There are two methods for passing to the model server information about the models and their configuration:
- via CLI parameters - for a single model or pipeline
- via config file in json format - for any number of models and pipelines
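
For the config-file method, a minimal `config.json` sketch; the model name and base path are placeholders to adapt to your repository layout:

```
{
  "model_config_list": [
    {
      "config": {
        "name": "resnet",
        "base_path": "/models/resnet"
      }
    }
  ]
}
```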
