diff --git a/docs/advanced_topics.md b/docs/advanced_topics.md
index 664d924caa..513d4277ed 100644
--- a/docs/advanced_topics.md
+++ b/docs/advanced_topics.md
@@ -13,7 +13,7 @@ ovms_extras_nginx-mtls-auth-readme
 ```
 
 ## CPU Extensions
-Implement any CPU layer, that is not support by OpenVINO yet, as a shared library.
+Implement any CPU layer that is not yet supported by OpenVINO as a shared library.
 
 [Learn more](../src/example/SampleCpuExtension/README.md)
diff --git a/docs/clients_genai.md b/docs/clients_genai.md
index 2d0799cfed..eeaa2aff74 100644
--- a/docs/clients_genai.md
+++ b/docs/clients_genai.md
@@ -16,7 +16,7 @@ Speech to text API
 Text to speech API
 ```
 
 ## Introduction
-Beside Tensorflow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extendible using MediaPipe graphs.
+Besides TensorFlow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`/v3`). They are extensible using MediaPipe graphs.
 Currently supported endpoints are:
 OpenAI compatible endpoints:
diff --git a/docs/deploying_server_kubernetes.md b/docs/deploying_server_kubernetes.md
index 42417d2f16..264627bffc 100644
--- a/docs/deploying_server_kubernetes.md
+++ b/docs/deploying_server_kubernetes.md
@@ -61,7 +61,7 @@ Note that using s3 or minio bucket requires configuring credentials like describ
 
 ## Deprecation notice about OpenVINO operator
 
-The dedicated [operator for OpenVINO]((https://operatorhub.io/operator/ovms-operator)) is now deprecated. KServe operator can now support all OVMS use cases including generative models. It provides wider set of features and configuration options. Because KServe is commonly used for other serving runtimes, it gives easier transition and transparent migration.
+The dedicated [operator for OpenVINO](https://operatorhub.io/operator/ovms-operator) is now deprecated.
+The KServe operator now supports all OVMS use cases, including generative models. It provides a wider set of features and configuration options. Because KServe is commonly used for other serving runtimes, it enables an easier transition and transparent migration.
 
 ## Additional Resources
diff --git a/docs/legacy.md b/docs/legacy.md
index b6504c08bf..7ed1d9474b 100644
--- a/docs/legacy.md
+++ b/docs/legacy.md
@@ -10,12 +10,12 @@ ovms_docs_dag
 ```
 
 ## Stateful models
-Implement any CPU layer, that is not support by OpenVINO yet, as a shared library.
+Serve stateful models that preserve their state between subsequent inference requests.
 [Learn more](./stateful_models.md)
 **Note:** The use cases from this feature can be addressed in MediaPipe graphs, including generative use cases.
 
 ## DAG pipelines
 The Directed Acyclic Graph (DAG) Scheduler for creating a pipeline of models for execution in a single client request.
-[Learn model](./dag_scheduler.md)
+[Learn more](./dag_scheduler.md)
 **Note:** MediaPipe graphs can be a more flexible pipeline scheduler which can employ various data formats and accelerators.
diff --git a/docs/llm/reference.md b/docs/llm/reference.md
index e66e41a1a8..654c9b6d90 100644
--- a/docs/llm/reference.md
+++ b/docs/llm/reference.md
@@ -2,7 +2,7 @@
 
 ## Overview
 
-With rapid development of generative AI, new techniques and algorithms for performance optimization and better resource utilization are introduced to make best use of the hardware and provide best generation performance. OpenVINO implements those state of the art methods in it's [GenAI Library](https://github.com/openvinotoolkit/openvino.genai) like:
+With the rapid development of generative AI, new techniques and algorithms for performance optimization and better resource utilization are introduced to make the best use of the hardware and provide the best generation performance.
+OpenVINO implements those state-of-the-art methods in its [GenAI Library](https://github.com/openvinotoolkit/openvino.genai) like:
 - Continuous Batching
 - Paged Attention
 - Dynamic Split Fuse
@@ -22,7 +22,7 @@ The servable types are:
 - Visual Language Model Stateful.
 First part - Language Model / Visual Language Model - determines whether servable accepts only text or both text and images on the input.
-Seconds part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default CPU and GPU devices work on Continuous Batching pipelines. NPU device works only on Stateful servable type.
+Second part - Continuous Batching / Stateful - determines what kind of GenAI pipeline is used as the engine. By default, CPU and GPU devices work on Continuous Batching pipelines. The NPU device works only with the Stateful servable type.
 User does not have to explicitly select servable type. It is inferred based on model directory contents and selected target device. Model directory contents determine if model can work only with text or visual input as well. As for target device, setting it to `NPU` will always pick Stateful servable, while any other device will result in deploying Continuous Batching servable.
@@ -354,7 +354,7 @@ Check [tested models](https://github.com/openvinotoolkit/openvino.genai/blob/mas
 
 ### Completions
 
-When sending a request to `/completions` endpoint, model server adds `bos_token_id` during tokenization, so **there is not need to add `bos_token` to the prompt**.
+When sending a request to the `/completions` endpoint, the model server adds `bos_token_id` during tokenization, so **there is no need to add `bos_token` to the prompt**.
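As a quick illustration of the completions note above, here is a minimal client-side payload sketch. The server address, `/v3` prefix, and servable name are assumptions for illustration, not part of this change:

```python
import json

# Hypothetical deployment details; adjust to your setup.
base_url = "http://localhost:8000/v3"      # assumed OpenAI-compatible prefix
payload = {
    "model": "llm_model",                  # hypothetical servable name
    "prompt": "OpenVINO Model Server is",  # plain prompt: no bos_token needed,
                                           # the server adds bos_token_id itself
    "max_tokens": 32,
}

body = json.dumps(payload)
# The request could then be sent with any HTTP client, e.g.:
# requests.post(f"{base_url}/completions", data=body,
#               headers={"Content-Type": "application/json"})
assert "bos_token" not in json.loads(body)["prompt"]
```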
 
 ### Chat Completions
diff --git a/docs/mediapipe.md b/docs/mediapipe.md
index 84e070e3d9..73f0eb1f15 100644
--- a/docs/mediapipe.md
+++ b/docs/mediapipe.md
@@ -65,15 +65,15 @@ Following table lists supported tag and packet types in pbtxt graph definition:
 |pbtxt line|input/output|tag|packet type|stream name|
 |:---|:---|:---|:---|:---|
 |input_stream: "a"|input|none|ov::Tensor|a|
-|output_stream: "b"|input|none|ov::Tensor|b|
+|output_stream: "b"|output|none|ov::Tensor|b|
 |input_stream: "IMAGE:a"|input|IMAGE|mediapipe::ImageFrame|a|
 |output_stream: "IMAGE:b"|output|IMAGE|mediapipe::ImageFrame|b|
-|input_stream: "OVTENSOR:a"|output|OVTENSOR|ov::Tensor|a|
+|input_stream: "OVTENSOR:a"|input|OVTENSOR|ov::Tensor|a|
 |output_stream: "OVTENSOR:b"|output|OVTENSOR|ov::Tensor|b|
 |input_stream: "REQUEST:req"|input|REQUEST|KServe inference::ModelInferRequest|req|
 |output_stream: "RESPONSE:res"|output|RESPONSE|KServe inference::ModelInferResponse|res|
 
-In case of missing tag OpenVINO Model Server assumes that the packet type is `ov::Tensor'. The stream name can be arbitrary but the convention is to use a lower case word.
+In case of a missing tag, OpenVINO Model Server assumes that the packet type is `ov::Tensor`. The stream name can be arbitrary, but the convention is to use a lowercase word.
 
 The required data layout for the MediaPipe `IMAGE` conversion is HWC and the supported precisions are:
 |Datatype|Allowed number of channels|
@@ -110,7 +110,7 @@ client.async_stream_infer(
 ```
 
 ### List of default calculators
-Beside OpenVINO inference calculators, model server public docker image also includes all the calculators used in the enabled demos.
+Besides OpenVINO inference calculators, the model server public Docker image also includes all the calculators used in the enabled demos.
 The list of all included calculators, subgraphs, and input/output stream handlers is reported when the model server is started with the extra parameter `--log_level TRACE`.
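The HWC layout requirement in the mediapipe.md hunk above can be illustrated with a small index-arithmetic sketch. The helper function below is purely illustrative and not part of the model server API:

```python
# In HWC layout the channel index varies fastest, then width, then height.
def hwc_flat_index(y: int, x: int, c: int, width: int, channels: int) -> int:
    """Flat-buffer offset of element (y, x, c) in an HWC-ordered tensor."""
    return (y * width + x) * channels + c

# For a 2x2 RGB image (channels=3), the green channel of pixel (1, 0)
# lives at offset (1*2 + 0)*3 + 1 = 7.
assert hwc_flat_index(1, 0, 1, width=2, channels=3) == 7
```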
 
 ### CPU and GPU execution
diff --git a/docs/models_repository_graph.md b/docs/models_repository_graph.md
index 39bae5be37..a0701ece16 100644
--- a/docs/models_repository_graph.md
+++ b/docs/models_repository_graph.md
@@ -1,10 +1,10 @@
 # Graphs Repository {#ovms_docs_models_repository_graph}
 
 Model server can deploy pipelines of models and nodes for any complex and custom transformations.
-From the client perspective of behaves almost like a single model but it more flexible and configurable.
+From the client perspective it behaves almost like a single model, but it is more flexible and configurable.
 
 The model repository employing graphs is similar in the structure to [classic models](./models_repository_classic.md).
-It needs to include the collection of models used in the pipeline. It also require a MediaPipe graph definition file in .pbtxt format.
+It needs to include the collection of models used in the pipeline. It also requires a MediaPipe graph definition file in .pbtxt format.
 
 ```
 graph_models
 └── config.json
 ```
@@ -21,7 +21,7 @@
-In can the graph includes python nodes, there should be included also a python file with the node implementation.
+In case the graph includes Python nodes, a Python file with the node implementation should also be included.
 
 For more information on how to use MediaPipe graphs, refer to the [article](./mediapipe.md).
diff --git a/docs/performance_tuning.md b/docs/performance_tuning.md
index 7ae7f38b93..ddf0de20d4 100644
--- a/docs/performance_tuning.md
+++ b/docs/performance_tuning.md
@@ -146,7 +146,7 @@
 $ cpupower frequency-set --min 3.1GHz
 
 ## Network Configuration for Optimal Performance
 
-By default, OVMS endpoints are bound to all ipv4 addresses. On same systems, which route localhost name to ipv6 address, it might cause extra time on the client side to switch to ipv4. It can effectively results with extra 1-2s latency.
+By default, OVMS endpoints are bound to all IPv4 addresses.
+On some systems, which route the localhost name to an IPv6 address, it might cause extra time on the client side to switch to IPv4. It can effectively result in an extra 1-2s latency.
 It can be overcome by switching the API URL to `http://127.0.0.1` on the client side.
 
 To optimize network connection performance:
diff --git a/docs/security_considerations.md b/docs/security_considerations.md
index 43693b0a56..56f33c2a87 100644
--- a/docs/security_considerations.md
+++ b/docs/security_considerations.md
@@ -33,7 +33,7 @@ OVMS supports multimodal models with image inputs provided as URL. However, to p
 OpenVINO Model Server has a set of mechanisms preventing denial of service attacks from the client applications. They include the following:
 - setting the number of inference execution streams which can limit the number of parallel inference calls in progress for each model. It can be tuned with `NUM_STREAMS` or `PERFORMANCE_HINT` plugin config.
 - setting the maximum number of gRPC threads which is, by default, configured to the number 8 * number_of_cores. It can be changed with the parameter `--grpc_max_threads`.
-- setting the maximum number of REST workers which is, be default, configured to the number 4 * number_of_cores. It can be changed with the parameter `--rest_workers`.
+- setting the maximum number of REST workers which is, by default, configured to the number 4 * number_of_cores. It can be changed with the parameter `--rest_workers`.
 - maximum size of REST and GRPC message which is 1GB - bigger messages will be rejected
 - setting max_concurrent_streams which defines how many concurrent threads can be initiated from a single client - the remaining will be queued. The default is equal to the number of CPU cores. It can be changed with the `--grpc_channel_arguments grpc.max_concurrent_streams=8`.
 - setting the gRPC memory quota for the requests buffer - the default is 2GB. It can be changed with `--grpc_memory_quota=2147483648`. Value `0` invalidates the quota.
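The thread-pool defaults in the security_considerations.md hunk above (8 * cores for gRPC threads, 4 * cores for REST workers) can be sketched as simple arithmetic; the helper below is illustrative only, not part of the server:

```python
import os

def default_worker_counts(cores: int) -> tuple[int, int]:
    """Defaults described above: 8 * cores gRPC threads, 4 * cores REST workers."""
    return 8 * cores, 4 * cores

# For example, on an 8-core machine:
grpc_threads, rest_workers = default_worker_counts(8)
assert (grpc_threads, rest_workers) == (64, 32)

# On the current machine:
print(default_worker_counts(os.cpu_count() or 1))
```

Both values remain overridable with `--grpc_max_threads` and `--rest_workers` as noted above.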
diff --git a/docs/speech_recognition/reference.md b/docs/speech_recognition/reference.md
index 2bfe69476f..7e7465ebe1 100644
--- a/docs/speech_recognition/reference.md
+++ b/docs/speech_recognition/reference.md
@@ -59,7 +59,7 @@ The calculator supports the following `node_options` for tuning the pipeline con
 We recommend using [export script](../../demos/common/export_models/README.md) to prepare models directory structure for serving.
 Check [supported models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/#speech-recognition-models).
 
-### Text to speech calculator limitations
+### Speech to text calculator limitations
 - Streaming is not supported
 
 ## References
diff --git a/docs/starting_server.md b/docs/starting_server.md
index 0709fbb8c1..f9d35a96e6 100644
--- a/docs/starting_server.md
+++ b/docs/starting_server.md
@@ -1,6 +1,6 @@
 # Starting the Server {#ovms_docs_serving_model}
 
-There are two method for passing to the model server information about the models and their configuration:
+There are two methods for passing information about the models and their configuration to the model server:
 - via CLI parameters - for a single model or pipeline
 - via config file in json format - for any number of models and pipelines
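To complement the two startup methods above, here is a minimal sketch of a config file for the second method. The model name, path, and exact schema details are assumptions to illustrate the shape of such a file, not normative:

```python
import json

# Hypothetical minimal config.json for serving a single model.
config = {
    "model_config_list": [
        {"config": {"name": "resnet", "base_path": "/models/resnet"}}
    ]
}

# Serialized form that would be saved to disk and passed to the server.
print(json.dumps(config, indent=4))
```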