10 changes: 5 additions & 5 deletions docs/parameters.md
@@ -98,12 +98,12 @@ Shared configuration options for the pull, and pull & start mode. In the presenc

## Pull Mode Options for optimum-cli mode

When pulling models outside of OpenVINO organization the optimum-cli api is used inside ovms. You can set two additional parameters for this mode.
When pulling models outside of OpenVINO organization the optimum-cli api is used inside ovms. You can set additional parameters for this mode.
| Option | Value format | Description |
|------------------------------|--------------|---------------------------------------------------------------------------------------------------------------|
| `--extra_quantization_params`| ` ` | Add advanced quantization parameters. Check [optimum-intel](https://github.com/huggingface/optimum-intel) documentation. Example: `--sym --group-size -1 --ratio 1.0 --awq --scale-estimation --dataset wikitext2` |
| `--weight-format` | `string` | Model precision used in optimum-cli export with conversion. Default `int8`. |
| `--extra_quantization_params`| `string` | Add advanced quantization parameters. Check [optimum-intel](https://github.com/huggingface/optimum-intel) documentation. Example: `--sym --group-size -1 --ratio 1.0 --awq --scale-estimation --dataset wikitext2` |
| `--weight-format` | `string` | Model precision used in optimum-cli export with conversion. Default `int8`. |
| `--vocoder` | `string` | The vocoder model to use for text2speech. For example `microsoft/speecht5_hifigan`. |

There are also additional environment variables that may change the behavior of pulling:

@@ -161,7 +161,7 @@ Task specific parameters for different tasks (text generation/image generation/e
| `--num_streams` | `integer` | The number of parallel execution streams to use for the model. Use at least 2 on 2 socket CPU systems. Default: 1. |
| `--normalize` | `bool` | Normalize the embeddings. Default: true. |
| `--truncate` | `bool` | Truncate input when it exceeds model context length. Default: false |
| `--mean_pooling` | `bool` | Mean pooling option. Default: false. |
| `--pooling` | `string` | Pooling option. One of: CLS, LAST, MEAN. Default: CLS. |
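
To illustrate what the three `--pooling` values conventionally mean, here is a minimal sketch that reduces per-token embeddings to one vector. `poolEmbeddings` is a hypothetical helper, not an OVMS function, and it assumes CLS takes the first token, LAST takes the last token, and MEAN averages all tokens without an attention mask; the server's actual implementation may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of OVMS): pools a [tokens x hidden] matrix
// into a single embedding. Assumes every token vector has the same length.
std::vector<float> poolEmbeddings(const std::vector<std::vector<float>>& tokens,
                                  const std::string& mode) {
    if (tokens.empty()) {
        throw std::invalid_argument("No token embeddings to pool");
    }
    if (mode == "CLS") {
        return tokens.front();  // embedding of the first token
    }
    if (mode == "LAST") {
        return tokens.back();  // embedding of the last token
    }
    if (mode == "MEAN") {
        // Unmasked average over all tokens (an assumption of this sketch).
        std::vector<float> mean(tokens.front().size(), 0.0f);
        for (const auto& token : tokens) {
            for (std::size_t i = 0; i < mean.size(); ++i) {
                mean[i] += token[i];
            }
        }
        for (float& value : mean) {
            value /= static_cast<float>(tokens.size());
        }
        return mean;
    }
    throw std::invalid_argument("Unsupported pooling mode: " + mode);
}
```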

### Rerank
| option | Value format | Description |
4 changes: 0 additions & 4 deletions src/cli_parser.cpp
@@ -279,10 +279,6 @@ std::variant<bool, std::pair<int, std::string>> CLIParser::parse(int argc, char*
"Resets model precision.",
cxxopts::value<std::string>(),
"PRECISION")
("resize",
"Resets model resize dimensions.",
cxxopts::value<std::string>(),
"resize")
("model_version_policy",
"Model version policy",
cxxopts::value<std::string>(),
4 changes: 2 additions & 2 deletions src/graph_export/embeddings_graph_cli_parser.cpp
@@ -53,7 +53,7 @@ void EmbeddingsGraphCLIParser::createOptions() {
cxxopts::value<std::string>()->default_value("false"),
"truncate")
("pooling",
"Mean pooling option.",
"Pooling option. One of: CLS, LAST, MEAN.",
cxxopts::value<std::string>()->default_value("CLS"),
"POOLING");
}
@@ -98,7 +98,7 @@ void EmbeddingsGraphCLIParser::prepare(OvmsServerMode serverMode, HFSettingsImpl
embeddingsGraphSettings.truncate = result->operator[]("truncate").as<std::string>();
embeddingsGraphSettings.pooling = result->operator[]("pooling").as<std::string>();
}
if (!(embeddingsGraphSettings.pooling == "CLS" || embeddingsGraphSettings.pooling == "LAST")){
if (!(embeddingsGraphSettings.pooling == "CLS" || embeddingsGraphSettings.pooling == "LAST" || embeddingsGraphSettings.pooling == "MEAN")){
throw std::invalid_argument("Only CLS and LAST pooling modes are supported");
}
hfSettings.graphSettings = std::move(embeddingsGraphSettings);
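
In the last hunk, the updated condition accepts `MEAN`, while the exception message on the following line still names only CLS and LAST. One way to keep the two from drifting apart is to build both the check and the message from a single list. The sketch below is only an illustration: `kSupportedPoolingModes`, `isSupportedPooling`, and `validatePooling` are hypothetical names that do not appear in the PR, and only the three accepted values are taken from the diff.

```cpp
#include <algorithm>
#include <array>
#include <stdexcept>
#include <string>

// Hypothetical names; only the accepted values (CLS, LAST, MEAN) come from the diff.
const std::array<const char*, 3> kSupportedPoolingModes = {"CLS", "LAST", "MEAN"};

// Returns true when `mode` exactly matches one of the supported values (case-sensitive).
bool isSupportedPooling(const std::string& mode) {
    return std::any_of(kSupportedPoolingModes.begin(), kSupportedPoolingModes.end(),
                       [&mode](const char* supported) { return mode == supported; });
}

// Throws std::invalid_argument whose message lists every supported mode,
// so adding a mode to the array updates the check and the message together.
void validatePooling(const std::string& mode) {
    if (isSupportedPooling(mode)) {
        return;
    }
    std::string message = "Unsupported pooling mode '" + mode + "'. Supported modes:";
    for (const char* supported : kSupportedPoolingModes) {
        message += ' ';
        message += supported;
    }
    throw std::invalid_argument(message);
}
```

With this shape, a call such as `validatePooling(embeddingsGraphSettings.pooling)` could stand in for the inline condition, assuming the settings field stays a `std::string` as in the diff.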