chore(deps): update dependency sentence-transformers to v5 - autoclosed by renovate[bot] · Pull Request #143 · ygalblum/knowledge-base-gpt

renovate · 2025-07-01T20:31:17Z


This PR contains the following updates:

Package	Change	Age	Confidence
sentence-transformers	`==4.1.0` -> `==5.1.1`

Release Notes

UKPLab/sentence-transformers (sentence-transformers)

`v5.1.1`: - Explicit incorrect arguments, fixes for multi-GPU, evaluator, and hard negative


This patch makes Sentence Transformers more explicit with incorrect arguments and introduces some fixes for multi-GPU processing, evaluators, and hard negatives mining.

Install this version with

### Training + Inference
pip install sentence-transformers[train]==5.1.1

### Inference only, use one of:
pip install sentence-transformers==5.1.1
pip install sentence-transformers[onnx-gpu]==5.1.1
pip install sentence-transformers[onnx]==5.1.1
pip install sentence-transformers[openvino]==5.1.1

Error if unused kwargs is passed & `get_model_kwargs` (#3500)

Some SentenceTransformer or SparseEncoder models support custom model-specific keyword arguments, such as jinaai/jina-embeddings-v4. As of this release, calling model.encode with keyword arguments that aren't used by the model will result in an error.

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-MiniLM-L6-v2")
>>> model.encode("Who is Amelia Earhart?", normalize=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[sic]/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "[sic]/SentenceTransformer.py", line 983, in encode
    raise ValueError(
ValueError: SentenceTransformer.encode() has been called with additional keyword arguments that this model does not use: ['normalize']. As per SentenceTransformer.get_model_kwargs(), this model does not accept any additional keyword arguments.

Quite useful when you, for example, accidentally forget that the parameter to get normalized embeddings is normalize_embeddings. Prior to this version, this parameter would simply quietly be ignored.

To check which custom extra keyword arguments may be used for your model, you can call the new get_model_kwargs method:

>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']

Note: You can always pass the task parameter; it is the only model-specific parameter that will be quietly ignored. This means that you can always use model.encode(..., task="query") and model.encode(..., task="document").
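The check described above can be pictured with a small stand-alone sketch. It is not the library's implementation: `validate_encode_kwargs` and its arguments are hypothetical names. The only behavior taken from the release notes is that unused keyword arguments raise a ValueError while task is always tolerated.

```python
def validate_encode_kwargs(passed_kwargs, model_kwargs):
    """Raise ValueError for keyword arguments the model does not use.

    Hypothetical helper for illustration only; `model_kwargs` plays the role
    of the list returned by get_model_kwargs().
    """
    # "task" is always tolerated, even if the model ignores it.
    unused = [
        name
        for name in passed_kwargs
        if name not in model_kwargs and name != "task"
    ]
    if unused:
        raise ValueError(
            f"encode() got keyword arguments that this model does not use: {unused}"
        )


# A model without custom keyword arguments, like all-MiniLM-L6-v2 above.
validate_encode_kwargs({"task": "query"}, model_kwargs=[])  # passes silently

try:
    validate_encode_kwargs({"normalize": True}, model_kwargs=[])
except ValueError as error:
    print(error)
```

A misspelled parameter such as normalize now fails loudly, while task="query" stays safe to pass to any model.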

Minor Features

Add FLOPS calculations to SparseEncoder evaluators (#3456)
Add Support for Knowledgeable Passage Retriever (KPR) models (#3495)

Minor Fixes

Fix batch_size being ignored in CrossEncoderRerankingEvaluator (#3497)
Fix multi-GPU processing with encode, embeddings are now moved from the various devices to the CPU before being stacked into one tensor (#3488)
Use encode_query and encode_document in mine_hard_negatives, automatically using defined "query" and "document" prompts (#3502)
Fix "Path does not exist" errors when calling an Evaluator with an output_path that doesn't exist yet (#3516)
Fix the reported number of missing negatives in mine_hard_negatives (#3504)

All Changes

Docs Patch for AnglE and CoSENT Losses by @johneckberg in #3496
[fix] add batch size parameter to model prediction in CrossEncoderRerankingEvaluator by @emapco in #3497
Add FLOPS calculation and update metrics in SparseEvaluators by @arthurbr11 in #3456
[fix] Ensure multi-process embeddings are moved to CPU for concatenation by @tomaarsen in #3488
[model_card] Don't override manually provided languages in model card by @tomaarsen in #3501
[tests] Add hard negatives test showing multiple positives are correctly handled by @tomaarsen in #3503
[feat] Use encode_document and encode_query in mine_hard_negatives by @tomaarsen in #3502
Add Support for Knowledgeable Passage Retriever (KPR) by @ikuyamada in #3495
Update rasyosef/splade-mini MSMARCO and BEIR-13 benchmark scores in pretrained_models.md by @rasyosef in #3508
always pass input_ids, attention_mask, token_type_ids, inputs_embeds to forward by @Samoed in #3509
[feat] add get_model_kwargs method; throw error if unused kwarg is passed by @tomaarsen in #3500
Fix:Import SentenceTransformer class explicitly in losses module by @altescy in #3521
fix: add makedirs to informationretrievalevaluator by @stephantul in #3516
[fix] Fix the number of missing negatives in mine_hard_negatives by @tomaarsen in #3504

New Contributors

@ikuyamada made their first contribution in #3495
@rasyosef made their first contribution in #3508

Full Changelog: huggingface/sentence-transformers@v5.1.0...v5.1.1

`v5.1.0`: - ONNX and OpenVINO backends offering 2-3x speedups; more hard negatives mining formats


This release introduces 2 new efficient computing backends for SparseEncoder embedding models: ONNX and OpenVINO, plus optimization & quantization, allowing for speedups of up to 2x-3x; a new "n-tuple-scores" output format for hard negative mining for distillation; gathering across devices for free lunch on multi-GPU training; Trackio support; MTEB documentation; and many small fixes and features.

Install this version with

### Training + Inference
pip install sentence-transformers[train]==5.1.0

### Inference only, use one of:
pip install sentence-transformers==5.1.0
pip install sentence-transformers[onnx-gpu]==5.1.0
pip install sentence-transformers[onnx]==5.1.0
pip install sentence-transformers[openvino]==5.1.0

Faster ONNX and OpenVINO backends for SparseEncoder models (#3475)

Introducing a new backend keyword argument to the SparseEncoder initialization, allowing values of "torch" (default), "onnx", and "openvino".
These require installing sentence-transformers with specific extras:

pip install sentence-transformers[onnx-gpu]

### or ONNX for CPU only:
pip install sentence-transformers[onnx]

### or
pip install sentence-transformers[openvino]

It's as simple as:

from sentence_transformers import SparseEncoder

### Load a SparseEncoder model with the ONNX backend
model = SparseEncoder("naver/splade-v3", backend="onnx")

query = "Which planet is known as the Red Planet?"
documents = [
   "Venus is often called Earth's twin because of its similar size and proximity.",
   "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
   "Jupiter, the largest planet in our solar system, has a prominent red spot.",
   "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

query_embeddings = model.encode_query(query)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)

### torch.Size([30522]) torch.Size([4, 30522])

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)

### tensor([[12.1450, 26.1040, 22.0025, 23.3877]])

decoded_query = model.decode(query_embeddings, top_k=5)
decoded_documents = model.decode(document_embeddings, top_k=5)
print(decoded_query)

### [('red', 3.0222), ('planet', 2.5001), ('planets', 1.9412), ('known', 1.8126), ('nasa', 0.9347)]

print(decoded_documents)

### [
###     [('venus', 3.1980), ('twin', 2.7036), ('earth', 2.4310), ('twins', 2.0957), ('planet', 1.9462)],
###     [('mars', 3.1443), ('planet', 2.4924), ('red', 2.4514), ('reddish', 2.2234), ('planets', 2.1976)],
###     [('jupiter', 2.9604), ('red', 2.5507), ('planet', 2.3774), ('planets', 2.1641), ('spot', 2.1138)],
###     [('saturn', 2.9354), ('red', 2.4548), ('planet', 2.3962), ('mistaken', 2.3361), ('cass', 2.2100)]
### ]

If you specify a backend and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to model.push_to_hub or model.save_pretrained into the same model repository or directory to avoid having to re-export the model every time.

All keyword arguments passed via model_kwargs will be passed on to ORTModelForMaskedLM.from_pretrained or OVModelForMaskedLM.from_pretrained. The most useful arguments are:

provider: (Only if backend="onnx") ONNX Runtime provider to use for loading the model, e.g. "CPUExecutionProvider". See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest provider (e.g. "CUDAExecutionProvider") will be used.
file_name: The name of the ONNX file to load. If not specified, will default to "model.onnx" or otherwise "onnx/model.onnx" for ONNX, and "openvino_model.xml" and "openvino/openvino_model.xml" for OpenVINO. This argument is useful for specifying optimized or quantized models.
export: A boolean flag specifying whether the model will be exported. If not provided, export will be set to True if the model repository or directory does not already contain an ONNX or OpenVINO model.

Benchmarks

We ran benchmarks for CPU and GPU, averaging findings across 3 datasets and numerous batch sizes. Here are the findings:

[Figure: CPU and GPU benchmark results]

These findings resulted in these recommendations:

[Figure: backend recommendations]

For GPU, you can expect 1.81x speedup with bf16 at no cost, and for CPU you can expect up to ~3x speedup at minimal cost of accuracy in our evaluation. Your mileage with the accuracy hit for quantization may vary, but it seems to remain very small.

Read the Speeding up Inference documentation for more details.

New `n-tuple-scores` output format from `mine_hard_negatives` (#3430, #3481)

The mine_hard_negatives utility function has been extended to support the n-tuple-scores output format, which outputs negatives into num_negatives + 3 columns:

'query', 'answer', 'negative_1', 'negative_2', ..., 'score'

where the 'score' is a list of scores for the query-answer plus each query-negative pair.

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

### Load a Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

### Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train")

### Mine hard negatives into num_negatives + 3 columns: 'query', 'answer', 'negative_1', 'negative_2', ..., 'score'
### where 'score' is a list of scores for the query-answer plus each query-negative pair.
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    num_negatives=5,
    sampling_strategy="top",
    batch_size=128,
    use_faiss=True,
    output_format="n-tuple-scores",
)
print(dataset)
print(dataset[14])
"""
{
    'query': 'when did jack and the beanstalk take place',
    'answer': "Jack and the Beanstalk According to researchers at the universities in Durham and Lisbon, the story originated more than 5,000 years ago, based on a widespread archaic story form which is now classified by folklorists as ATU 328 The Boy Who Stole Ogre's Treasure.[7]",
    'negative_1': 'Jack and the Beanstalk "Jack and the Beanstalk" is an English fairy tale. It appeared as "The Story of Jack Spriggins and the Enchanted Bean" in 1734[1] and as Benjamin Tabart\'s moralised "The History of Jack and the Bean-Stalk" in 1807.[2] Henry Cole, publishing under pen name Felix Summerly popularised the tale in The Home Treasury (1845),[3] and Joseph Jacobs rewrote it in English Fairy Tales (1890).[4] Jacobs\' version is most commonly reprinted today and it is believed to be closer to the oral versions than Tabart\'s because it lacks the moralising.[5]',
    'negative_2': 'Jack and the Beanstalk Jack climbs the beanstalk twice more. He learns of other treasures and steals them when the giant sleeps: first a goose that lays golden eggs, then a magic harp that plays by itself. The giant wakes when Jack leaves the house with the harp and chases Jack down the beanstalk. Jack calls to his mother for an axe and before the giant reaches the ground, cuts down the beanstalk, causing the giant to fall to his death.',
    'negative_3': 'Jack in the Box Jack in the Box is an American fast-food restaurant chain founded February 21, 1951, by Robert O. Peterson in San Diego, California, where it is headquartered. The chain has 2,200 locations, primarily serving the West Coast of the United States and selected large urban areas in the eastern portion of the US including Texas. Food items include a variety of hamburger and cheeseburger sandwiches along with selections of internationally themed foods such as tacos and egg rolls. The company also operates the Qdoba Mexican Grill chain.[4][5]',
    'negative_4': 'Jack in the Box Jack in the Box is an American fast-food restaurant chain founded February 21, 1951, by Robert O. Peterson in San Diego, California, where it is headquartered. The chain has 2,200 locations, primarily serving the West Coast of the United States and selected large urban areas in the eastern portion of the US including Texas and the Charlotte metropolitan area. The company also formerly operated the Qdoba Mexican Grill chain until Apollo Global Management bought the chain in December 2017.[4]',
    'negative_5': "Jack Box Jack Box (full name Jack I. Box; or simply known as Jack) is the mascot of American restaurant chain Jack in the Box. In the advertisements, he is the founder, CEO, and ad spokesman for the chain. According to the company's web site, he has the appearance of a typical male, with the exception of his huge spherical white head, blue dot eyes, conical black pointed nose, and a curvilinear red smile. He is most of the time seen wearing his yellow clown cap, and a business suit driving a red Viper convertible.",
    'score': [0.7949077486991882, 0.8010389804840088, 0.6466549634933472, 0.5222680568695068, 0.5216285586357117, 0.47328776121139526]
}
"""

This format is directly usable in various distillation losses:

MarginMSELoss for SentenceTransformer models
DistillKLDivLoss for SentenceTransformer models
SparseDistillKLDivLoss for SparseEncoder models
SparseMarginMSELoss for SparseEncoder models
MarginMSELoss for CrossEncoder models

Note that without applying any absolute_margin, relative_margin, max_score, etc., you can mine negatives that actually score better than your positive. With a distillation loss, this is totally fine. It will learn using the (margins between the) scores, so you don't have to worry about false negatives as much as when using e.g. MultipleNegativesRankingLoss.

This release also adds support for 1) n-tuples instead of just triplets and 2) num_negatives + 1 scores where the first score is the query-positive score for MarginMSELoss for CrossEncoder models.
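To make the point about margins concrete, here is a minimal sketch of how the 'score' list of one n-tuple-scores row can be turned into positive-minus-negative margins. `score_margins` is a hypothetical helper, not part of Sentence Transformers, and this is a simplified view of what a margin-based distillation loss consumes rather than the loss implementation itself. The scores are the ones from the example row above, rounded to four decimals.

```python
def score_margins(scores):
    """Return positive-minus-negative margins for one n-tuple-scores row.

    The first entry is the query-answer score; the remaining entries are
    the query-negative scores, in column order.
    """
    positive_score, *negative_scores = scores
    return [positive_score - negative for negative in negative_scores]


# 'score' list from the example row above, rounded to 4 decimals.
scores = [0.7949, 0.8010, 0.6467, 0.5223, 0.5216, 0.4733]
margins = score_margins(scores)

for index, margin in enumerate(margins, start=1):
    print(f"negative_{index}: margin {margin:+.4f}")
```

The first margin comes out negative because negative_1 scores higher than the answer. That is the case described above: a distillation loss can still learn from it, whereas a loss that treats every negative as strictly wrong would be misled.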

Gathering Across Devices (#3442, #3453)

Various loss functions in Sentence Transformers take advantage of so-called "in-batch negatives". With these losses, for each sample in a batch, all data for the other samples will be considered as negatives, because random inputs are likely unrelated to the sample. This pushes them further apart, resulting ideally only in higher similarity scores for inputs that really are similar.

This release introduces a new gather_across_devices parameter for each of these losses. This parameter only works in a multi-GPU setting, and will pull the other samples from other devices into the computation. In short: if you have the following setup:

loss: CachedMultipleNegativesRankingLoss (a.k.a. InfoNCE with GradCache) with mini_batch_size=16
per_device_train_batch_size=128 in the SentenceTransformersTrainingArguments
8 GPUs
Training with triplets: query, positive, negative

Then each device will have the memory usage corresponding to a batch size of 16, while each sample has 1 positive and 255 negatives (1 hard negative for that sample, 127 other positive values as in-batch negatives and 127 other negative values as in-batch negatives). Your global batch size will be 128 * 8 = 1024, and your learning rate should be set according to that value.

Now, if you use the exact same setup, but with gather_across_devices=True, then your setting is suddenly:
Each device will have the memory usage corresponding to a batch size of 16, while each sample has 1 positive and 2047 negatives (1 hard negative for that sample, 1023 other positive values as in-batch negatives and 1023 other negative values as in-batch negatives). Your global batch size will be 128 * 8 = 1024, and your learning rate should be set according to that value.

The difference is that the in-batch negatives will now pull from other devices too! Because a larger batch size often results in stronger models with in-batch negatives losses, this should give stronger models at almost no overhead.
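The negative counts quoted above follow from simple arithmetic, sketched here for (query, positive, negative) triplets. `negatives_per_sample` is a hypothetical helper written for this illustration only; it does not touch any training code.

```python
def negatives_per_sample(per_device_batch_size, num_devices, gather_across_devices):
    """Count the negatives each sample sees when training on triplets."""
    # Without gathering, only the samples on the same device are visible.
    visible_samples = per_device_batch_size * (num_devices if gather_across_devices else 1)
    own_hard_negative = 1
    other_positives = visible_samples - 1  # positives of the other samples
    other_negatives = visible_samples - 1  # hard negatives of the other samples
    return own_hard_negative + other_positives + other_negatives


print(negatives_per_sample(128, 8, gather_across_devices=False))  # 255
print(negatives_per_sample(128, 8, gather_across_devices=True))   # 2047
```

The global batch size (128 * 8 = 1024) is the same in both cases; only the pool of in-batch negatives per sample changes.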

Here are the results from one of my simple experiments with finetuning mpnet-base on natural-questions with 8 GPUs:
baseline:

Evaluation: 0.5111 NDCG@10
Runtime: 89.4335 seconds

gather_across_devices=True:

Evaluation: 0.5359 NDCG@10
Runtime: 89.3699 seconds

Trackio support (#3467)

If your transformers version is high enough, and you have trackio installed (pip install trackio), then Sentence Transformers will also export logs to Trackio. It'll allow you to browse to localhost to track your experiments for free.

MTEB Documentation (#3477)

If you're interested in evaluating your SentenceTransformer models on common benchmarks, then MTEB is your friend. However, there wasn't yet any documentation to guide you in the right direction. This release, we added some:

Sentence Transformers > Usage > Evaluation with MTEB

Minor Notable Changes

Fix crashes with MarginMSELoss and SparseMarginMSELoss training when using anchor, positive, negative triplets with 1 score (i.e. the difference between negative and positive). (#3421)
Update temperature parameter default value in DistillKLDivLoss to 1.0. The SparseDistillKLDivLoss default temperature stays at 2.0. (#3428)
Avoid unneeded warning when calling encode_query/encode_document with prompt (#3444)
Fix compatibility issues with Datasets v4.0 (#3445, #3455)
Fix Router torch initialization, which resulted in issues with DataParallel and memory usage (#3454)
No longer crash if CrossEncoder.predict is called with an empty list (#3466)
More consistent output types when calling with empty list as input (#3466)
Reintroduce FIPS compatibility (#3479)

All Changes

Add redirect for HPO training examples in .htaccess by @tomaarsen in #3412
[docs] Fix link in README for training script name by @tomaarsen in #3417
[docs] Fix arxiv link in SpladePooling docs by @tomaarsen in #3418
Update README.md by @CharlesCNorton in #3419
Adjust label shape handling in MarginMSELoss for single score inputs by @tomaarsen in #3421
[tests] Reuse models more where possible by @tomaarsen in #3432
Update temperature parameter default value in DistillKLDivLoss to 1.0 by @tomaarsen in #3428
[model card] Avoid pipe characters that mess up table formatting by @tomaarsen in #3429
[feat] Add "n-tuple-scores" output format to mine_hard_negatives function by @tomaarsen in #3430
Fix ONNX/OV export; Avoid .transformers_model by @tomaarsen in #3439
[feat] Avoid unneeded warning when calling encode_query/document with prompt by @tomaarsen in #3444
[compat] Fix compatibility issues with datasets v4 by @tomaarsen in #3445
Sync prompts type with documentation by @FremyCompany in #3427
[feat] Add gather_across_devices parameter to some contrastive losses by @tomaarsen in #3442
[chore] Redistribute util.py (and its tests) to separate directory by @tomaarsen in #3446
[tests] Reduce the number of hub requests for the model card tests by @tomaarsen in #3447
[fix] cast indexing numpy int to Python int by @emapco in #3455
[fix] Fix Router torch initialization, fixes DP by @tomaarsen in #3454
[fix] Patch gather_across_devices for in-batch negatives losses by @tomaarsen in #3453
Revert changes to multi-GPU evaluator calls by @tomaarsen in #3463
Update README.md (grammar mistakes) by @ddofer in #3458
[feat] Update the trackio default project if not already defined by @tomaarsen in #3467
Fix: prevent loading best model when PEFT adapters are active (#3056) by @sahibpreetsingh12 in #3470
[docs] Fix dead link in ContrastiveLoss references by @tomaarsen in #3476
[docs] Add splade_index semantic search example by @tomaarsen in #3473
[feat] Add ONNX, OV support for SparseEncoder; refactor ONNX/OV by @tomaarsen in #3475
chore: Handle error when predict is called with an empty sentence list by @nitin-nsp in #3466
[fix] FIPS compatibility - use SHA256 with usedforsecurity=False in hard negatives caching by @tomaarsen in #3479
docs: add MTEB evaluation guide and update usage.rst by @sahibpreetsingh12 in #3477
[feat] Allow n-tuples for CE MarginMSE training by @tomaarsen in #3481
[docs] Update main sbert.net page with v5.1 mention by @tomaarsen in #3482

New Contributors

@CharlesCNorton made their first contribution in #3419
@FremyCompany made their first contribution in #3427
@ddofer made their first contribution in #3458
@sahibpreetsingh12 made their first contribution in #3470
@nitin-nsp made their first contribution in #3466

Also thanks to @Samoed and @KennethEnevoldsen for their reviews on the MTEB documentation, and thanks to @NohTow for the inspiration on gathering across devices.

Full Changelog: huggingface/sentence-transformers@v5.0.0...v5.1.0

`v5.0.0`: - SparseEncoder support; encode_query & encode_document; multi-processing in encode; Router; and more


This release consists of significant updates including the introduction of Sparse Encoder models, new methods encode_query and encode_document, multi-processing support in encode, the Router module for asymmetric models, custom learning rates for parameter groups, composite loss logging, and various small improvements and bug fixes.

Install this version with

### Training + Inference
pip install sentence-transformers[train]==5.0.0

### Inference only, use one of:
pip install sentence-transformers==5.0.0
pip install sentence-transformers[onnx-gpu]==5.0.0
pip install sentence-transformers[onnx]==5.0.0
pip install sentence-transformers[openvino]==5.0.0

[!TIP]
Our Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 blogpost is an excellent place to learn about finetuning sparse embedding models!

[!NOTE]
This release is designed to be fully backwards compatible, meaning that you should be able to upgrade from older versions to v5.x without any issues. If you are running into issues when upgrading, feel free to open an issue. Also see the Migration Guide for changes that we would recommend.

Sparse Encoder models

The Sentence Transformers v5.0 release introduces Sparse Embedding models, also known as Sparse Encoders. These models generate high-dimensional embeddings, often with 30,000+ dimensions, where often only <1% of dimensions are non-zero. This is in contrast to the standard dense embedding models, which produce low-dimensional embeddings (e.g., 384, 768, or 1024 dimensions) where all values are non-zero.

Usually, each active dimension (i.e. the dimension with a non-zero value) in a sparse embedding corresponds to a specific token in the model's vocabulary, allowing for interpretability. This means that you can e.g. see exactly which words/tokens are important in an embedding, and that you can inspect exactly because of which words/tokens two texts are deemed similar.

Let's have a look at naver/splade-v3, a strong sparse embedding model, as an example:

from sentence_transformers import SparseEncoder

### Download from the 🤗 Hub
model = SparseEncoder("naver/splade-v3")

### Run inference
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)

### (3, 30522)

### Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)

### tensor([[   32.4323,     5.8528,     0.0258],
###         [    5.8528,    26.6649,     0.0302],
###         [    0.0258,     0.0302,    24.0839]])

### Let's decode our embeddings to be able to interpret them
decoded = model.decode(embeddings, top_k=10)
for decoded_sentence, sentence in zip(decoded, sentences):
    print(f"Sentence: {sentence}")
    print(f"Decoded: {decoded_sentence}")
    print()

Sentence: The weather is lovely today.
Decoded: [('weather', 2.754288673400879), ('today', 2.610959529876709), ('lovely', 2.431990623474121), ('currently', 1.5520408153533936), ('beautiful', 1.5046082735061646), ('cool', 1.4664798974990845), ('pretty', 0.8986214995384216), ('yesterday', 0.8603134155273438), ('nice', 0.8322536945343018), ('summer', 0.7702118158340454)]

Sentence: It's so sunny outside!
Decoded: [('outside', 2.6939032077789307), ('sunny', 2.535827398300171), ('so', 2.0600898265838623), ('out', 1.5397940874099731), ('weather', 1.1198079586029053), ('very', 0.9873268604278564), ('cool', 0.9406591057777405), ('it', 0.9026399254798889), ('summer', 0.684999406337738), ('sun', 0.6520509123802185)]

Sentence: He drove to the stadium.
Decoded: [('stadium', 2.7872302532196045), ('drove', 1.8208855390548706), ('driving', 1.6665740013122559), ('drive', 1.5565159320831299), ('he', 1.4721972942352295), ('stadiums', 1.449463129043579), ('to', 1.0441515445709229), ('car', 0.7002660632133484), ('visit', 0.5118278861045837), ('football', 0.502326250076294)]

In this example, the embeddings are 30,522-dimensional vectors, where each dimension corresponds to a token in the model's vocabulary. The decode method returned the top 10 tokens with the highest values in the embedding, allowing us to interpret which tokens contribute most to the embedding.
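Conceptually, decoding amounts to looking up the largest weights in the vocabulary. The sketch below shows that idea on a toy example in plain Python. The six-token vocabulary, the weights, and the helper `decode_top_k` are all made up for illustration; the real model works on a 30,522-entry vocabulary and this is not its implementation.

```python
def decode_top_k(embedding, vocabulary, top_k=10):
    """Return the top_k (token, weight) pairs of a sparse embedding, highest weight first."""
    # Only non-zero dimensions are "active" and worth reporting.
    active = [
        (vocabulary[index], weight)
        for index, weight in enumerate(embedding)
        if weight > 0
    ]
    return sorted(active, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy vocabulary and embedding (hypothetical values).
vocabulary = ["weather", "today", "lovely", "stadium", "drove", "sunny"]
embedding = [2.75, 2.61, 2.43, 0.0, 0.0, 0.9]

print(decode_top_k(embedding, vocabulary, top_k=3))
# [('weather', 2.75), ('today', 2.61), ('lovely', 2.43)]
```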

We can even determine the intersection or overlap between embeddings, very useful for determining why two texts are deemed similar or dissimilar:

### Let's also compute the intersection/overlap of the first two embeddings
intersection_embedding = model.intersection(embeddings[0], embeddings[1])
decoded_intersection = model.decode(intersection_embedding)
print(decoded_intersection)

Decoded: [('weather', 3.0842742919921875), ('cool', 1.379457712173462), ('summer', 0.5275946259498596), ('comfort', 0.3239051103591919), ('sally', 0.22571465373039246), ('julian', 0.14787325263023376), ('nature', 0.08582140505313873), ('beauty', 0.0588383711874485), ('mood', 0.018594780936837196), ('nathan', 0.000752730411477387)]
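The weights in this output are consistent with an element-wise product of the two embeddings: 'weather' has weight 2.7543 in the first sentence and 1.1198 in the second, and 2.7543 × 1.1198 ≈ 3.0843. The sketch below reproduces that arithmetic on plain dictionaries. `sparse_intersection` is a hypothetical helper, and reading the intersection as a product is an inference from the numbers above, not a statement about the library's internals.

```python
def sparse_intersection(first, second):
    """Multiply the weights of tokens that are active in both sparse embeddings."""
    shared_tokens = first.keys() & second.keys()
    return {token: first[token] * second[token] for token in shared_tokens}


# Decoded weights copied from the two weather sentences above (subset of the top 10).
first = {
    "weather": 2.754288673400879,
    "today": 2.610959529876709,
    "cool": 1.4664798974990845,
    "summer": 0.7702118158340454,
}
second = {
    "sunny": 2.535827398300171,
    "weather": 1.1198079586029053,
    "cool": 0.9406591057777405,
    "summer": 0.684999406337738,
}

overlap = sparse_intersection(first, second)
for token, weight in sorted(overlap.items(), key=lambda item: item[1], reverse=True):
    print(token, round(weight, 4))
# weather 3.0843
# cool 1.3795
# summer 0.5276
```

Tokens that are active in only one of the two texts, such as 'today' or 'sunny', drop out of the result.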

And if we think the embeddings are too big, we can limit the maximum number of active dimensions like so:

from sentence_transformers import SparseEncoder

### Download from the 🤗 Hub
model = SparseEncoder("naver/splade-v3")  # You can also set max_active_dims here instead of encode()

### Run inference
documents = [
    "UV-A light, specifically, is what mainly causes tanning, skin aging, and cataracts, UV-B causes sunburn, skin aging and skin cancer, and UV-C is the strongest, and therefore most effective at killing microorganisms. Again â\x80\x93 single words and multiple bullets.",
    "Answers from Ronald Petersen, M.D. Yes, Alzheimer's disease usually worsens slowly. But its speed of progression varies, depending on a person's genetic makeup, environmental factors, age at diagnosis and other medical conditions. Still, anyone diagnosed with Alzheimer's whose symptoms seem to be progressing quickly â\x80\x94 or who experiences a sudden decline â\x80\x94 should see his or her doctor.",
    "Bell's palsy and Extreme tiredness and Extreme fatigue (2 causes) Bell's palsy and Extreme tiredness and Hepatitis (2 causes) Bell's palsy and Extreme tiredness and Liver pain (2 causes) Bell's palsy and Extreme tiredness and Lymph node swelling in children (2 causes)",
]
embeddings = model.encode_document(documents, max_active_dims=64)
print(embeddings.shape)

### (3, 30522)

### Print the sparsity of the embeddings
sparsity = model.sparsity(embeddings)
print(sparsity)

### {'active_dims': 64.0, 'sparsity_ratio': 0.9979031518249132}
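The reported sparsity ratio is simply 1 - 64 / 30522. The following sketch shows the idea on a toy vector, assuming the limit keeps only the largest values and zeroes out the rest. `keep_top_k` and `sparsity_stats` are hypothetical helpers written in plain Python, not the library's code.

```python
def keep_top_k(embedding, max_active_dims):
    """Zero out everything except the max_active_dims largest values."""
    ranked = sorted(range(len(embedding)), key=lambda i: embedding[i], reverse=True)
    keep = set(ranked[:max_active_dims])
    return [value if index in keep else 0.0 for index, value in enumerate(embedding)]


def sparsity_stats(embedding):
    """Return the number of non-zero dimensions and the fraction of zeros."""
    active_dims = sum(1 for value in embedding if value != 0)
    return {
        "active_dims": active_dims,
        "sparsity_ratio": 1 - active_dims / len(embedding),
    }


# Toy embedding with the splade-v3 vocabulary size and 200 active dimensions.
embedding = [0.0] * 30522
for i in range(200):
    embedding[i * 7] = 1.0 + i * 0.01

pruned = keep_top_k(embedding, max_active_dims=64)
print(sparsity_stats(embedding))
print(sparsity_stats(pruned))
```

After pruning, 64 dimensions remain active and the sparsity ratio is about 0.9979, matching the value printed above.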

The following example shows that this has minimal impact on scores:

from sentence_transformers import SparseEncoder

### Download from the 🤗 Hub
model = SparseEncoder("naver/splade-v3")  # You can also set max_active_dims here instead of encode()

### Run inference
queries = ["what causes aging fast"]
documents = [
    "UV-A light, specifically, is what mainly causes tanning, skin aging, and cataracts, UV-B causes sunburn, skin aging and skin cancer, and UV-C is the strongest, and therefore most effective at killing microorganisms. Again â\x80\x93 single words and multiple bullets.",
    "Answers from Ronald Petersen, M.D. Yes, Alzheimer's disease usually worsens slowly. But its speed of progression varies, depending on a person's genetic makeup, environmental factors, age at diagnosis and other medical conditions. Still, anyone diagnosed with Alzheimer's whose symptoms seem to be progressing quickly â\x80\x94 or who experiences a sudden decline â\x80\x94 should see his or her doctor.",
    "Bell's palsy and Extreme tiredness and Extreme fatigue (2 causes) Bell's palsy and Extreme tiredness and Hepatitis (2 causes) Bell's palsy and Extreme tiredness and Liver pain (2 causes) Bell's palsy and Extreme tiredness and Lymph node swelling in children (2 causes)",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

### Determine the sparsity
query_sparsity = model.sparsity(query_embeddings)
document_sparsity = model.sparsity(document_embeddings)
print(query_sparsity, document_sparsity)

### {'active_dims': 28.0, 'sparsity_ratio': 0.9990826289233995} {'active_dims': 174.6666717529297, 'sparsity_ratio': 0.9942773516888497}
### Calculate the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)

### tensor([[11.3767, 10.8296,  4.3457]], device='cuda:0')
### Again with smaller max_active_dims
smaller_document_embeddings = model.encode_document(documents, max_active_dims=64)

### Determine the sparsity for the smaller document embeddings
smaller_document_sparsity = model.sparsity(smaller_document_embeddings)
print(query_sparsity, smaller_document_sparsity)

### {'active_dims': 28.0, 'sparsity_ratio': 0.9990826289233995} {'active_dims': 64.0, 'sparsity_ratio': 0.9979031518249132}
### Print the similarity scores for the smaller document embeddings
smaller_similarities = model.similarity(query_embeddings, smaller_document_embeddings)
print(smaller_similarities)

### tensor([[10.1311,  9.8360,  4.3457]], device='cuda:0')
### Very similar to the scores for the full document embeddings!
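To make the effect of a smaller max_active_dims easier to see, here is a minimal pure-Python sketch. It assumes that limiting the active dimensions amounts to keeping the k largest values of each embedding and zeroing out the rest. The helper names `keep_top_k` and `dot` and the toy vectors are hypothetical and are not part of Sentence Transformers.

```python
def keep_top_k(vector, k):
    """Zero out all but the k largest entries of a non-negative vector."""
    if k >= len(vector):
        return list(vector)
    # Indices of the k largest values
    top = sorted(range(len(vector)), key=lambda i: vector[i], reverse=True)[:k]
    keep = set(top)
    return [value if i in keep else 0.0 for i, value in enumerate(vector)]


def dot(a, b):
    """Dot product, the kind of score used to compare sparse embeddings here."""
    return sum(x * y for x, y in zip(a, b))


# Toy 6-dimensional "sparse embeddings" (assumed values, for illustration only)
query = [1.0, 2.0, 0.0, 1.0, 0.0, 0.5]
document = [0.1, 1.5, 0.0, 0.9, 0.05, 0.0]

pruned_document = keep_top_k(document, k=2)
full_score = dot(query, document)
pruned_score = dot(query, pruned_document)
print(pruned_document, full_score, pruned_score)
```

Because the pruned dimensions carry the smallest weights, the score drops only slightly, which mirrors the behaviour seen with naver/splade-v3 above.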

Are they any good?

A big question is: How do sparse embedding models stack up against the “standard” dense embedding models, and what kind of performance can you expect when combining them?

For this, I ran a variation of our hybrid_search.py evaluation script, with:

The NanoMSMARCO dataset (a subset of the MS MARCO eval split)
Qwen/Qwen3-Embedding-0.6B dense embedding model
naver/splade-v3-doc sparse embedding model, inference free for queries
Alibaba-NLP/gte-reranker-modernbert-base reranker

Which resulted in this evaluation:

Dense	Sparse	Reranker	NDCG@10	MRR@10	MAP
x			65.33	57.56	57.97
	x		67.34	59.59	59.98
x	x		72.39	66.99	67.59
x		x	68.37	62.76	63.56
	x	x	69.02	63.66	64.44
x	x	x	68.28	62.66	63.44

Here, the sparse embedding model actually already outperforms the dense one, but the real magic happens when combining the two: hybrid search. In our case, we used Reciprocal Rank Fusion to merge the two rankings.
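For readers unfamiliar with Reciprocal Rank Fusion, the following is a minimal sketch of the general technique, not the code used in hybrid_search.py. The function name, the document ids, and the constant k=60 are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings: each document scores the sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a dense and a sparse retriever (best match first)
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Only ranks are used, never raw scores, so the dense (cosine) and sparse (dot product) scores do not need to be on the same scale.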

Rerankers also help improve the performance of the dense or sparse model here, but hurt the performance of the hybrid search, as its performance is already beyond what the reranker can achieve.

Note: The naver/splade-v3-doc was trained on the MS MARCO training set, so this is in-domain performance, much like what you might expect if you finetune on your own data.

Resources

Check out the following links to get a better feel for what Sparse Encoders are, how they work, what architectures exist, how to use them, what pretrained models exist, how to finetune them, and more:

Blogpost:
- Training and Finetuning Sparse Embedding Models with Sentence Transformers v5
Documentation:
Models:
- Sparse Encoder Model Collection

Update Stats

The introduction of SparseEncoder has been one of the largest updates to Sentence Transformers, introducing all of the following:

Code:
- New Trainer, Training Arguments, Data Collator, Model Card generation + template, with backwards compatibility
- 4 new, 1 refactored modules to support at least 3 model archetypes: SPLADE, Inference-free SPLADE, and CSR
- 12 new losses
- 9 new evaluators
- 1 new Callback
- 4 example integrations with ElasticSearch, OpenSearch, Qdrant, and Seismic
Tests:
- 317 tests for SparseEncoder loading, inference, training, etc.
Docs:

New methods: `encode_query` and `encode_document`

Sentence Transformers v5.0 introduces two new core methods to the SentenceTransformer and SparseEncoder classes: encode_query and encode_document.

These methods are specialized versions of encode that differ in exactly two ways:

- If no prompt_name or prompt is provided, it uses a predefined “query”/“document” prompt, if available in the model’s prompts dictionary.
- It sets the task to “query”/“document”. If the model has a Router module, it will use the “query”/“document” task type to route the input through the appropriate submodules.

In short, if you use encode_query and encode_document, you can be sure that you're using the model's predefined prompts and the correct route (if the model has multiple routes).

If you are unsure whether you should use encode, encode_query, or encode_document,
your best bet is to use encode_query and encode_document for Information Retrieval tasks
with a clear query and document/passage distinction, and use encode for all other tasks.

Note that encode is the most general method and can be used for any task, including Information
Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three methods will return identical embeddings.
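The two differences can be illustrated with a toy stand-in. This is only a sketch of the behaviour described above, not the library's implementation: `ToyEncoder` is hypothetical and returns (task, prompted text) pairs instead of embeddings.

```python
class ToyEncoder:
    """Hypothetical stand-in that mimics the prompt and task selection only."""

    def __init__(self, prompts=None):
        self.prompts = prompts or {}

    def encode(self, texts, prompt_name=None, prompt=None, task=None):
        # An explicit prompt wins; otherwise look up prompt_name if one is given
        if prompt is None and prompt_name is not None:
            prompt = self.prompts[prompt_name]
        prefix = prompt or ""
        return [(task, prefix + text) for text in texts]

    def _encode_as(self, role, texts, prompt_name=None, prompt=None):
        # Difference 1: fall back to the predefined prompt for this role, if any
        if prompt_name is None and prompt is None and role in self.prompts:
            prompt_name = role
        # Difference 2: always set the task to the role
        return self.encode(texts, prompt_name=prompt_name, prompt=prompt, task=role)

    def encode_query(self, texts, **kwargs):
        return self._encode_as("query", texts, **kwargs)

    def encode_document(self, texts, **kwargs):
        return self._encode_as("document", texts, **kwargs)


model = ToyEncoder(prompts={"query": "Query: "})
queries_out = model.encode_query(["Explain gravity"])
documents_out = model.encode_document(["Gravity is a force."])
plain_out = model.encode(["Explain gravity"])
print(queries_out, documents_out, plain_out)
```

In this toy, a model without a “document” prompt leaves documents unprompted, while a model with no prompts at all differs between the three methods only in the task value, which has no effect unless something routes on it.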

See for example this snippet, which automatically uses the “query” prompt stored in the Qwen3-Embedding-0.6B model config.

from sentence_transformers import SentenceTransformer

### Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

### The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

### Encode the queries and documents
query_embeddings = model.encode_query(queries)  # Equivalent to model.encode(queries, prompt_name="query")
document_embeddings = model.encode_document(documents)

### Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)

### tensor([[0.7646, 0.1414],
###         [0.1355, 0.6000]])

Documentation: Migration Guide
Documentation: SentenceTransformer.encode
Documentation: SentenceTransformer.encode_query
Documentation: SentenceTransformer.encode_document

`encode_multi_process` absorbed by `encode`

The encode method (and by extension the encode_query and encode_document methods) can now be used directly for multi-processing/multi-GPU processing, instead of having to use encode_multi_process.

Previously, you had to manually start a multi-processing pool, use encode_multi_process, and stop the pool:

from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("all-mpnet-base-v2")
    texts = ["The weather is so nice!", "It's so sunny outside.", ...]

    pool = model.start_multi_process_pool(["cpu", "cpu", "cpu", "cpu"])
    embeddings = model.encode_multi_process(texts, pool, chunk_size=512)
    model.stop_multi_process_pool(pool)

    print(embeddings.shape)

### => (4000, 768)

if __name__ == "__main__":
    main()

Now you can just pass a list of devices as device to encode:

from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("all-mpnet-base-v2")
    texts = ["The weather is so nice!", "It's so sunny outside.", ...]

    embeddings = model.encode(texts, device=["cpu", "cpu", "cpu", "cpu"], chunk_size=512)

    print(embeddings.shape)

### => (4000, 768)

if __name__ == "__main__":
    main()

The multi-processing can be configured using these parameters:

device: If a list of devices, start multi-processing using those devices. Can be e.g. cpu, but also different GPUs.
pool: You can still use start_multi_process_pool and stop_multi_process_pool to create and stop a multi-processing pool, allowing you to reuse the pool across multiple encode calls via the pool argument.
chunk_size: When you use multi-processing with n devices, then the inputs will be subdivided into chunks, and those chunks will be spread across the n processes. The size of the chunk can be defined here, although it’s optional. It can have a minor impact on processing speed and memory usage, but is much less important than the batch_size argument.
Documentation: Migration Guide
Documentation: SentenceTransformer.encode
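To give a feel for what chunk_size controls, here is a simplified sketch of splitting inputs into chunks and spreading them over n workers. The round-robin assignment and both helper names are assumptions for illustration; the library's actual scheduling may differ.

```python
def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def assign_round_robin(chunks, num_workers):
    """Assign chunk indices to workers in turn (a simplification)."""
    assignment = {worker: [] for worker in range(num_workers)}
    for chunk_index in range(len(chunks)):
        assignment[chunk_index % num_workers].append(chunk_index)
    return assignment


texts = [f"sentence {i}" for i in range(10)]
devices = ["cpu", "cpu"]  # two worker processes, as in device=["cpu", "cpu"]

chunks = split_into_chunks(texts, chunk_size=3)
assignment = assign_round_robin(chunks, num_workers=len(devices))
print([len(chunk) for chunk in chunks], assignment)
```

Smaller chunks balance the load more evenly across workers at the cost of more hand-offs, which is why chunk_size tends to matter less than batch_size.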

Router module

The Sentence Transformers v5.0 release has refactored the Asym module into the Router module. The previous implementation wasn’t straightforward to use with the other components of the library. We’ve improved heavily on this to make the integration seamless. This module allows you to create asymmetric models that apply different modules depending on the specified route (often “query” or “document”).

Notably, you can use the task argument in model.encode to specify which route to use, and the model.encode_query and model.encode_document convenience methods automatically specify task="query" and task="document", respectively.

See opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill for an example of a model using a Router to specify different modules for queries vs documents. Its router_config.json specifies that the query route uses an efficient SparseStaticEmbedding module, while the document route uses the more expensive standard SPLADE modules: MLMTransformer with SpladePooling.
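Conceptually, the routing idea can be sketched as a dispatcher that runs a different chain of modules per route. The `ToyRouter` class and the three toy modules below are hypothetical and greatly simplified; they are not the library's Router, SparseStaticEmbedding, MLMTransformer, or SpladePooling.

```python
class ToyRouter:
    """Toy stand-in for a routing module: one list of callables per route."""

    def __init__(self, routes, default_route=None):
        self.routes = routes
        self.default_route = default_route

    def __call__(self, text, task=None):
        route = task if task is not None else self.default_route
        if route not in self.routes:
            raise ValueError(f"Unknown route: {route!r}")
        output = text
        for module in self.routes[route]:
            output = module(output)
        return output


def cheap_token_lookup(text):
    """Query side: a cheap static weight per token."""
    return {token: 1.0 for token in text.lower().split()}


def expensive_transform(text):
    """Document side, step 1: stands in for a costly transformer pass."""
    return text.lower().split()


def pool_counts(tokens):
    """Document side, step 2: pool token outputs into one sparse vector."""
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


router = ToyRouter({
    "query": [cheap_token_lookup],
    "document": [expensive_transform, pool_counts],
})
query_vector = router("aging causes", task="query")
document_vector = router("UV light causes skin aging and aging skin", task="document")
print(query_vector, document_vector)
```

The caller only picks a route name; which modules run, and how expensive they are, is entirely a property of the model's routing configuration.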

Usage is very straightforward with the new encode_query and encode_document methods:

from sentence_transformers import SparseEncoder

### Download from the 🤗 Hub
model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill")
print(model)

### SparseEncoder(
###   (0): Router(
###     (query_0_SparseStaticEmbedding): SparseStaticEmbedding({'frozen': True}, dim=30522, tokenizer=DistilBertTokenizerFast)
###     (document_0_MLMTransformer): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'DistilBertForMaskedLM'})
###     (document_1_SpladePooling): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
###   )
### )
### Run inference
queries = ["what causes aging fast"]
documents = [
    "UV-A light, specifically, is what mainly causes tanning, skin aging, and cataracts, UV-B causes sunburn, skin aging and skin cancer, and UV-C is the strongest, and therefore most effective at killing microorganisms. Again â\x80\x93 single words and multiple bullets.",
    "Answers from Ronald Petersen, M.D. Yes, Alzheimer's disease usually worsens slowly. But its speed of progression varies, depending on a person's genetic makeup, environmental factors, age at diagnosis and other medical conditions. Still, anyone diagnosed with Alzheimer's whose symptoms seem to be progressing quickly â\x80\x94 or who experiences a sudden decline â\x80\x94 should see his or her doctor.",
    "Bell's palsy and Extreme tiredness and Extreme fatigue (2 causes) Bell's palsy and Extreme tiredness and Hepatitis (2 causes) Bell's palsy and Extreme tiredness and Liver pain (2 causes) Bell's palsy and Extreme tiredness and Lymph node swelling in children (2 causes)",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)

### [1, 30522] [3, 30522]
### Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)

### tensor([[12.0820,  6.5648,  5.0988]])

Note that if you wish to train a model with a Router, then you must specify the router_mapping training argument, which maps dataset column names to Router routes. The Trainer then knows which route to use for each dataset column.

Note also that any models using Asym still work as before.

Documentat

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

renovate Bot force-pushed the renovate/sentence-transformers-5.x branch from 9718e49 to e29e904 Compare August 6, 2025 17:03

chore(deps): update dependency sentence-transformers to v5

cf80dfa

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

renovate Bot force-pushed the renovate/sentence-transformers-5.x branch from e29e904 to cf80dfa Compare September 22, 2025 12:53

renovate Bot changed the title ~~chore(deps): update dependency sentence-transformers to v5~~ chore(deps): update dependency sentence-transformers to v5 - autoclosed Sep 24, 2025

renovate Bot closed this Sep 24, 2025

renovate Bot deleted the renovate/sentence-transformers-5.x branch September 24, 2025 14:02


renovate[bot] wants to merge 1 commit into main from renovate/sentence-transformers-5.x
