chore(deps): update dependency sentence-transformers to v5 - autoclosed #143
Closed
renovate[bot] wants to merge 1 commit into
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Coming soon: The Renovate bot (GitHub App) will be renamed to Mend. PRs from Renovate will soon appear from 'Mend'. Learn more here.
This PR contains the following updates:
sentence-transformers: `==4.1.0` -> `==5.1.1`
Release Notes
UKPLab/sentence-transformers (sentence-transformers)
v5.1.1 - Explicit incorrect arguments, fixes for multi-GPU, evaluator, and hard negative
This patch makes Sentence Transformers more explicit with incorrect arguments and introduces some fixes for multi-GPU processing, evaluators, and hard negatives mining.
Install this version with `pip install sentence-transformers==5.1.1`.
Error if unused kwargs are passed & `get_model_kwargs` (#3500)
Some SentenceTransformer or SparseEncoder models support custom model-specific keyword arguments, such as jinaai/jina-embeddings-v4. As of this release, calling `model.encode` with keyword arguments that aren't used by the model will result in an error. This is quite useful when you, for example, accidentally forget that the parameter to get normalized embeddings is `normalize_embeddings`: prior to this version, a misnamed parameter would simply be quietly ignored.
To check which custom extra keyword arguments may be used for your model, you can call the new `get_model_kwargs` method.
Note: You can always pass the `task` parameter; it's the only model-specific parameter that will be quietly ignored. This means that you can always use `model.encode(..., task="query")` and `model.encode(..., task="document")`.
Minor Features
Minor Fixes
- Fix `batch_size` being ignored in `CrossEncoderRerankingEvaluator` (#3497)
- When using multi-processing in `encode`, embeddings are now moved from the various devices to the CPU before being stacked into one tensor (#3488)
- Use `encode_query` and `encode_document` in `mine_hard_negatives`, automatically using defined "query" and "document" prompts (#3502)
- … `output_path` that doesn't exist yet (#3516)
- Fix the number of missing negatives in `mine_hard_negatives` (#3504)
All Changes
- [fix] add batch size parameter to model prediction in CrossEncoderRerankingEvaluator by @emapco in #3497
- [fix] Ensure multi-process embeddings are moved to CPU for concatenation by @tomaarsen in #3488
- [model_card] Don't override manually provided languages in model card by @tomaarsen in #3501
- [tests] Add hard negatives test showing multiple positives are correctly handled by @tomaarsen in #3503
- [feat] Use encode_document and encode_query in mine_hard_negatives by @tomaarsen in #3502
- … `input_ids`, `attention_mask`, `token_type_ids`, `inputs_embeds` to forward by @Samoed in #3509
- [feat] add get_model_kwargs method; throw error if unused kwarg is passed by @tomaarsen in #3500
- [fix] Fix the number of missing negatives in mine_hard_negatives by @tomaarsen in #3504
New Contributors
Full Changelog: huggingface/sentence-transformers@v5.1.0...v5.1.1
v5.1.0 - ONNX and OpenVINO backends offering 2-3x speedups; more hard negatives mining formats
This release introduces 2 new efficient computing backends for SparseEncoder embedding models, ONNX and OpenVINO + optimization & quantization, allowing for speedups of up to 2x-3x; a new "n-tuple-scores" output format for hard negative mining for distillation; gathering across devices for a free lunch on multi-GPU training; Trackio support; MTEB documentation; and many small fixes and features.
Install this version with `pip install sentence-transformers==5.1.0`.
Faster ONNX and OpenVINO backends for SparseEncoder models (#3475)
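Introducing a new `backend` keyword argument to the `SparseEncoder` initialization, allowing values of `"torch"` (default), `"onnx"`, and `"openvino"`. These require installing `sentence-transformers` with specific extras, e.g. `sentence-transformers[onnx]`, `sentence-transformers[onnx-gpu]`, or `sentence-transformers[openvino]`.
It's as simple as passing the backend when loading the model, e.g. `SparseEncoder("naver/splade-v3", backend="onnx")`.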
If you specify a `backend` and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to `model.push_to_hub` or `model.save_pretrained` into the same model repository or directory to avoid having to re-export the model every time.
All keyword arguments passed via `model_kwargs` will be passed on to `ORTModelForMaskedLM.from_pretrained` or `OVModelForMaskedLM.from_pretrained`. The most useful arguments are:
- `provider`: (Only if `backend="onnx"`) ONNX Runtime provider to use for loading the model, e.g. `"CPUExecutionProvider"`. See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest provider (e.g. `"CUDAExecutionProvider"`) will be used.
- `file_name`: The name of the ONNX/OpenVINO file to load. If not specified, it defaults to `"model.onnx"` or otherwise `"onnx/model.onnx"` for ONNX, and `"openvino_model.xml"` or otherwise `"openvino/openvino_model.xml"` for OpenVINO. This argument is useful for specifying optimized or quantized models.
- `export`: A boolean flag specifying whether the model will be exported. If not provided, `export` will be set to `True` if the model repository or directory does not already contain an ONNX or OpenVINO model.
Benchmarks
We ran benchmarks for CPU and GPU, averaging findings across 3 datasets and numerous batch sizes. Here are the findings:
These findings resulted in these recommendations:
For GPU, you can expect a 1.81x speedup with bf16 at no cost, and for CPU you can expect up to a ~3x speedup at a minimal cost of accuracy in our evaluation. Your mileage with the accuracy hit for quantization may vary, but it seems to remain very small.
Read the Speeding up Inference documentation for more details.
New `n-tuple-scores` output format from `mine_hard_negatives` (#3430, #3481)
The `mine_hard_negatives` utility function has been extended to support the `n-tuple-scores` output format, which outputs negatives into `num_negatives` + 3 columns, where the 'score' column is a list of scores for the query-answer pair plus each query-negative pair.
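To make the column layout concrete, here is a minimal sketch that assembles one `n-tuple-scores` row by hand. The helper `make_n_tuple_scores_row` and the column names `query`, `answer`, and `negative_1` ... `negative_n` are hypothetical; the notes only fix the column count (`num_negatives` + 3) and that the score list holds the query-answer score followed by one score per negative, so check the dataset returned by `mine_hard_negatives` for the real names.

```python
def make_n_tuple_scores_row(query, answer, negatives, scores):
    """Assemble one hypothetical n-tuple-scores row (num_negatives + 3 columns).

    `scores` holds the query-answer score first, then one score per
    query-negative pair, so it needs len(negatives) + 1 entries.
    """
    if len(scores) != len(negatives) + 1:
        raise ValueError("expected one answer score plus one score per negative")
    row = {"query": query, "answer": answer}
    for i, negative in enumerate(negatives, start=1):
        row[f"negative_{i}"] = negative  # hypothetical column naming
    row["score"] = list(scores)
    return row


row = make_n_tuple_scores_row(
    "what is the capital of france?",
    "Paris is the capital of France.",
    ["Lyon is a city in France.", "Berlin is the capital of Germany."],
    [0.92, 0.41, 0.37],
)
print(list(row))  # ['query', 'answer', 'negative_1', 'negative_2', 'score']
```

With 2 negatives the row has 2 + 3 = 5 columns, matching the `num_negatives` + 3 rule.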
This format is directly usable in various distillation losses:
Note that without applying any `absolute_margin`, `relative_margin`, `max_score`, etc., you can mine negatives that actually score better than your positive. With a distillation loss, this is totally fine: it will learn using the (margins between the) scores, so you don't have to worry about false negatives as much as when using e.g. `MultipleNegativesRankingLoss`.
This release also adds support for 1) n-tuples instead of just triplets and 2) `num_negatives + 1` scores, where the first score is the query-positive score, for `MarginMSELoss` for `CrossEncoder` models.
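The point about learning from score margins can be illustrated with a simplified MarginMSE-style objective in plain Python. This sketch assumes the `num_negatives + 1` layout described above (query-positive score first) and averages the squared margin errors over all negatives; it illustrates the technique and is not the library's `MarginMSELoss`, whose exact reduction may differ. A teacher negative that scores above the positive simply yields a negative margin, which the student learns to reproduce.

```python
def margin_mse(student_scores, teacher_scores):
    """Mean squared error between student and teacher score margins.

    Both lists hold the query-positive score first, followed by one score
    per query-negative pair (num_negatives + 1 entries in total).
    """
    if len(student_scores) != len(teacher_scores) or len(student_scores) < 2:
        raise ValueError("need matching score lists with a positive and at least one negative")
    student_pos, *student_negs = student_scores
    teacher_pos, *teacher_negs = teacher_scores
    # Margin = positive score minus negative score; compare margins, not raw scores.
    errors = [
        ((student_pos - s_neg) - (teacher_pos - t_neg)) ** 2
        for s_neg, t_neg in zip(student_negs, teacher_negs)
    ]
    return sum(errors) / len(errors)


# The teacher scores the first negative above the positive (margin -0.05);
# the student is simply trained to reproduce that margin.
print(margin_mse([0.8, 0.7, 0.3], [0.9, 0.95, 0.2]))  # ≈ 0.03125
```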
Gathering Across Devices (#3442, #3453)
Various loss functions in Sentence Transformers take advantage of so-called "in-batch negatives". With these losses, for each sample in a batch, all data for the other samples will be considered as negatives, because random inputs are likely unrelated to the sample. This pushes them further apart, resulting ideally only in higher similarity scores for inputs that really are similar.
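A simplified, dependency-free sketch of an in-batch negatives objective in the style of `MultipleNegativesRankingLoss`: anchor `i` is scored against every positive in the batch, and cross-entropy pushes it towards positive `i` while treating all other positives as negatives. The function name, the assumption of L2-normalized inputs, and the `scale=20.0` factor are illustrative choices, and hard-negative columns are left out for brevity.

```python
import math


def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Average cross-entropy where anchor i must pick positive i out of the batch.

    All other positives in the batch serve as in-batch negatives. Vectors are
    assumed to be L2-normalized, so the dot product equals cosine similarity.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * sum(a * p for a, p in zip(anchor, positive)) for positive in positives]
        # Numerically stable log-sum-exp over the whole row.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
        total += log_norm - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
print(in_batch_negatives_loss(anchors, [[1.0, 0.0], [0.0, 1.0]]))  # close to 0: pairs match
print(in_batch_negatives_loss(anchors, [[0.0, 1.0], [1.0, 0.0]]))  # about 20: pairs swapped
```

A larger batch adds more columns to each row of logits, i.e. more negatives per anchor, which is why batch size matters so much for these losses.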
This release introduces a new `gather_across_devices` parameter for each of these losses. This parameter only works in a multi-GPU setting, and will pull the other samples from other devices into the computation. In short: if you have the following setup:
- 8 GPUs
- `mini_batch_size=16`
- `per_device_train_batch_size=128` in the `SentenceTransformersTrainingArguments`
Then each device will have the memory usage corresponding to a batch size of 16, while each sample has 1 positive and 255 negatives (1 hard negative for that sample, 127 other positive values as in-batch negatives and 127 other negative values as in-batch negatives). Your global batch size will be 128 * 8 = 1024, and your learning rate should be set according to that value.
Now, if you use the exact same setup, but with `gather_across_devices=True`, then your setting is suddenly:
Each device will have the memory usage corresponding to a batch size of 16, while each sample has 1 positive and 2047 negatives (1 hard negative for that sample, 1023 other positive values as in-batch negatives and 1023 other negative values as in-batch negatives). Your global batch size will be 128 * 8 = 1024, and your learning rate should be set according to that value.
The difference is that the in-batch negatives will now pull from other devices too! Because a larger batch size often results in stronger models with in-batch negatives losses, this should give stronger models at almost no overhead.
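The negative counts above follow from simple arithmetic, sketched below under the same assumptions as the example (one hard negative per sample, and every other sample's positive and hard negative acting as in-batch negatives). The function `negatives_per_sample` is hypothetical; `mini_batch_size` does not appear in it, since it only determines memory usage, not which samples are compared.

```python
def negatives_per_sample(per_device_batch_size, num_devices,
                         hard_negatives_per_sample=1, gather_across_devices=False):
    """Count the negatives each anchor is contrasted against (hypothetical helper)."""
    # Samples visible to one anchor: its own device's batch, or every device's batch.
    visible = per_device_batch_size * (num_devices if gather_across_devices else 1)
    others = visible - 1
    return (
        hard_negatives_per_sample                # the anchor's own hard negative(s)
        + others                                 # other samples' positives
        + others * hard_negatives_per_sample     # other samples' hard negatives
    )


per_device, devices = 128, 8
print(per_device * devices)                                                   # 1024 (global batch size)
print(negatives_per_sample(per_device, devices))                              # 255
print(negatives_per_sample(per_device, devices, gather_across_devices=True))  # 2047
```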
Here are the results from one of my simple experiments with finetuning `mpnet-base` on `natural-questions` with 8 GPUs:
baseline:
`gather_across_devices=True`:
Trackio support (#3467)
If your `transformers` version is high enough, and you have `trackio` installed (`pip install trackio`), then Sentence Transformers will also export logs to Trackio. This allows you to browse to localhost to track your experiments for free.
MTEB Documentation (#3477)
If you're interested in evaluating your `SentenceTransformer` models on common benchmarks, then MTEB is your friend. However, there wasn't yet any documentation to guide you in the right direction. In this release, we added some:
Minor Notable Changes
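- Avoid an unneeded warning when calling `encode_query`/`encode_document` with `prompt` (#3444)
- Fix `Router` torch initialization, which resulted in issues with DataParallel and memory usage (#3454)
- … `CrossEncoder.predict` is called with an empty list (#3466)
All Changes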
- [docs] Fix link in README for training script name by @tomaarsen in #3417
- [docs] Fix arxiv link in SpladePooling docs by @tomaarsen in #3418
- [tests] Reuse models more where possible by @tomaarsen in #3432
- [model card] Avoid pipe characters that mess up table formatting by @tomaarsen in #3429
- [feat] Add "n-tuple-scores" output format to mine_hard_negatives function by @tomaarsen in #3430
- [feat] Avoid unneeded warning when calling encode_query/document with prompt by @tomaarsen in #3444
- [compat] Fix compatibility issues with datasets v4 by @tomaarsen in #3445
- … `prompts` type with documentation by @FremyCompany in #3427
- [feat] Add gather_across_devices parameter to some contrastive losses by @tomaarsen in #3442
- [chore] Redistribute util.py (and its tests) to separate directory by @tomaarsen in #3446
- [tests] Reduce the number of hub requests for the model card tests by @tomaarsen in #3447
- [fix] cast indexing numpy int to Python int by @emapco in #3455
- [fix] Fix Router torch initialization, fixes DP by @tomaarsen in #3454
- [fix] Patch `gather_across_devices` for in-batch negatives losses by @tomaarsen in #3453
- [feat] Update the trackio default project if not already defined by @tomaarsen in #3467
- [docs] Fix dead link in ContrastiveLoss references by @tomaarsen in #3476
- [docs] Add splade_index semantic search example by @tomaarsen in #3473
- [feat] Add ONNX, OV support for SparseEncoder; refactor ONNX/OV by @tomaarsen in #3475
- [fix] FIPS compatibility - use SHA256 with usedforsecurity=False in hard negatives caching by @tomaarsen in #3479
- [feat] Allow n-tuples for CE MarginMSE training by @tomaarsen in #3481
- [docs] Update main sbert.net page with v5.1 mention by @tomaarsen in #3482
New Contributors
Also thanks to @Samoed and @KennethEnevoldsen for their reviews on the MTEB documentation, and thanks to @NohTow for the inspiration on gathering across devices.
Full Changelog: huggingface/sentence-transformers@v5.0.0...v5.1.0
v5.0.0 - SparseEncoder support; encode_query & encode_document; multi-processing in encode; Router; and more
This release consists of significant updates including the introduction of Sparse Encoder models, new methods `encode_query` and `encode_document`, multi-processing support in `encode`, the `Router` module for asymmetric models, custom learning rates for parameter groups, composite loss logging, and various small improvements and bug fixes.
Install this version with `pip install sentence-transformers==5.0.0`.
Sparse Encoder models
The Sentence Transformers v5.0 release introduces Sparse Embedding models, also known as Sparse Encoders. These models generate high-dimensional embeddings, often with 30,000+ dimensions, of which typically less than 1% are non-zero. This is in contrast to the standard dense embedding models, which produce low-dimensional embeddings (e.g., 384, 768, or 1024 dimensions) where all values are non-zero.
Usually, each active dimension (i.e. the dimension with a non-zero value) in a sparse embedding corresponds to a specific token in the model's vocabulary, allowing for interpretability. This means that you can e.g. see exactly which words/tokens are important in an embedding, and that you can inspect exactly because of which words/tokens two texts are deemed similar.
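A toy sketch of this token-level interpretability, assuming sparse embeddings are stored as `{token: weight}` dictionaries (real models produce vectors with tens of thousands of dimensions instead). The helpers `decode`, `intersection`, and `dot` are hypothetical stand-ins for the kind of inspection described in this section, not the library's methods. The key property is that the dot product of two sparse embeddings only receives contributions from tokens active in both, so the per-token products explain the similarity.

```python
def decode(embedding, top_k=10):
    """Return the top_k (token, weight) pairs with the highest weights."""
    return sorted(embedding.items(), key=lambda item: item[1], reverse=True)[:top_k]


def intersection(emb_a, emb_b):
    """Per-token products for the tokens that are active in both embeddings."""
    return {token: emb_a[token] * emb_b[token] for token in emb_a.keys() & emb_b.keys()}


def dot(emb_a, emb_b):
    """Sparse dot product: only shared active tokens contribute."""
    return sum(intersection(emb_a, emb_b).values())


# Toy embeddings; a real model assigns weights to tokens of its vocabulary.
query = {"weather": 2.1, "london": 1.8, "today": 0.9, "rain": 0.4}
document = {"london": 1.5, "rain": 1.2, "forecast": 1.0, "weather": 0.7}

print(decode(query, top_k=2))  # [('weather', 2.1), ('london', 1.8)]
overlap = intersection(query, document)
print(sorted(overlap, key=overlap.get, reverse=True))  # ['london', 'weather', 'rain']
print(round(dot(query, document), 2))  # 4.65
```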
Let's have a look at naver/splade-v3, a strong sparse embedding model, as an example:
In this example, the embeddings are 30,522-dimensional vectors, where each dimension corresponds to a token in the model's vocabulary. The `decode` method returned the top 10 tokens with the highest values in the embedding, allowing us to interpret which tokens contribute most to the embedding.
We can even determine the intersection or overlap between embeddings, which is very useful for determining why two texts are deemed similar or dissimilar:
And if we think the embeddings are too big, we can limit the maximum number of active dimensions like so:
Limiting the number of active dimensions in this way has minimal impact on scores.
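One simple way to cap the number of active dimensions is top-k pruning, sketched here on a toy `{token: weight}` dictionary. The name `limit_active_dims` is hypothetical and this is not necessarily how the library implements its limit; the sketch only shows that dropping the lowest-weighted tokens removes the smallest contributions to the dot product.

```python
def limit_active_dims(embedding, max_active_dims):
    """Keep only the max_active_dims highest-weighted tokens (toy top-k pruning)."""
    ranked = sorted(embedding.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_active_dims])


def dot(emb_a, emb_b):
    """Sparse dot product over the tokens active in both embeddings."""
    return sum(emb_a[token] * emb_b[token] for token in emb_a.keys() & emb_b.keys())


query = {"weather": 2.1, "london": 1.8, "today": 0.9, "rain": 0.4}
document = {"london": 1.5, "rain": 1.2, "forecast": 1.0, "weather": 0.7}

pruned = limit_active_dims(query, max_active_dims=2)
print(pruned)  # {'weather': 2.1, 'london': 1.8}
print(round(dot(query, document), 2), round(dot(pruned, document), 2))  # 4.65 4.17
```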
Are they any good?
A big question is: How do sparse embedding models stack up against the “standard” dense embedding models, and what kind of performance can you expect when combining the two?
For this, I ran a variation of our hybrid_search.py evaluation script, with:
Which resulted in this evaluation:
Here, the sparse embedding model actually already outperforms the dense one, but the real magic happens when combining the two: hybrid search. In our case, we used Reciprocal Rank Fusion to merge the two rankings.
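Reciprocal Rank Fusion needs only the ranks, not the raw scores, which makes it convenient for merging dense and sparse results whose scores live on different scales. A minimal sketch follows; `k=60` is the constant proposed in the original RRF paper and a common default, but the notes don't state which value this evaluation used.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1; documents missing from a list get nothing from it.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["d1", "d2", "d3"]
sparse_ranking = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents ranked well by both retrievers (`d1`, `d3`) rise to the top, even though neither list alone orders them that way.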
Rerankers also help improve the performance of the dense or sparse model here, but hurt the performance of the hybrid search, as its performance is already beyond what the reranker can achieve.
Resources
Check out the following links to get a better feel for what Sparse Encoders are, how they work, what architectures exist, how to use them, what pretrained models exist, how to finetune them, and more:
Update Stats
The introduction of SparseEncoder has been one of the largest updates to Sentence Transformers, introducing all of the following:
New methods: `encode_query` and `encode_document`
Sentence Transformers v5.0 introduces two new core methods to the `SentenceTransformer` and `SparseEncoder` classes: `encode_query` and `encode_document`.
These methods are specialized versions of `encode` that differ in exactly two ways:
- If no `prompt_name` or `prompt` is provided, it uses a predefined “query”/“document” prompt, if available in the model’s `prompts` dictionary (example).
- It sets the `task` to “query”/“document”. If the model has a `Router` module, it will use the “query”/“document” task type to route the input through the appropriate submodules.
In short, if you use `encode_query` and `encode_document`, you can be sure that you're using the model's predefined prompts and the correct route (if the model has multiple routes).
If you are unsure whether you should use `encode`, `encode_query`, or `encode_document`, your best bet is to use `encode_query` and `encode_document` for Information Retrieval tasks with a clear query and document/passage distinction, and use `encode` for all other tasks.
Note that `encode` is the most general method and can be used for any task, including Information Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three methods will return identical embeddings.
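The two differences can be mimicked with a hypothetical `ToyModel` that, instead of a real embedding, returns the text it would embed, tagged with the route that would process it. It is a sketch of the described behavior, not the library's implementation: `encode_query`/`encode_document` fill in a predefined prompt only when none is given and always set `task`, so a model without prompts or routes gives identical results for all three methods.

```python
class ToyModel:
    """Hypothetical stand-in for a model with optional prompts and routes.

    Instead of an embedding, encode() returns the text that would be embedded,
    tagged with the route (submodule) that would process it.
    """

    def __init__(self, prompts=None, routes=None):
        self.prompts = prompts or {}     # e.g. {"query": "query: "}
        self.routes = set(routes or ())  # e.g. {"query", "document"}

    def encode(self, text, prompt_name=None, prompt=None, task=None):
        if prompt is None and prompt_name is not None:
            prompt = self.prompts[prompt_name]
        # The task only matters if the model actually has a matching route.
        route = task if task in self.routes else "default"
        return f"[{route}] {prompt or ''}{text}"

    def _encode_as(self, role, text, prompt_name=None, prompt=None):
        # Difference 1: fall back to the predefined prompt for this role, if any.
        if prompt_name is None and prompt is None and role in self.prompts:
            prompt_name = role
        # Difference 2: always pass the role as the task.
        return self.encode(text, prompt_name=prompt_name, prompt=prompt, task=role)

    def encode_query(self, text, **kwargs):
        return self._encode_as("query", text, **kwargs)

    def encode_document(self, text, **kwargs):
        return self._encode_as("document", text, **kwargs)


plain = ToyModel()
print(plain.encode("hi"), plain.encode_query("hi"))  # [default] hi [default] hi

asymmetric = ToyModel(prompts={"query": "query: "}, routes={"query", "document"})
print(asymmetric.encode_query("hi"))     # [query] query: hi
print(asymmetric.encode_document("hi"))  # [document] hi
```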
See for example this snippet, which automatically uses the “query” prompt stored in the Qwen3-Embedding-0.6B model config.
`encode_multi_process` absorbed by `encode`
The `encode` method (and by extension the `encode_query` and `encode_document` methods) can now be used directly for multi-processing/multi-GPU processing, instead of having to use `encode_multi_process`.
Previously, you had to manually start a multi-processing pool, use `encode_multi_process`, and stop the pool:
Now you can just pass a list of devices as `device` to `encode`:
The multi-processing can be configured using these parameters:
- `device`: If `device` is a list of devices, multi-processing is started using those devices. These can be e.g. `cpu`, but also different GPUs.
- `pool`: You can still use `start_multi_process_pool` and `stop_multi_process_pool` to create and stop a multi-processing pool, allowing you to reuse the pool across multiple `encode` calls via the `pool` argument.
- `chunk_size`: When you use multi-processing with n devices, the inputs will be subdivided into chunks, and those chunks will be spread across the n processes. The size of the chunks can be defined here, although it's optional. It can have a minor impact on processing speed and memory usage, but is much less important than the `batch_size` argument.
Documentation: Migration Guide
Documentation: SentenceTransformer.encode
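The `chunk_size` behavior can be sketched as splitting the inputs into chunks, spreading those chunks over the devices, and concatenating the results in input order. This toy version assumes round-robin assignment and uses threads with a fake `encode_chunk` (returning text lengths) so it runs anywhere; the library uses separate processes, and its actual scheduling may differ. The device strings are only labels here.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_chunks(texts, devices, chunk_size):
    """Split inputs into chunks of chunk_size and assign them round-robin to devices."""
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    return [(devices[i % len(devices)], chunk) for i, chunk in enumerate(chunks)]


def encode_chunk(device, chunk):
    """Fake encoder: a real worker would run the model on `device`."""
    return [len(text) for text in chunk]


def multi_device_encode(texts, devices, chunk_size):
    jobs = assign_chunks(texts, devices, chunk_size)
    # One worker per device (threads are not pinned to devices in this toy).
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        # map() yields results in submission order, so input order is preserved.
        results = list(pool.map(lambda job: encode_chunk(*job), jobs))
    return [embedding for chunk_result in results for embedding in chunk_result]


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
print(assign_chunks(texts, ["cuda:0", "cuda:1"], chunk_size=2))
print(multi_device_encode(texts, ["cuda:0", "cuda:1"], chunk_size=2))  # [1, 2, 3, 4, 5]
```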
Router module
The Sentence Transformers v5.0 release has refactored the `Asym` module into the `Router` module. The previous implementation wasn’t straightforward to use with the other components of the library. We’ve improved heavily on this to make the integration seamless. This module allows you to create asymmetric models that apply different modules depending on the specified route (often “query” or “document”).
Notably, you can use the `task` argument in `model.encode` to specify which route to use, and the `model.encode_query` and `model.encode_document` convenience methods automatically specify `task="query"` and `task="document"`, respectively.
See opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill for an example of a model using a `Router` to specify different modules for queries vs documents. Its router_config.json specifies that the query route uses an efficient `SparseStaticEmbedding` module, while the document route uses the more expensive standard SPLADE modules: `MLMTransformer` with `SpladePooling`.
Usage is very straightforward with the new `encode_query` and `encode_document` methods:
Note that if you wish to train a model with a `Router`, then you must specify the `router_mapping` training argument, which maps dataset column names to `Router` routes. Then the Trainer knows which route to use for each dataset column.
Note also that any models using `Asym` still work as before.
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.