Releases · IBM/unitxt
Unitxt 1.26.9
What's Changed
- lazy import of scipy by @assaftibm in #1959
- Fix duplicate-column sorting issue in Text2SQL evaluation utils by @oktie in #1954
- Update version to 1.26.9 by @elronbandel in #1961
Full Changelog: 1.26.8...1.26.9
Unitxt 1.26.8
What's Changed
- add ollama classification engine by @lilacheden in #1955
- make OllamaInferenceEngine handle return_meta_data by @lilacheden in #1956
- lazy import of evaluate by @assaftibm in #1957
- Update version to 1.26.8 by @elronbandel in #1958
Full Changelog: 1.26.7...1.26.8
Unitxt 1.26.7
What's Changed
- Fix examples by @elronbandel in #1907
- Fix inference tests by @elronbandel in #1912
- Minor text2sql metric fixes by @oktie in #1913
- fix mtrag by @dafnapension in #1918
- fix xlam's schema issues by @dafnapension in #1917
- Add ReflectionToolCallingMetricSyntactic for reference-less evaluation of tool call predictions by @korenLazar in #1923
- Divide biggen bench into multilingual and non-multilingual by @martinscooper in #1922
- remove redundant split from airbench2024 by @dafnapension in #1928
- Revert BigGen Benchmark partition by @martinscooper in #1924
- fixed split names in wiki_bio by @dafnapension in #1925
- Fix erroneous prompts in evaluation tasks (and clean some json-schema-wise) by @dafnapension in #1920
- fix the only 4 erroneous global_mmlu cards that do not pass _source_to_dataset by @dafnapension in #1916
- Normalize llm judge bench target variable by @martinscooper in #1933
- Improved multi turn evaluation to be self contained and use LLM as judge by @yoavkatz in #1929
- Add more RAG judges by @arielge in #1934
- Add ReflectionToolCallingMetric and update related metrics by @korenLazar in #1931
- potential fix for preparation file: prepare/cards/mtrag.py by @dafnapension in #1938
- Lazy load vectara hhem model because it is gated in HF by @yoavkatz in #1946
- Fixed missing sampling_seed in DiverseLabelsSampler by @yoavkatz in #1941
- Correct reflection-based tool calling metrics so valid results score 1 by @yoavkatz in #1940
- Rag metric update again by @dafnapension in #1948
- fix gpt-oss classification inference engines by @lilacheden in #1952
- Update version to 1.26.7 by @elronbandel in #1953
New Contributors
- @korenLazar made their first contribution in #1923
Full Changelog: 1.26.6...1.26.7
Unitxt 1.26.6
What's Changed
- Update pearsonr tests by @elronbandel in #1890
- return source_to_recipe to performance evaluation, once 403 is fixed by bnayahu by @dafnapension in #1891
- remove a card whose preprocess_steps do not match the contents of the loaded dataset by @dafnapension in #1893
- fix an ineffective setting of max size of loader_cache by @dafnapension in #1892
- Fix compatibility with datasets 4.0 by @elronbandel in #1861
- Improve speed in mmlu global by @elronbandel in #1895
- Remove the need for datasets<4.0.0 by @elronbandel in #1897
- Refresh README by @elronbandel in #1898
- Update Readme by @elronbandel in #1899
- Update README by @elronbandel in #1900
- Update README by @elronbandel in #1901
- Fix docs and example of how to use benchmark by @elronbandel in #1903
- Refine condition for avoiding the Benchmark wrapper by @bnayahu in #1904
- Complete transition to datasets 4.0.0 in preparation tests by @dafnapension in #1902
- Make sacrebleu faster and more efficient by @elronbandel in #1906
- Implements LogProbEngine on CrossInference and adds more granite guardian models by @martinscooper in #1905
- Remove IBM GenAI support and move legacy GenAI metrics to use CrossProviderInferenceEngine by @yoavkatz in #1508
- GPT on rits and minor llm judge criteria changes by @martinscooper in #1909
- The special installation of networkx can be removed as well by @dafnapension in #1908
- Update version to 1.26.6 by @elronbandel in #1911
Full Changelog: 1.26.5...1.26.6
Unitxt 1.26.5
What's Changed
- For load_dataset, use_cache default value is taken from settings by @eladven in #1880
- Support watsonx.ai on-prem credentials by @pratapkishorevarma in #1883
- extend condition to also filter by field exists or not by @dafnapension in #1879
- fix performance test by @dafnapension in #1884
- Add support for inline-defined templates in the UI by @Chemafiz in #1886
- Mitigate HTTP 403 errors in pandas by @bnayahu in #1888
- Biggen benchmark and pearson correlation metric by @martinscooper in #1887
- Update version to 1.26.5 by @elronbandel in #1889
New Contributors
- @pratapkishorevarma made their first contribution in #1883
- @Chemafiz made their first contribution in #1886
Full Changelog: 1.26.4...1.26.5
Unitxt 1.26.4
What's Changed
- Add more Judgebench benchmarks by @martinscooper in #1869
- Make sqlite3 not an optional dependency by @elronbandel in #1871
- Removed legacy topicality, idk, and groundness metrics that worked only on BAM by @yoavkatz in #1875
- Bench and models by @martinscooper in #1872
- Handle a case in ToolCallPostProcessor where prediction is an empty list of tools by @yoavkatz in #1874
- Update version to 1.26.4 by @elronbandel in #1876
Full Changelog: 1.26.3...1.26.4
Unitxt 1.26.3
What's Changed
- LLM Judge: Improve context/prediction fields parsing by @martinscooper in #1856
- Fixed bug in tool inference by @yoavkatz in #1868
- Added a new MetricBasedNer that allows calculating entity similarity using any Unitxt metric by @yoavkatz in #1860
- Update version to 1.26.3 by @elronbandel in #1870
Full Changelog: 1.26.2...1.26.3
Unitxt 1.26.2
What's Changed
- Add tot dataset by @elronbandel in #1865
- Add tokenizer_name to base huggingface inference engines by @elronbandel in #1862
- Add hf to cross provider inference engine by @yoavkatz in #1866
- Update version to 1.26.2 by @elronbandel in #1867
Full Changelog: 1.26.1...1.26.2
Unitxt 1.26.1
Lock datasets dependency to <4.0.0
The latest datasets v4.0.0 release removes support for loading datasets with trust_remote_code=True. This change breaks compatibility with many datasets currently in the Unitxt catalog, as several datasets require this feature to load properly.
This patch restricts the datasets version to below 4.0.0 until we can find or develop replacements for affected datasets.
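As a hedged illustration (this check is not part of Unitxt itself), one way to confirm that an environment respects the pin is to verify at runtime that the installed datasets package predates the trust_remote_code removal:

```python
# Illustrative check only (not Unitxt code): verify that the installed `datasets`
# package is below 4.0.0, since 4.0.0 dropped trust_remote_code=True, which several
# Unitxt catalog datasets still rely on.
import datasets
from packaging.version import Version

if Version(datasets.__version__) >= Version("4.0.0"):
    raise RuntimeError("datasets>=4.0.0 is unsupported here; install datasets<4.0.0")
```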
Unitxt 1.26.0 - Multi Threading
Main changes:
- Made Unitxt Thread-Safe so it can run in multi-threaded environments.
- Added an option to set a sampling seed for demos (in-context examples) via demos_sampling_seed, which allows running the same dataset with different demo examples (see the sketch after this list).
- Improved printouts of instance scores with to_markdown() and summary. For example:
  results = evaluate(predictions=predictions, data=dataset)
  print(results.instance_scores.summary)
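A minimal sketch of how the new demos_sampling_seed option might be passed through the recipe API; the card, template, and demo-pool values below are illustrative assumptions, not taken from these release notes:

```python
# Sketch only: the card/template names and sizes here are assumptions for illustration.
from unitxt import load_dataset

dataset = load_dataset(
    card="cards.sst2",  # assumed catalog card
    template="templates.classification.multi_class.default",  # assumed template
    num_demos=3,             # number of in-context demos per instance
    demos_pool_size=50,      # pool the demos are sampled from
    demos_sampling_seed=42,  # new in 1.26.0: fixes which demos are sampled
    split="test",
)
```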
All changes:
- Add to_markdown() to InstanceScores to pretty print output by @yoavkatz in #1846
- Improved InstanceScores summary to be readable and in decent width by @yoavkatz in #1847
- Improve multi turn tool calling example by @elronbandel in #1848
- Add metrics documentation including range, directionality and references by @elronbandel in #1850
- Fix sacrebleu documentation by @elronbandel in #1851
- Add F1 score documentation to F1Fast metric class by @elronbandel in #1852
- Add more llmjudge benchmarks by @martinscooper in #1804
- Fix llama scout name and url on rits by @martinscooper in #1857
- Add demos_sampling_seed to recipe api by @elronbandel in #1858
- Add comprehensive multi threading support and tests by @elronbandel in #1853
- Update BlueBench to match the original implementation by @bnayahu in #1855
Full Changelog: 1.25.0...1.26.0