Evaluation

## Original Task
Quoting the original course task:

> **Training a strong Hebrew Sentence Encoder from a pretrained Decoder.** While recent years
> have brought many additions to the open-source set of pretrained LMs in high-resource languages such
> as English, most of these tools are not directly useful for Hebrew inputs. Recently, a new project
> aiming to bridge this gap has introduced new tools and, most importantly, benchmarks for Hebrew LMs.
> Concurrently, some new strong open-source models have been trained on Hebrew text, most recently
> DictaLM 2.0. In this project, you will modify the DictaLM model to be a strong encoder model using
> the LLM2Vec method. To evaluate the result, you will train a linear classifier for a Hebrew sentiment
> analysis task on top of embeddings from your trained model, as well as from some baselines. Such baselines
> can be strong English and multilingual pretrained models, and existing pretrained Hebrew encoders (for
> example, AlephBERT and AlephBERTGimmel).

See this GitHub issue: https://github.com/UKPLab/sentence-transformers/issues/2547#issuecomment-2020153378
and read the SetFit guide to the classifier training phase: https://huggingface.co/docs/setfit/conceptual_guides/setfit#classifier-training-phase
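
The classifier needs one vector per sentence, while a decoder such as DictaLM produces one hidden state per token. Mean pooling over the non-padding tokens is one common way to collapse these into a sentence embedding for LLM2Vec-style encoders. The NumPy sketch below shows only that pooling step, assuming the last-layer hidden states and the attention mask have already been obtained from the model (the model call is not shown); `mean_pool` is a hypothetical helper, not part of any library.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    hidden_states: (batch, seq_len, dim) last-layer outputs (assumed already computed).
    attention_mask: (batch, seq_len) with 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                     # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                  # (batch, 1), avoids division by zero
    return summed / counts


# Toy input: 2 sentences, 3 positions, 2-dim vectors; the second sentence ends in one padding token.
h = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
              [[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
m = np.array([[1, 1, 1], [1, 1, 0]])
print(mean_pool(h, m))  # rows [3, 4] and [2, 2]; the padded [100, 100] vector is ignored
```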

## Data
Hebrew sentiment analysis dataset - https://huggingface.co/datasets/HebArabNlpProject/HebrewSentiment

**We use this dataset because the Hebrew sentiment benchmark used to evaluate AlephBERT and similar models was shown to be leaked.**
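
One concrete form of leakage is overlap between training and test sentences. A minimal Python sketch for measuring it, assuming both splits are available as lists of strings; `normalize` and `leaked_fraction` are hypothetical helpers, and exact matching after lowercasing and whitespace collapsing only catches verbatim duplicates, not near-duplicates or paraphrases.

```python
def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences do not hide duplicates.
    return " ".join(text.lower().split())


def leaked_fraction(train_texts: list[str], test_texts: list[str]) -> float:
    """Fraction of test sentences that also appear (after normalization) in the training split."""
    if not test_texts:
        return 0.0
    train_set = {normalize(t) for t in train_texts}
    hits = sum(normalize(t) in train_set for t in test_texts)
    return hits / len(test_texts)


train = ["סרט מעולה", "Terrible   service", "ok"]
test = ["terrible service", "סרט מעולה", "brand new sentence", "another one"]
print(leaked_fraction(train, test))  # 0.5: two of the four test sentences also occur in train
```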

## Classifier
We train a **Logistic Regression** classifier on top of the model embeddings, the approach recommended in:
1. Mistral's classifier tutorial (relevant because our model is Mistral-based): https://github.com/mistralai/mistral-inference/blob/main/tutorials/classifier.ipynb
2. Jay Alammar's sentence-classification notebook: https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/nlp/03_Sentence_Classification_with_BERT.ipynb
3. The Hugging Face team's book, *Natural Language Processing with Transformers*, Revised Edition, chapter 2
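
A minimal sketch of this setup with scikit-learn, assuming the sentence embeddings have already been computed and stacked into a NumPy array with one row per sentence. The clustered random `X` and three-class `y` below are synthetic stand-ins (the class count is an assumption, and these are not results on the Hebrew dataset); the split size, `max_iter`, and metrics are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder embeddings: in practice each row of X would be a frozen sentence
# embedding from the encoder and y the sentiment label (3 classes assumed here).
rng = np.random.default_rng(0)
n, dim, n_classes = 600, 64, 3
y = rng.integers(0, n_classes, size=n)
centers = rng.normal(size=(n_classes, dim)) * 2.0
X = centers[y] + rng.normal(size=(n, dim))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Only this linear head is trained; the encoder that produced X stays frozen.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("macro-F1:", f1_score(y_test, pred, average="macro"))
```

With real data, the same code applies unchanged to the embeddings of every model being compared (the trained encoder and the baselines), so differences in the scores reflect the embeddings rather than the classifier.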