Official repository of the paper:
StressTest: Can YOUR Speech LM Handle the Stress?
🌐 Project | 📃 Paper | 🤗 StressTest Dataset | 🤗 StresSLM Model
This repository provides code for evaluating Sentence Stress Detection (SSD) and Sentence Stress Reasoning (SSR) on the StressTest benchmark.
It includes:
- Evaluation of our proposed model StresSLM.
- Examples to run evaluation with two additional models.
It also includes the Stress-17K training data loading and augmentation script used to train StresSLM.
Clone the repository and install the dependencies:
```bash
git clone https://github.com/slp-rl/StressTest.git
cd StressTest
pip install -r requirements.txt
```

We evaluate models using our judgment-based protocol. You'll need an OpenAI API key for the judge (e.g., GPT-4) evaluation. Set the key as an environment variable:
```bash
export OPENAI_API_KEY=your_openai_api_key
```

Alternatively, you can set the key in the stresstest/evaluation/configs.py file:
```python
class Settings(BaseSettings):
    OPENAI_API_KEY: str = "your_openai_api_key"
```
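For intuition, here is a rough sketch of what a judgment call can look like with the OpenAI Python SDK; the prompt, model name, and YES/NO format are illustrative assumptions, not the repository's actual judge implementation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge(question: str, reference: str, prediction: str) -> bool:
    """Ask the judge model whether a prediction matches the reference."""
    # NOTE: prompt and model name are illustrative, not the repo's actual judge.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n"
                f"Reference answer: {reference}\n"
                f"Model answer: {prediction}\n"
                "Does the model answer agree with the reference? Answer YES or NO."
            ),
        }],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```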
Then run the evaluation script:

```bash
python -m stresstest.evaluation.main \
    --task ssr \
    --model_to_evaluate stresslm
```

You can change the --task flag to ssd for the Sentence Stress Detection task.
--model_to_evaluate can be one of the following: ["stresslm", "qwen2audio", "gpt-4o-audio", "mock"].
The script will create a results/ directory at the project root to store evaluation outputs.
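For example, running the Sentence Stress Detection task with the mock client gives a convenient end-to-end check of your setup:

```bash
python -m stresstest.evaluation.main \
    --task ssd \
    --model_to_evaluate mock
```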
The expected project structure is:
```
StressTest
├── infra
├── stresstest
│   └── evaluation
└── results
```
To evaluate your own model, implement it using the following interface and place it under the stresstest/evaluation/src/inference directory:
```python
from abc import ABC, abstractmethod


class InferenceClientBase(ABC):
    @abstractmethod
    def prepare(self, *args, **kwargs) -> dict:
        """
        Prepare method to be implemented by subclasses.
        This method should return a dictionary with the necessary inputs
        for the predict method. The returned dictionary is handled by the
        evaluation script.
        """
        pass

    @abstractmethod
    def predict(self, *args, **kwargs) -> str:
        """Predict method to be implemented by subclasses."""
        pass
```

Then, register your model by updating the configs.py and clients.py files in the stresstest/evaluation folder. Make sure your new model is included as a valid option for the --model_to_evaluate argument.
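For reference, a minimal client sketch against the interface above; the sample field names ("audio", "question") are assumptions about the benchmark schema, and the toy prediction logic stands in for a real speech LM:

```python
import random

# Assumes InferenceClientBase from the snippet above (or import it from
# the repository's inference module).


class MyClient(InferenceClientBase):
    """Sketch of a custom client; replace the toy logic with a real model."""

    def prepare(self, sample: dict, **kwargs) -> dict:
        # Pack exactly what predict() consumes; the evaluation script
        # forwards this dictionary to predict(). The field names here
        # are assumptions, not the repository's actual schema.
        return {"audio": sample["audio"], "question": sample["question"]}

    def predict(self, audio, question, **kwargs) -> str:
        # A real client would run inference on the audio; this toy
        # version just guesses between two answer options.
        return random.choice(["A", "B"])
```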
We release:
- The synthetic training data Stress-17K used to train StresSLM (released).
- The training script for finetuning on SSD and SSR (coming soon).
Stay tuned!
We release Stress-17K, a synthetic dataset generated via our proposed pipeline. It supports multi-task instruction tuning across four task types to improve performance on SSD and SSR tasks.
The raw pre-augmented dataset is available on 🤗 Hugging Face under: slprl/Stress-17K-raw and is automatically downloaded by the augmentation script.
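The download happens automatically, but if you want to inspect the raw data yourself, a minimal snippet using the 🤗 Datasets library (the splits and features are whatever the hub repository defines):

```python
from datasets import load_dataset

# Downloads and caches slprl/Stress-17K-raw from the Hugging Face Hub.
raw = load_dataset("slprl/Stress-17K-raw")
print(raw)  # shows the available splits and their features
```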
You can use the DatasetAugmentation class to load, structure, and augment the data:
```python
from data_augmentation import DatasetAugmentation

data_augmentation = DatasetAugmentation(n_proc=8)
data_augmentation.train_test_split(test_size=0.15)
data_augmentation.prepare_structure_for_augmentation()
data_augmentation.augment_with_training_prompts(tasks='all')
augmented_dataset = data_augmentation.get_augmented_dataset()
```

The augmentation utilities are available under:
```
StressTest
├── infra
├── stresstest
│   └── training
│       └── stress_17k
```
Each sample can be augmented into multiple instruction-following formats defined in a YAML configuration. This YAML file is also located in the stress_17k directory and can be edited to add new tasks or modify existing ones.
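As a rough illustration of that pattern, the sketch below loads task templates from a YAML file and formats a sample with them; the file name, schema, and placeholder names are hypothetical, not the repository's actual configuration:

```python
import yaml  # PyYAML

# Hypothetical layout: each task name maps to a list of prompt templates.
with open("stresstest/training/stress_17k/tasks.yaml") as f:  # filename is assumed
    templates_by_task = yaml.safe_load(f)

sample = {"transcription": "I never said she stole my money."}
for task, templates in templates_by_task.items():
    for template in templates:
        # Templates are assumed to use Python str.format placeholders.
        print(task, "->", template.format(**sample))
```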
If you use this work, please cite our paper:
```bibtex
@misc{yosha2025stresstest,
    title={StressTest: Can YOUR Speech LM Handle the Stress?},
    author={Iddo Yosha and Gallil Maimon and Yossi Adi},
    year={2025},
    eprint={2505.22765},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.22765},
}
```