EvalLLM: Framework for Evaluating Robustness and Fairness in LLMs Using Metamorphic Testing

Overview

EvalLLM is a framework designed to evaluate the robustness and fairness of Large Language Models (LLMs) through metamorphic testing. This method involves manipulating common human errors in text inputs and making changed to the demographic details of the input to evaluate the LLM-based applications performance. The framework supports:

6 Parameterized Metamorphic Relations (PMRs) for evaluating robustness by simulating various human errors.
2 Parameterized Metamorphic Relations (PMRs) for assessing fairness by altering demographic details.

Prerequisites

Python >= 3.8
Required Python packages. Install them using:
```
pip install -r requirements.txt
```
To test the plm-icd application, set up the plm-icd environment by cloning the plm-icd repository and following the setup instructions provided there.
Move the folders from EvalLLM to the plm-icd directory.

Steps to Follow

Running PMRs for Modifying Inputs

Open the Notebook:
- Open the mr_execution notebook in Jupyter or another compatible environment.
Execute Initial Cells:
- Run the cells that include the following functions to prepare the data and perform initial calculations:
```
write_to_csv()
preprocessing()
cosine_similarity_score()
transformer_similarity_score()
```
Invoke PMR:
- Create a new cell in the notebook and invoke the desired PMR function with the required parameter values. Ensure you specify the correct parameters according to the PMR you are testing.
Run plm-icd:
- Follow the instructions in the plm-icd repository to run plm-icd with the modified inputs. Ensure that you have moved the necessary folders from the EvalLLM directory to the plm-icd directory as described in the prerequisites.

Analyzing Results from `plm-icd`

Open the Notebook:
- Open the output_analyzer notebook in Jupyter or another compatible environment.

Execute Analysis Cells:

Run the cells that include the following functions to process and analyze the results:

write_to_csv()
merge_csv()
format_predicted_labels()
flatten_extend()
derive_intersecting_labels()
cal_intersection_percent()

Calculate Intersection Percent:
- Create a new cell and invoke the cal_intersection_percent() function with the input file path and the desired output file path. This function will generate the $I_{MR}$ values for each input, along with the mean $I_{MR}$ for the respective MR.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
inference		inference
modified_input		modified_input
notebook		notebook
prediction_output		prediction_output
report		report
test_status		test_status
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EvalLLM: Framework for Evaluating Robustness and Fairness in LLMs Using Metamorphic Testing

Overview

Prerequisites

Steps to Follow

Running PMRs for Modifying Inputs

Analyzing Results from `plm-icd`

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

EvalLLM: Framework for Evaluating Robustness and Fairness in LLMs Using Metamorphic Testing

Overview

Prerequisites

Steps to Follow

Running PMRs for Modifying Inputs

Analyzing Results from plm-icd

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Analyzing Results from `plm-icd`

Packages