EvalLLM is a framework for evaluating the robustness and fairness of Large Language Models (LLMs) through metamorphic testing. It simulates common human errors in text inputs and changes the demographic details of those inputs, then evaluates how the performance of the LLM-based application is affected. The framework supports:
- 6 Parameterized Metamorphic Relations (PMRs) for evaluating robustness by simulating various human errors.
- 2 Parameterized Metamorphic Relations (PMRs) for assessing fairness by altering demographic details.
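To make the idea of a parameterized metamorphic relation concrete, here is a minimal sketch of one robustness perturbation and one fairness perturbation. The function names, parameters, and perturbation rules are hypothetical and chosen only for illustration; they are not EvalLLM's actual PMR implementations.

```python
import random
import re
from typing import Dict


def swap_adjacent_chars(text: str, rate: float, seed: int = 0) -> str:
    """Hypothetical robustness perturbation: simulate transposition typos.

    `rate` is the probability that an alphabetic word of four or more
    characters has two neighbouring inner characters swapped.
    """
    rng = random.Random(seed)  # seeded so the follow-up input is reproducible
    perturbed = []
    for word in text.split(" "):
        if len(word) >= 4 and word.isalpha() and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # keep first and last characters
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        perturbed.append(word)
    return " ".join(perturbed)


def replace_demographic_terms(text: str, mapping: Dict[str, str]) -> str:
    """Hypothetical fairness perturbation: substitute demographic terms.

    `mapping` keys must be lowercase; matching is case-insensitive and
    restricted to whole words, so "male" does not match inside "female".
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in mapping) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


note = "A 45-year-old male presents with chest pain."
typo_note = swap_adjacent_chars(note, rate=0.5)
fair_note = replace_demographic_terms(note, {"male": "female"})
```

In both cases the metamorphic expectation is that the application's output for the perturbed input stays close to its output for the original input.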
- Python >= 3.8
- Required Python packages. Install them using:
pip install -r requirements.txt
- To test the plm-icd application, set up the plm-icd environment by cloning the plm-icd repository and following the setup instructions provided there.
- Move the folders from EvalLLM to the plm-icd directory.
- Open the Notebook:
  - Open the mr_execution notebook in Jupyter or another compatible environment.
- Execute Initial Cells:
  - Run the cells that include the following functions to prepare the data and perform initial calculations:
    write_to_csv(), preprocessing(), cosine_similarity_score(), transformer_similarity_score()
- Invoke PMR:
  - Create a new cell in the notebook and invoke the desired PMR function with the required parameter values. Ensure you specify the correct parameters according to the PMR you are testing.
- Run plm-icd:
  - Follow the instructions in the plm-icd repository to run plm-icd with the modified inputs. Ensure that you have moved the necessary folders from the EvalLLM directory to the plm-icd directory as described in the prerequisites.
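The initial cells include a cosine_similarity_score() helper whose implementation is not shown here. As a rough illustration of what a cosine similarity between an original and a modified input could look like, the sketch below uses a simple bag-of-words representation. The function name `bow_cosine_similarity` and the tokenization are assumptions, not the notebook's code.

```python
import math
import re
from collections import Counter


def bow_cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two texts using word-count vectors.

    Assumption: lowercase word tokens; the notebook may use a different
    tokenization or vectorization.
    """
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0  # empty text yields 0.0 by convention


original = "patient presents with chest pain"
modified = "patient presnets with chest pain"
score = bow_cosine_similarity(original, modified)
```

A score near 1 indicates that the perturbation changed little of the input's vocabulary, which can help confirm that a modified input is still comparable to the original before it is passed to plm-icd.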
- Open the Notebook:
  - Open the output_analyzer notebook in Jupyter or another compatible environment.
- Execute Analysis Cells:
  - Run the cells that include the following functions to process and analyze the results:
    write_to_csv(), merge_csv(), format_predicted_labels(), flatten_extend(), derive_intersecting_labels(), cal_intersection_percent()
- Calculate Intersection Percent:
  - Create a new cell and invoke the cal_intersection_percent() function with the input file path and the desired output file path. This function generates the $I_{MR}$ values for each input, along with the mean $I_{MR}$ for the respective MR.
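The exact definition of $I_{MR}$ is not spelled out in this section. The sketch below assumes it is the percentage of labels predicted for the original input that are also predicted for the modified input, averaged over all inputs to give the mean for an MR. The function names and the choice of denominator are assumptions and may differ from cal_intersection_percent().

```python
from typing import Iterable, List, Tuple


def intersection_percent(
    original_labels: Iterable[str], followup_labels: Iterable[str]
) -> float:
    """Percentage of original labels that survive in the follow-up prediction.

    Assumption: normalised by the number of distinct original labels.
    """
    original = set(original_labels)
    if not original:
        return 0.0  # nothing to compare against
    shared = original & set(followup_labels)
    return 100.0 * len(shared) / len(original)


def mean_intersection_percent(pairs: List[Tuple[List[str], List[str]]]) -> float:
    """Mean of the per-input percentages for one metamorphic relation."""
    scores = [intersection_percent(orig, follow) for orig, follow in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative label sets only, not real model output.
pairs = [
    (["I10", "E11.9"], ["I10", "J45"]),  # one of two labels retained
    (["I10"], ["I10"]),                  # all labels retained
]
per_input = [intersection_percent(o, f) for o, f in pairs]
mr_mean = mean_intersection_percent(pairs)
```

Under this assumed definition, a higher value means the predictions for the modified input agree more closely with those for the original input.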