LUNON (Log-likelihood Uniqueness using Normalized Offset to Non-personalized model)

This repository contains the implementation of LUNON (Log-likelihood Uniqueness using Normalized Offset to Non-personalized model), a metric for evaluating persona-likeness in language model outputs, along with human evaluation datasets and correlation analysis scripts.

Overview

LUNON is a reference-free evaluation metric that measures how well a language model captures specific personas by comparing the token probability distributions between a fine-tuned persona model and a base model.

Repository Structure

runon/
├── scripts/
│   ├── train.py              # Training script for persona models
│   ├── evaluate.py           # Evaluation script supporting multiple metrics including LUNON
│   └── calculate_spearman_correlation.py  # Script to calculate correlation with human evaluations
├── src/
│   ├── model/               # Model loading and configuration
│   ├── training/            # Training utilities
│   ├── evaluation/          # Evaluation metrics
│   │   └── lunon_metric.py  # LUNON implementation
│   └── data/               # Data loading utilities
├── data/
│   └── evaluation/         # Evaluation datasets
│       ├── A/              # Persona A evaluation data
│       ├── B/              # Persona B evaluation data
│       ├── C/              # Persona C evaluation data
│       ├── D/              # Persona D evaluation data
│       └── E/              # Persona E evaluation data
├── README.md
└── LICENSE

Requirements

Python 3.8+
PyTorch
Transformers
Unsloth
Hydra
Pandas
NumPy
SciPy

Installation

pip install torch transformers unsloth hydra-core pandas numpy scipy

Usage

Training Persona Models

python scripts/train.py \
    --config-path ../configs \
    --config-name train \
    model.model_name=tokyotech-llm/Llama-3.1-Swallow-8B-v0.5 \
    data.train_file=path/to/train.jsonl \
    data.eval_file=path/to/eval.jsonl

Evaluating with LUNON

python scripts/evaluate.py \
    --config-path ../configs \
    --config-name evaluate \
    evaluation.metrics=[lunon] \
    evaluation.lunon.ft_model_name=path/to/finetuned/model \
    evaluation.test_file=data/evaluation/A/for_evaluation2_shuffled.jsonl

Calculating Correlation with Human Evaluations

python scripts/calculate_spearman_correlation.py \
    data/evaluation/A/human_evaluation_scores.csv \
    path/to/system_evaluation_results.csv

Data Format

Training Data

Training data should be in JSONL format with each line containing:

{"text": "Persona dialogue or monologue text"}

Note: Training data links will be made publicly available upon publication. Please check back for updates.

Evaluation Data

Evaluation data should be in JSONL format with:

{"prefix": "Context or prompt", "continuation": "Expected persona response"}

Human Evaluation Data

Human evaluation scores are provided in CSV format with columns:

respondent_id: Anonymized evaluator ID
question_num: Question/sample number
score: Likert scale score (1-5)
weighted_score: Score weighted by evaluator reliability

License

This project is licensed under the MIT License - see the LICENSE file for details.

Privacy and Ethics

All human evaluation data has been anonymized to protect participant privacy. Persona names have been replaced with letters (A-E) for confidentiality.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LUNON (Log-likelihood Uniqueness using Normalized Offset to Non-personalized model)

Overview

Repository Structure

Requirements

Installation

Usage

Training Persona Models

Evaluating with LUNON

Calculating Correlation with Human Evaluations

Data Format

Training Data

Evaluation Data

Human Evaluation Data

License

Privacy and Ethics

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data/evaluation		data/evaluation
scripts		scripts
src		src
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

LUNON (Log-likelihood Uniqueness using Normalized Offset to Non-personalized model)

Overview

Repository Structure

Requirements

Installation

Usage

Training Persona Models

Evaluating with LUNON

Calculating Correlation with Human Evaluations

Data Format

Training Data

Evaluation Data

Human Evaluation Data

License

Privacy and Ethics

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages