persona-ir — Persona-Driven Variance in IR Retrieval

Reference implementation and release artefacts for the paper "Persona-Driven Variance in IR Retrieval".

The repository measures how much a retriever's output depends on who is asking the same information need, across BM25 and four dense encoders (MiniLM-L6, MPNet-base, E5-base-v2, BGE-base-en-v1.5), on a 5,000-passage QReCC subset, under two persona-generation methodologies: a controlled activation-steering condition and an uncontrolled prompt condition.

Headline: under the controlled steered condition, BM25 returns a different top-1 document in 65% of (persona, need) pairs while MPNet does so in 26% — a 2.5× sparse-vs-dense gap with paired-permutation p = 0.008. The gap survives a random word-swap control, three BM25 configurations, a Qwen 2.5 7B simulator substitution, an 80-user PRISM real-user replication (1.30×), and per-axis decomposition.
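The two quantities in the headline, a top-1 change rate and a paired-permutation p-value, can be illustrated with a minimal sketch. It assumes that "a different top-1 document" means different from the top-1 returned for some reference query for the same need, and that the test is a two-sided sign-flip permutation test over per-pair change indicators; the paper's exact definitions may differ. The function names and toy document IDs are hypothetical and not taken from the repository.

```python
import random


def top1_change_rate(reference_top1, persona_top1):
    """Return (rate, flags): flags[i] is 1 when the top-1 doc differs from the reference."""
    flags = [int(ref != got) for ref, got in zip(reference_top1, persona_top1)]
    return sum(flags) / len(flags), flags


def paired_permutation_p(a, b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on the mean paired difference a[i] - b[i]."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) / n >= observed - 1e-12:
            hits += 1
    # Add-one smoothing keeps the Monte Carlo estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Toy data only: four (persona, need) pairs, one reference top-1 per pair.
reference = ["d1", "d2", "d3", "d4"]
sparse_top1 = ["d9", "d2", "d7", "d8"]
dense_top1 = ["d1", "d2", "d3", "d8"]

sparse_rate, sparse_flags = top1_change_rate(reference, sparse_top1)
dense_rate, dense_flags = top1_change_rate(reference, dense_top1)
print("sparse rate:", sparse_rate, "dense rate:", dense_rate)
print("p-value:", paired_permutation_p(sparse_flags, dense_flags))
```

Pairing the indicators by (persona, need) keeps each comparison within the same query, so the test asks whether the sparse retriever changes its top-1 more often than the dense one on the same inputs.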

Layout

persona-ir/
├── scripts/             # query generators, retrievers, evaluation, viewer build
├── results/             # JSON outputs of every experiment (phase 1–2 + E1–E6)
├── viewer/              # static HTML viewer of per-persona top-3 docs
├── figures/             # hero figure source PDFs/PNGs
├── requirements.txt
├── LICENSE              # MIT
└── README.md

What is released

- 160 triples: 80 steered + 80 prompt, 10 personas × 8 information needs.
  - Prompt triples (with full query text): results/phase2a_prompt_baseline.json.
  - Steered triples: regenerated by scripts/phase2a_prompt_baseline.py with --mode steered, or extracted from the paired-session JSONs (see Reproducing the steered condition).
- Both query generators: prompt conditioning via scripts/phase2a_prompt_baseline.py, residual-stream activation steering via the recipe in the steered runner.
- Evaluation code for all six experiments (E1–E6) and the two NDCG / coherence passes.
- Static viewer: viewer/index.html displays each persona's query and the top-3 retrieved documents per retriever; no model runs in the browser.

Quick start

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Reproduce phase-2a prompt baseline (80 triples, 5 retrievers, 5k QReCC)
python3 scripts/phase2a_prompt_baseline.py \
    --qrecc data/raw/qrecc/qrecc_train.json \
    --n-passages 5000

# Run all robustness experiments
bash scripts/run_all_e1_e5.sh

Each runner writes its output to results/<experiment>.json and prints a one-line summary.
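Once the runners have finished, the JSON files can be inspected with a short helper such as the sketch below. The schema of the result files is not documented in this README, so the helper only reports each file's top-level shape; summarize_results is a hypothetical name and is not part of the repository.

```python
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Map each results/<experiment>.json file to a description of its top-level shape."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        if isinstance(payload, dict):
            # For JSON objects, list the top-level keys in sorted order.
            summary[path.stem] = sorted(payload)
        else:
            # For arrays or scalars, record only the Python type name.
            summary[path.stem] = [f"<{type(payload).__name__}>"]
    return summary


if __name__ == "__main__":
    for name, keys in summarize_results().items():
        print(name, keys)
```

Run from the repository root, this prints one line per experiment file, which is a quick way to confirm that every runner produced output.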

Reproducing the steered condition

Steered queries are generated by scripts/phase2a_prompt_baseline.py with --mode steered, using Gemma 2 2B-it and a layer-12 residual-stream mean-difference vector. The vector is fit on a held-out corpus of axis-extreme utterances covering five axes (jargon, impatience, query specificity, evidence-seeking, clarification tolerance) and is applied with magnitude α = 1.5. The 80 steered queries used in the paper are also stored in the persona_pos_text fields of the paired-session JSONs released alongside this repository.
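The arithmetic behind a mean-difference steering vector can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's implementation: it treats activations as plain lists of floats, assumes one direction per axis computed as the mean over positive-pole utterances minus the mean over negative-pole utterances, and adds α times that direction to a single residual-stream vector. Details such as normalisation and which token positions are steered are not specified here, and all function names are hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_difference_vector(pos_acts, neg_acts):
    """Steering direction: mean(positive-pole acts) minus mean(negative-pole acts)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(hidden, direction, alpha=1.5):
    """Add alpha * direction to one residual-stream vector (alpha = 1.5 as in the text)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional activations; real layer-12 activations are far wider.
pos = [[1.0, 0.0], [3.0, 2.0]]  # e.g. high-jargon utterances (assumed pole)
neg = [[0.0, 0.0], [2.0, 0.0]]  # e.g. low-jargon utterances (assumed pole)

direction = mean_difference_vector(pos, neg)
print(direction)                        # [1.0, 1.0]
print(steer([0.5, -0.5], direction))    # [2.0, 1.0]
```

In a real model the addition would typically be applied inside a forward hook on the chosen layer, so that every subsequent layer sees the shifted activations.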

Static viewer

viewer/index.html is a single static page that loads viewer/data.js and lets you click through each (persona, need) pair to see the query and the top-3 retrieved documents per retriever. No model runs in the browser; it's a tool for inspection.

Citation

@inproceedings{persona-ir,
  author    = {Anonymous},
  title     = {Persona-Driven Variance in {IR} Retrieval},
  booktitle = {Anonymous Submission},
  year      = {2026}
}

License

MIT — see LICENSE.
