Reference implementation and release artefacts for the paper "Persona-Driven Variance in IR Retrieval".
The repository measures how much a retriever's output depends on who is asking the same information need, across BM25 and four dense encoders (MiniLM-L6, MPNet-base, E5-base-v2, BGE-base-en-v1.5), on a 5,000-passage QReCC subset, under two persona-generation methodologies: a controlled activation-steering condition and an uncontrolled prompt condition.
Headline: under the controlled steered condition, BM25 returns a different top-1 document in 65% of (persona, need) pairs while MPNet does so in 26% — a 2.5× sparse-vs-dense gap with paired-permutation p = 0.008. The gap survives a random word-swap control, three BM25 configurations, a Qwen 2.5 7B simulator substitution, an 80-user PRISM real-user replication (1.30×), and per-axis decomposition.
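The headline statistic can be illustrated with a minimal sketch. It assumes that "a different top-1 document" means different from the top-1 returned for a persona-neutral reference query on the same need, and that the paired test is an exact sign-flip permutation test over paired units. Both are assumptions made for illustration, the function names are hypothetical, and the values are toy data rather than results from the paper.

```python
from itertools import product


def top1_change_rate(top1_by_pair, reference_top1):
    """Fraction of (persona, need) pairs whose top-1 doc differs from the
    reference top-1 for that need (assumed definition, see lead-in)."""
    changed = [doc != reference_top1[need]
               for (_persona, need), doc in top1_by_pair.items()]
    return sum(changed) / len(changed)


def paired_permutation_p(a, b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Enumerates all 2**n sign assignments, so it is only practical for small n.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / total


# Toy data: two personas, two needs (not taken from the paper).
reference = {"n1": "d1", "n2": "d7"}
bm25_top1 = {("p1", "n1"): "d1", ("p2", "n1"): "d4",
             ("p1", "n2"): "d9", ("p2", "n2"): "d7"}
rate = top1_change_rate(bm25_top1, reference)  # 2 of 4 pairs changed

# Toy per-unit change rates for two retrievers, paired by unit.
p_value = paired_permutation_p([0.7, 0.6, 0.8], [0.2, 0.3, 0.1])
```

The exact enumeration avoids Monte Carlo noise, at the cost of exponential time in the number of paired units.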
persona-ir/
├── scripts/ # query generators, retrievers, evaluation, viewer build
├── results/ # JSON outputs of every experiment (phase 1–2 + E1–E6)
├── viewer/ # static HTML viewer of per-persona top-3 docs
├── figures/ # hero figure source PDFs/PNGs
├── requirements.txt
├── LICENSE # MIT
└── README.md
- 160 triples: 80 steered + 80 prompt, 10 personas × 8 information needs.
  - Prompt triples (with full query text): results/phase2a_prompt_baseline.json.
  - Steered triples: regenerated by scripts/phase2a_prompt_baseline.py with --mode steered, or extracted from the paired-session JSONs (see Reproducing the steered condition).
- Both query generators: prompt conditioning via scripts/phase2a_prompt_baseline.py, residual-stream activation steering via the recipe in the steered runner.
- Evaluation code for all six experiments (E1–E6) and the two NDCG / coherence passes.
- Static viewer: viewer/index.html displays each persona's query and the top-3 retrieved documents per retriever; no model runs in the browser.
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Reproduce phase-2a prompt baseline (80 triples, 5 retrievers, 5k QReCC)
python3 scripts/phase2a_prompt_baseline.py \
--qrecc data/raw/qrecc/qrecc_train.json \
--n-passages 5000
# Run all robustness experiments
bash scripts/run_all_e1_e5.sh
Each runner writes its output to results/<experiment>.json and prints a one-line summary.
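To check what the runners produced, a small sketch like the following lists every JSON file under results/ with its top-level structure. The schema of the result files is not documented here, so the sketch deliberately assumes nothing beyond valid JSON; `summarize_results` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Map each *.json file name to a short description of its top level."""
    summaries = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        if isinstance(payload, dict):
            # JSON object: report its keys in sorted order.
            summaries[path.name] = sorted(payload)
        elif isinstance(payload, list):
            summaries[path.name] = f"list of {len(payload)} items"
        else:
            summaries[path.name] = type(payload).__name__
    return summaries
```

Calling `summarize_results()` from the repository root and printing the returned dictionary gives a quick inventory of which experiments have been run.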
Steered queries are generated by scripts/phase2a_prompt_baseline.py with --mode steered, using Gemma 2 2B-it and a layer-12 residual-stream mean-difference vector. The vector is fit on a held-out corpus of axis-extreme utterances covering five axes (jargon, impatience, query specificity, evidence-seeking, clarification tolerance) and is applied with magnitude α = 1.5. The 80 steered queries used in the paper are also stored as persona_pos_text fields in the paired-session JSONs released alongside this repository.
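The mean-difference recipe can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the repository's runner: activations are toy 3-dimensional lists standing in for layer-12 residual states, the vector is used unnormalized, and it is added at every token position. The function names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_difference_vector(pos_acts, neg_acts):
    """Steering direction: mean of axis-positive minus mean of axis-negative."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(residual, vector, alpha=1.5):
    """Add alpha * vector to each token's residual state (assumed: all positions)."""
    return [[h + alpha * v for h, v in zip(token, vector)] for token in residual]


# Toy activations for one axis (not real model states).
pos = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]  # axis-extreme "positive" utterances
neg = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]  # axis-extreme "negative" utterances
vector = mean_difference_vector(pos, neg)

# Two token positions of a toy residual stream, steered with alpha = 1.5.
steered = steer([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], vector, alpha=1.5)
```

In a real run the same addition would happen inside the model's forward pass at the chosen layer; how the repository hooks into Gemma 2 2B-it is not shown here.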
viewer/index.html is a single static page that loads viewer/data.js and lets you click through each (persona, need) pair to see the query and the top-3 retrieved documents per retriever. No model runs in the browser; it's a tool for inspection.
@inproceedings{persona-ir,
author = {Anonymous},
title = {Persona-Driven Variance in {IR} Retrieval},
booktitle = {Anonymous Submission},
year = {2026}
}
MIT. See LICENSE.