🧠 Iterative Question Generation for Fact-Checking

This repository contains the implementation of the experiments from the paper:

"Less is More: Improved Fact-Checking on AVeriTeC with Iterative Question Generation"

The project investigates the isolated impact of iterative question generation in Retrieval-Augmented Generation (RAG) pipelines for fact-checking.

📌 Overview

This system follows a standard open-domain fact-checking pipeline:

Receive a claim
Generate verification questions
Use questions as search queries
Retrieve evidence from the web
Rank and aggregate evidence
Predict a final label:
- SUPPORTED
- REFUTED
- NOT ENOUGH EVIDENCE (NEI)
- CONFLICTING EVIDENCE/CHERRYPICKING

The key contribution is a controlled comparison between:

Single-step question generation
Iterative question generation

⚙️ Methods

🔹 Single-step

Generates up to 5 questions in a single pass
No access to intermediate evidence
All queries are independent

prompt = f"""
You are a fact-checking assistant.

Task: Generate up to {num_questions} questions to verify the claim.

Claim:
{context.claim}

Claim date:
{context.claim_date}

Speaker:
{context.speaker}

Instructions:
- Generate as many useful questions as needed (from 1 up to {num_questions})
- Do not force unnecessary questions
- Avoid redundancy
- Questions must be factual and verifiable
- If claim involves time, check chronology
- If claim involves attribution, verify speaker/source
- If claim involves numbers, verify quantities
- Do not provide explanations

Output:
Return a numbered list of questions only.
"""

🔹 Iterative

Generates one question at a time
Each question is conditioned on:
- previously generated questions
- previously retrieved evidence
Stops early if enough evidence is gathered

prompt = f"""
Task: Generate ONE new question to help verify the claim.

Claim:
{context.claim}

Claim date:
{context.claim_date}

Speaker:
{context.speaker}

---

Previous questions:
{previous_questions}

---

Known evidence (partial):
{evidence_snippets}

---

Current stance signals:
{stance_summary}

---

Instructions:
- Generate ONLY ONE question
- Do NOT repeat previous questions
- Focus on missing or uncertain aspects
- Do not force unnecessary questions
- Avoid redundancy
- Questions must be factual and verifiable
- If claim involves time, check chronology
- If claim involves attribution, verify speaker/source
- If claim involves numbers, verify quantities
- Do not provide explanations

Output:
Provide only the question.
"""

👉 The only difference between methods is how questions are generated

🚀 Setup

1. Clone the repository

2. Create a virtual environment

python -m venv venv
source venv/bin/activate   # Linux / Mac
venv\Scripts\activate      # Windows

3. Install dependencies

pip install -r requirements.txt

⚙️ Environment Configuration (.env)

Create a .env file in the root directory:

# ===== LLM =====
OLLAMA_MODEL=llama3.1:8b # Local model served via Ollama
LLM_TEMPERATURE=0.0      # Set to `0.0` for deterministic outputs
LANGUAGE=en              # Language used in prompts (default: `en`)

# ===== QUESTION GENERATION =====
MAX_QUESTIONS=5          # Maximum number of generated questions (N = 5)

# ===== SEARCH ENGINE =====
SEARCH_ENGINE=duckduckgo
SEARCH_MAX_RESULTS=10   # Number of results per query
SEARCH_MAX_URLS=30      # Maximum URLs retrieved per query
SEARCH_MAX_WORKERS=10   # Parallel workers for retrieval

# ===== RETRIEVAL =====
BM25_TOP_K=10           # Number of documents selected via BM25
USE_QUESTION_FOR_RETRIEVAL=true

# ===== RERANKING =====
RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2   # Cross-encoder model for reranking
RERANKER_TOP_K=5        # Final number of selected passages
RERANKER_THRESHOLD=0.3  # Score threshold for filtering
USE_RERANKER=true       # Enable/disable reranking stage

# ===== PASSAGES =====
CHUNK_SIZE=300          # Number of tokens per passage chunk

▶️ Running Experiments

🔹 Single-step

python experiments/run_averitec_original.py

🔹 Iterative

python experiments/run_averitec_iterative.py

📊 Outputs

Results are saved in:

outputs/

Includes:

Predictions per claim
Aggregated metrics:
- Accuracy
- Macro-F1
- F1 per class
- Number of queries
- Token usage

📈 Key Findings

Iterative generation:
- Improves accuracy (+1.7 to +17.5)
- Reduces number of queries (~50%)
- Reduces abstention (NEI)
- Increases token usage (7–17%)

👉 Key insight: Iterative generation shifts computation from retrieval to reasoning

🧪 Dataset

AVeriTeC (development set)
500 real-world claims
Open-domain (web retrieval)

Labels:

SUPPORTED
REFUTED
NOT ENOUGH EVIDENCE
CONFLICTING EVIDENCE/CHERRYPICKING

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
datasets/averitec		datasets/averitec
experiments		experiments
modules		modules
notebooks		notebooks
pipeline		pipeline
utils		utils
.gitignore		.gitignore
README.md		README.md
config.py		config.py
requirements.txt		requirements.txt
results.ipynb		results.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠 Iterative Question Generation for Fact-Checking

📌 Overview

⚙️ Methods

🔹 Single-step

🔹 Iterative

🚀 Setup

1. Clone the repository

2. Create a virtual environment

3. Install dependencies

⚙️ Environment Configuration (.env)

▶️ Running Experiments

🔹 Single-step

🔹 Iterative

📊 Outputs

📈 Key Findings

🧪 Dataset

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧠 Iterative Question Generation for Fact-Checking

📌 Overview

⚙️ Methods

🔹 Single-step

🔹 Iterative

🚀 Setup

1. Clone the repository

2. Create a virtual environment

3. Install dependencies

⚙️ Environment Configuration (.env)

▶️ Running Experiments

🔹 Single-step

🔹 Iterative

📊 Outputs

📈 Key Findings

🧪 Dataset

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages