🛡️ TruthLens — Real-Time Fake News Detection System

Diploma Thesis — Automated system for real-time detection, analysis, and correction of misinformation using a triple-signal AI pipeline: Statistical · Fact-Check · Manipulation.

Overview

TruthLens is an end-to-end misinformation detection platform that combines fine-tuned transformer models, multi-branch Retrieval-Augmented Generation (RAG), emotion-based manipulation detection, and LLM-powered reasoning to evaluate claims in real time.

The system produces a structured verdict with:

A confidence-scored label from a 7-point spectrum (Supported → False/Fabricated)
Source citations from Wikipedia and DuckDuckGo
A manipulation risk score (LOW / MEDIUM / HIGH) based on subjectivity and linguistic aggression analysis
An AI-generated counter-narrative grounded in retrieved evidence

The system is accessible through three interfaces: a Flask web dashboard, a Telegram bot, and the TruthLens Chrome extension.

System Architecture

                     Input Claim
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Signal 1 — Statistical  (RoBERTa-base, fine-tuned)         │
│  Fine-tuned transformer binary classifier                   │
│  Output: FAKE / REAL  +  confidence score                   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Signal 2 — Fact-Check  (Multi-Branch RAG)                  │
│                                                             │
│  Branch A: Wikipedia keyword search                         │
│  Branch B: DuckDuckGo — Adaptive Multi-Intent Query         │
│  Branch C: DuckDuckGo — Refutation Search*                  │
│  * auto-activated for health / scientific / institutional   │
│    claims                                                   │
│                                                             │
│  Keyword Extraction → Institution Detection → Retrieval     │
│  → Hard Relevance Gate → Ranked Snippets + Source URLs      │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Signal 3 — Manipulation  (SentimentAnalyzer)               │
│  Model: SamLowe/roberta-base-go_emotions (28 labels)        │
│                                                             │
│  • Weighted subjectivity score across all 28 emotions       │
│  • Linguistic aggression floor (ALL-CAPS / exclamation !!)  │
│  • Output: manipulation_risk  LOW / MEDIUM / HIGH           │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Reasoning & Counter-Narrative  (Mistral 7B via Ollama)     │
│  Fallback: First Principles Engine (rule-based, no LLM)     │
│                                                             │
│  Input:  claim + all 3 signals + retrieved context          │
│  Output: spectrum label + explanation + counter-narrative   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
                  Final Verdict
          (7-point spectrum — see below)

Verdict Spectrum

The system classifies claims across a 7-point spectrum rather than a binary fake/real label:

Verdict	Emoji	Meaning
`SUPPORTED`	✅	Supported by credible evidence
`PARTIALLY_SUPPORTED`	⚠️	Core facts are real, but details differ
`UNSUPPORTED`	⚪	No strong evidence to confirm or deny
`MISLEADING_FRAMING`	🟠	Facts are twisted or presented out of context
`CONTRADICTED`	❌	Actively refuted by available evidence
`FALSE_FABRICATED`	🔴	Entirely fabricated — contradicts documented reality
`UNKNOWN`	❓	Insufficient data to reach a verdict
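
For illustration, the spectrum above can be modelled as an enumeration. This is a minimal sketch, not the project's actual code: the names `Verdict` and `parse_verdict` are hypothetical, and the rule that unrecognised labels fall back to `UNKNOWN` is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    """The seven spectrum labels, each paired with its display emoji."""

    SUPPORTED = "✅"
    PARTIALLY_SUPPORTED = "⚠️"
    UNSUPPORTED = "⚪"
    MISLEADING_FRAMING = "🟠"
    CONTRADICTED = "❌"
    FALSE_FABRICATED = "🔴"
    UNKNOWN = "❓"


def parse_verdict(raw: str) -> Verdict:
    """Map free-form text to a spectrum label (hypothetical helper).

    Anything that does not normalise to a known label becomes UNKNOWN.
    """
    key = raw.strip().upper()
    for separator in (" ", "-", "/"):
        key = key.replace(separator, "_")
    return Verdict.__members__.get(key, Verdict.UNKNOWN)
```

Normalising before the lookup lets a reasoning model answer with "partially supported" or "False/Fabricated" and still land on a valid label.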

Features

Triple-signal pipeline — statistical, fact-check, and manipulation signals synthesized by Mistral 7B into a final spectrum verdict
Fine-tuned RoBERTa-base — selected after comparative evaluation against FinBERT, achieving 81.8% accuracy
Multi-branch RAG — Wikipedia + DuckDuckGo intent search + active refutation branch (auto-activated for health/scientific/institutional claims)
Adaptive Multi-Intent Retrieval — Institution Registry maps 30+ organizations (WHO, NASA, NATO, Harvard, EU bodies…) to authoritative domains for targeted site:<domain> queries
Political Appointment Branch — force-fetches precise Wikipedia sections for claims involving governmental roles or dates
Emotion-aware manipulation detection — GoEmotions (28-label) with custom weighted subjectivity scoring + linguistic aggression floor (catches ALL-CAPS / exclamation cues that tokenizers normalize away)
Mistral 7B reasoning — synthesizes all signals into a structured spectrum label, explanation, and counter-narrative
First Principles Fallback — rule-based engine that handles known pseudoscientific patterns (graphene nanobots, mRNA shedding, vaccine microchips…) when Mistral/Ollama is unavailable
Three delivery interfaces — Web Dashboard · Telegram Bot · Chrome Extension

Demo

Web Dashboard

TruthLens Chrome Extension

Contextual verification and source mapping directly in the browser.

Telegram Bot

From debunking complex misinformation to verifying official news.

Tech Stack

Layer	Technology
Statistical Model	RoBERTa-base (fine-tuned)
Manipulation Model	`SamLowe/roberta-base-go_emotions` + custom subjectivity weighting
LLM Reasoning	Mistral 7B (via Ollama)
RAG / Retrieval	Wikipedia REST API · DuckDuckGo Search (3-branch)
Training Framework	PyTorch · HuggingFace Transformers
Backend API	Flask (REST)
Telegram Bot	python-telegram-bot
Chrome Extension	Manifest V3 · Vanilla JS
Data Processing	Pandas · NumPy · scikit-learn

Installation

Prerequisites

Python 3.10+
Ollama with mistral pulled
CUDA GPU recommended for training; CPU is sufficient for inference

1. Clone

git clone https://github.com/sonamansuryan/FakeNewsDetectionSystems.git
cd FakeNewsDetectionSystems

2. Virtual environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Environment variables

cp .env.example .env
# Add TELEGRAM_BOT_TOKEN and any other required keys

5. Pull Mistral

ollama pull mistral

Usage

Web Dashboard

python -m src.api.app
# → http://localhost:5000

Telegram Bot

python telegram_bot.py

Full pipeline runner

python run_pipeline.py --claim "Your claim here"

Chrome Extension

Open chrome://extensions/
Enable Developer mode
Click Load unpacked → select TruthLens_Extension/
Highlight any text on a webpage → TruthLens icon → instant analysis

Pipeline Deep Dive

Signal 1 — Statistical (RoBERTa-base)

Both RoBERTa-base and FinBERT were fine-tuned and evaluated comparatively. RoBERTa-base was selected as the production model after achieving 81.8% accuracy vs FinBERT's 71.2%. It outputs a binary FAKE / REAL prediction with a calibrated confidence score.
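
As a rough sketch of how a two-class head turns into a label plus confidence, the snippet below applies a softmax to a pair of logits. The function name `classify` and the label order (`FAKE` first) are assumptions, and any calibration step the project applies on top is not shown.

```python
import math


def classify(logits: list, labels=("FAKE", "REAL")) -> tuple:
    """Softmax over the logits; return the winning label and its probability."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

With logits `[2.0, 0.0]` the first label wins with probability `1 / (1 + e^-2)`.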

Signal 2 — Fact-Check (Multi-Branch RAG)

rag_retriever.py runs up to three retrieval branches per claim:

Branch	Trigger	Query Strategy
Wikipedia	Always	Keyword-based search
DuckDuckGo (Intent)	Always	`site:<domain>` if institution detected; else `"official statement OR scientific consensus"`
DuckDuckGo (Refutation)	Health / scientific / institutional claims	`"<keywords> debunked OR scientific consensus OR evidence against"`

The Institution Registry covers 30+ organizations — health bodies (WHO, CDC, FDA, EMA), space agencies (NASA, ESA), universities (Oxford, Harvard, MIT), intergovernmental bodies (UN, NATO, IMF), EU institutions, and UK Parliament — enabling targeted authoritative retrieval. A Political Appointment Branch force-fetches precise Wikipedia sections when a claim involves governmental roles or dates.
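
The query strategies in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of rag_retriever.py: `INSTITUTION_DOMAINS`, `detect_institution`, and `build_queries` are hypothetical names, the three registry entries are a small excerpt, and prepending the extracted keywords to the fixed query phrases is a guess at how the templates are filled.

```python
import re
from typing import Optional

# Hypothetical excerpt; the real registry covers 30+ organizations.
INSTITUTION_DOMAINS = {
    "WHO": "who.int",
    "CDC": "cdc.gov",
    "NASA": "nasa.gov",
}


def detect_institution(claim: str) -> Optional[str]:
    """Return the domain of the first registry acronym found in the claim.

    Matching is case-sensitive so the pronoun "who" does not trigger WHO.
    """
    for token in re.findall(r"\b[A-Z]{2,}\b", claim):
        if token in INSTITUTION_DOMAINS:
            return INSTITUTION_DOMAINS[token]
    return None


def build_queries(claim: str, keywords: list, needs_refutation: bool) -> dict:
    """Build one query string per active retrieval branch."""
    joined = " ".join(keywords)
    domain = detect_institution(claim)
    if domain:
        intent = f"site:{domain} {joined}"
    else:
        intent = f"{joined} official statement OR scientific consensus"
    queries = {"wikipedia": joined, "ddg_intent": intent}
    if needs_refutation:  # health / scientific / institutional claims
        queries["ddg_refutation"] = (
            f"{joined} debunked OR scientific consensus OR evidence against"
        )
    return queries
```

For a claim such as "NASA confirmed water on Mars" with keywords `["water", "Mars"]`, the intent branch would query `site:nasa.gov water Mars`.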

Signal 3 — Manipulation (SentimentAnalyzer)

Uses SamLowe/roberta-base-go_emotions (RoBERTa-base fine-tuned on Google GoEmotions, 58k Reddit comments, 28 emotion labels). The full softmax distribution — not just the argmax — is used to compute a continuous subjectivity score:

subjectivity_score = Σ (p_i × W_i)

Each of the 28 emotions has a pre-assigned weight from 0.05 (neutral) to 0.95 (anger). A Linguistic Aggression Floor then checks raw text for ALL-CAPS words (≥ 4 chars) and exclamation marks, which tokenizers normalize away, and raises — but never lowers — the risk tier.
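
A minimal sketch of the scoring and the floor, under assumptions: only the two endpoint weights (neutral 0.05, anger 0.95) come from the text above. The other weights, the tier cut-offs, the cue counts, and all function names are hypothetical placeholders.

```python
import re

# Only "neutral" and "anger" are taken from the text; the rest are made up.
WEIGHTS = {"neutral": 0.05, "anger": 0.95, "fear": 0.80, "joy": 0.40}
TIERS = ["LOW", "MEDIUM", "HIGH"]


def subjectivity_score(probs: dict) -> float:
    """Weighted sum of emotion probabilities: sum(p_i * W_i)."""
    return sum(p * WEIGHTS.get(label, 0.0) for label, p in probs.items())


def score_to_tier(score: float) -> int:
    """Hypothetical cut-offs: below 0.35 is LOW, below 0.65 is MEDIUM."""
    if score < 0.35:
        return 0
    return 1 if score < 0.65 else 2


def aggression_floor(text: str) -> int:
    """Minimum tier implied by raw-text cues (hypothetical counts)."""
    caps_words = re.findall(r"\b[A-Z]{4,}\b", text)  # ALL-CAPS, 4+ letters
    exclamations = text.count("!")
    if len(caps_words) >= 2 or exclamations >= 3:
        return 2
    if caps_words or exclamations >= 2:
        return 1
    return 0


def manipulation_risk(text: str, probs: dict) -> str:
    """Take the higher of the two tiers, so the floor can only raise risk."""
    tier = max(score_to_tier(subjectivity_score(probs)), aggression_floor(text))
    return TIERS[tier]
```

Combining the tiers with `max` is what makes the floor one-directional: a calm emotion profile cannot cancel out shouting in the raw text, and a high subjectivity score is never reduced by neutral punctuation.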

Reasoning — Mistral 7B + First Principles Fallback

Mistral 7B (via Ollama) receives the claim, all retrieved snippets, and the statistical and manipulation signal results, then outputs a structured spectrum label, a step-by-step explanation, and a grounded counter-narrative. When Ollama is unavailable, the First Principles Engine applies regex-based pattern matching for known pseudoscientific categories and returns pre-written, scientifically sourced counter-narratives without any LLM call.
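
The fallback behaviour might look roughly like the sketch below. Everything here is hypothetical: the patterns, the placeholder narrative strings, the function names, the choice of `FALSE_FABRICATED` for matched patterns, and the assumption that an unreachable Ollama surfaces as an `OSError` (which includes `ConnectionError`). The LLM call is passed in as a plain callable so the example stays self-contained.

```python
import re
from typing import Callable, Optional

# Hypothetical rules; the narrative strings are placeholders, not real text.
FALLBACK_RULES = [
    (re.compile(r"\bgraphene\b.*\bnanobots?\b", re.I),
     "<pre-written counter-narrative: graphene nanobots>"),
    (re.compile(r"\bmrna\b.*\bshedding\b", re.I),
     "<pre-written counter-narrative: mRNA shedding>"),
    (re.compile(r"\bvaccines?\b.*\bmicrochips?\b|\bmicrochips?\b.*\bvaccines?\b", re.I),
     "<pre-written counter-narrative: vaccine microchips>"),
]


def first_principles(claim: str) -> Optional[dict]:
    """Return a canned verdict if the claim matches a known pattern."""
    for pattern, narrative in FALLBACK_RULES:
        if pattern.search(claim):
            return {
                "label": "FALSE_FABRICATED",  # assumed label for matched patterns
                "counter_narrative": narrative,
                "engine": "first_principles",
            }
    return None


def reason(claim: str, llm: Callable[[str], dict]) -> dict:
    """Try the LLM first; fall back to the rule engine if it is unreachable."""
    try:
        return llm(claim)
    except OSError:  # e.g. ConnectionError when the Ollama server is down
        fallback = first_principles(claim)
        if fallback is not None:
            return fallback
        return {"label": "UNKNOWN", "counter_narrative": "", "engine": "first_principles"}
```

Returning `UNKNOWN` when no rule matches keeps the fallback from guessing on claims it has no pre-written answer for.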

Models & Performance

Model Selection: RoBERTa vs FinBERT

Both models were fine-tuned on the same dataset for direct comparison:

Model	Base	Accuracy	F1 Score	Epochs	Status
RoBERTa	`roberta-base`	81.8%	81.8%	4	✅ Production model
FinBERT	`ProsusAI/finbert`	71.2%	71.2%	8	🔬 Trained, superseded by RoBERTa

RoBERTa-base outperformed FinBERT by 10.6 percentage points in accuracy (81.8% vs 71.2%) and was selected as the sole statistical signal in the final pipeline. FinBERT's lower performance is expected: it was pre-trained on financial text, which makes it less suited to general fake news classification.

Model checkpoints hosted on HuggingFace Hub (links TBD).

Manipulation Model

Model	Purpose	Training Data
`SamLowe/roberta-base-go_emotions`	28-label emotion classification → weighted subjectivity score	Google GoEmotions — 58k Reddit comments

Interfaces

🌐 Web Dashboard

Three-panel layout: Statistical signal (RoBERTa confidence gauge) · Fact-Check signal (ranked sources with trust scores) · Manipulation signal (subjectivity bar + risk badge). Full Mistral reasoning trace (Claim Understanding → Evidence Evaluation → Signal Synthesis → Conclusion) and counter-narrative displayed below.

🤖 Telegram Bot

Send any text claim. Returns a MarkdownV2-formatted message with verdict emoji, confidence score, manipulation risk, counter-narrative, and clickable source links. Uses the same run_pipeline() function as the web API — identical analysis, different presentation layer.
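
MarkdownV2 rejects messages that contain unescaped reserved characters, so dynamic values such as scores and labels need escaping before they are sent. The sketch below shows one way to do that; `escape_mdv2` and `build_message` are hypothetical names, and the message layout is an assumption. Recent versions of python-telegram-bot also appear to ship a comparable helper (`telegram.helpers.escape_markdown` with `version=2`), which may be preferable in practice.

```python
import re

# Characters reserved by Telegram's MarkdownV2, plus the backslash itself.
_MDV2_SPECIALS = r"\_*[]()~`>#+-=|{}.!"
_MDV2_PATTERN = re.compile(f"([{re.escape(_MDV2_SPECIALS)}])")


def escape_mdv2(text: str) -> str:
    """Prefix every reserved MarkdownV2 character with a backslash."""
    return _MDV2_PATTERN.sub(r"\\\1", text)


def build_message(emoji: str, label: str, confidence: float, risk: str) -> str:
    """Assemble a short verdict message (hypothetical layout)."""
    return "\n".join([
        f"{emoji} *{escape_mdv2(label)}*",  # bold verdict label
        escape_mdv2(f"Confidence: {confidence:.1%}"),
        escape_mdv2(f"Manipulation risk: {risk}"),
    ])
```

Escaping each dynamic field separately, rather than the whole message, keeps the intentional formatting markers (the `*` around the label) intact.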

🔍 TruthLens Chrome Extension

Manifest V3 extension with a content script that intercepts selected text. Side panel shows: verdict badge, RoBERTa confidence bar, manipulation risk indicator, AI reasoning summary, source links, and a Copy Report / New Check button pair.

Roadmap

License

MIT License — see LICENSE for details.

Built as a diploma thesis · 2025–2026
