CiteGuard

A tool for detecting hallucinated citations in academic papers. Paste paper fragments and reference entries, and CiteGuard flags non-existent citations, metadata errors, and misrepresented claims, with suggested corrections.

Features

- Multi-format reference parsing: BibTeX, numbered lists, APA, and GB/T 7714
- Verification against four academic databases: CrossRef, Semantic Scholar, DBLP, and OpenAlex, queried in parallel
- Metadata validation: field-by-field comparison against authoritative records
- Semantic relevance detection: optional LLM-powered analysis of whether citations support claims
- Real-time progress: SSE streaming with live per-reference updates
- Per-citation scoring: each citation starts at 100, with deductions for issues found
- Markdown export: one-click download of the full detection report
- Multiple LLM providers: OpenAI, GLM (Zhipu), DeepSeek, Moonshot, MiniMax, or any OpenAI-compatible endpoint

Quick Start

Prerequisites

- Python 3.12+
- Node.js 18+
- uv (Python package manager)
- pnpm (Node.js package manager)

Install & Run

# Backend
uv sync                                          # install Python dependencies
uv run uvicorn backend.main:app --port 8000 --reload

# Frontend (in a separate terminal)
pnpm --dir frontend install                      # install Node dependencies
pnpm --dir frontend dev                          # starts on http://localhost:5173

Open http://localhost:5173 in your browser.

Run Tests

# Backend
uv run pytest

# Frontend
pnpm --dir frontend test

How It Works

Input

Paper fragment (optional) — paste paragraphs containing citation markers. Supported formats:
- Numeric: [1], [2,3], [1-3]
- Author-Year: (Smith, 2020), (Smith et al., 2020)
- LaTeX: \cite{key}, \cite{key1,key2}
References — paste reference entries and select the format from the dropdown:
- BibTeX
- Numbered list
- APA
- GB/T 7714 (Chinese national standard)
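
As a rough illustration of the three citation marker styles listed above, the following sketch extracts them with regular expressions. The patterns and the function names `expand_numeric` and `extract_markers` are assumptions made for this example; CiteGuard's own parser may handle more variants.

```python
import re

# Illustrative patterns only; not taken from CiteGuard's source.
NUMERIC = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")
AUTHOR_YEAR = re.compile(r"\(([A-Z][A-Za-z'-]+(?: et al\.)?),\s*(\d{4})\)")
LATEX = re.compile(r"\\cite\{([^}]+)\}")


def expand_numeric(body: str) -> list:
    """Expand '2,3' or '1-3' into the individual reference numbers."""
    numbers = []
    for part in body.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-"))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


def extract_markers(text: str) -> dict:
    """Collect numeric, author-year, and LaTeX markers from a paper fragment."""
    numeric = [n for m in NUMERIC.finditer(text) for n in expand_numeric(m.group(1))]
    author_year = [(m.group(1), int(m.group(2))) for m in AUTHOR_YEAR.finditer(text)]
    latex = [k.strip() for m in LATEX.finditer(text) for k in m.group(1).split(",")]
    return {"numeric": numeric, "author_year": author_year, "latex": latex}
```

Calling `extract_markers` on a fragment returns one list per style, so each marker can then be matched to its reference entry.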

Detection Pipeline

Each reference passes through three phases independently:

| Phase | Description | Always runs? |
|---|---|---|
| Existence verification | Queries 4 academic databases in parallel (CrossRef, Semantic Scholar, DBLP, OpenAlex). Confirmation from any one database is sufficient. | Yes |
| Metadata validation | Compares title, authors, year, journal, DOI, volume, and pages against API records. Uses string similarity first and falls back to the LLM for uncertain cases. | Only if the reference exists |
| Semantic relevance | The LLM analyzes whether the citation actually supports the claim in the paper text. | Only if paper text is provided and an LLM is configured |
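
The first two phases could be sketched roughly as follows. This is a minimal illustration, not CiteGuard's implementation: the `fake_crossref` and `fake_dblp` coroutines are hypothetical stand-ins for real database clients, the record contents are placeholders, and the similarity thresholds (0.9 and 0.6) are assumed values.

```python
import asyncio
from difflib import SequenceMatcher
from typing import Optional


# Hypothetical stand-ins for database clients. Each returns a record
# dict, or None when the database has no match.
async def fake_crossref(title: str) -> Optional[dict]:
    return None


async def fake_dblp(title: str) -> Optional[dict]:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"title": "An Example Paper Title", "year": "2021"}


async def find_record(title: str, clients) -> Optional[dict]:
    """Phase 1: query all clients concurrently; any single hit is enough."""
    results = await asyncio.gather(
        *(client(title) for client in clients), return_exceptions=True
    )
    for result in results:
        if isinstance(result, dict):  # skips None and raised exceptions
            return result
    return None


def compare_field(claimed: str, actual: str,
                  match: float = 0.9, uncertain: float = 0.6) -> str:
    """Phase 2: classify one field by string similarity (assumed thresholds)."""
    ratio = SequenceMatcher(None, claimed.lower(), actual.lower()).ratio()
    if ratio >= match:
        return "match"
    if ratio >= uncertain:
        return "uncertain"  # the kind of case that would go to the LLM
    return "mismatch"
```

Using `return_exceptions=True` keeps one failing database from aborting the others, which fits the rule that a single confirmation is sufficient.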

Scoring

Each citation starts at 100 points. Deductions:

| Issue | Deduction |
|---|---|
| Reference does not exist | -30 |
| Reference possibly exists | -15 |
| Metadata field mismatch | -5 per field |
| Metadata possible error | -2 per field |
| Citation does not support claim | -10 |
| Citation partially supports claim | -5 |

Status labels: Pass (>=80), Warning (50-79), Error (<50), Skipped
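
The deduction rules and status thresholds can be expressed as a short function. This is a sketch under stated assumptions: the argument names and string values (`"missing"`, `"possible"`, `"partial"`, `"none"`) are invented for the example, and clamping the score at 0 is an assumption. The Skipped label is not derived from a score, so it is left out.

```python
# Deduction values copied from the scoring table; the key names are hypothetical.
DEDUCTIONS = {
    "not_exists": 30,
    "possibly_exists": 15,
    "field_mismatch": 5,        # per field
    "field_possible_error": 2,  # per field
    "not_supported": 10,
    "partially_supported": 5,
}


def score_citation(existence: str = "exists", mismatched_fields: int = 0,
                   possible_error_fields: int = 0, support: str = "full") -> int:
    """Start at 100 and subtract the deduction for each issue found."""
    score = 100
    if existence == "missing":
        score -= DEDUCTIONS["not_exists"]
    elif existence == "possible":
        score -= DEDUCTIONS["possibly_exists"]
    score -= DEDUCTIONS["field_mismatch"] * mismatched_fields
    score -= DEDUCTIONS["field_possible_error"] * possible_error_fields
    if support == "none":
        score -= DEDUCTIONS["not_supported"]
    elif support == "partial":
        score -= DEDUCTIONS["partially_supported"]
    return max(score, 0)  # assumed floor; the source does not specify one


def status_label(score: int) -> str:
    """Map a score to Pass (>=80), Warning (50-79), or Error (<50)."""
    if score >= 80:
        return "Pass"
    if score >= 50:
        return "Warning"
    return "Error"
```

For example, two mismatched fields plus partial support gives 100 - 10 - 5 = 85, which falls in the Pass band.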

Output

- Per-reference result cards with existence status, metadata comparison table, semantic analysis, and suggested BibTeX correction
- Summary statistics across all references
- One-click Markdown report export
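
To give a sense of what a Markdown export might contain, here is a minimal sketch that renders summary counts and a per-reference table. The report layout and the result keys (`title`, `score`, `status`) are assumptions for this example; the actual report format may differ.

```python
from collections import Counter


def render_report(results: list) -> str:
    """Render a list of per-reference results as a Markdown report string."""
    counts = Counter(r["status"] for r in results)
    lines = ["# Detection Report", "", f"References checked: {len(results)}"]
    # Summary statistics: one line per status label.
    for status in ("Pass", "Warning", "Error", "Skipped"):
        lines.append(f"- {status}: {counts.get(status, 0)}")
    # Per-reference table.
    lines += ["", "| # | Reference | Score | Status |", "|---|---|---|---|"]
    for index, r in enumerate(results, start=1):
        lines.append(f"| {index} | {r['title']} | {r['score']} | {r['status']} |")
    return "\n".join(lines)
```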

LLM Configuration

Semantic relevance detection requires an LLM. Configure it in the Settings panel:

| Provider | Default Base URL |
|---|---|
| OpenAI | `https://api.openai.com/v1` |
| GLM (Zhipu) | `https://open.bigmodel.cn/api/paas/v4` |
| DeepSeek | `https://api.deepseek.com/v1` |
| Moonshot | `https://api.moonshot.cn/v1` |
| MiniMax | `https://api.minimax.chat/v1` |
| Custom | User-provided URL |

The API key is stored in memory only and is lost on server restart. No key is ever logged in plaintext.
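
A provider configuration along these lines could be modeled as below. This is only a sketch: the `LLMConfig` class, its field names, and the key-masking rule are assumptions for illustration, while the base URLs are the defaults listed above.

```python
from dataclasses import dataclass
from typing import Optional

# Default base URLs from the provider table; dictionary keys are hypothetical.
DEFAULT_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "glm": "https://open.bigmodel.cn/api/paas/v4",
    "deepseek": "https://api.deepseek.com/v1",
    "moonshot": "https://api.moonshot.cn/v1",
    "minimax": "https://api.minimax.chat/v1",
}


@dataclass
class LLMConfig:
    """In-memory LLM settings (hypothetical shape, never written to disk)."""
    provider: str
    api_key: str
    model: str
    base_url: Optional[str] = None

    def resolved_base_url(self) -> str:
        """Prefer a user-provided URL, else fall back to the provider default."""
        if self.base_url:
            return self.base_url.rstrip("/")
        if self.provider not in DEFAULT_BASE_URLS:
            raise ValueError(f"provider {self.provider!r} needs an explicit base_url")
        return DEFAULT_BASE_URLS[self.provider]

    def masked_key(self) -> str:
        """Redacted form for log output (assumed rule: keep the last 4 characters)."""
        if len(self.api_key) <= 4:
            return "***"
        return f"***{self.api_key[-4:]}"
```

Logging `masked_key()` instead of `api_key` is one way to keep the raw key out of log files.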

API Reference

| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check |
| `/api/detect` | POST | Submit a detection task |
| `/api/detect/{task_id}/stream` | GET | SSE progress stream |
| `/api/detect/{task_id}/report` | GET | Get complete report |
| `/api/llm/config` | POST | Save LLM configuration |
| `/api/llm/test` | POST | Test LLM connection |

Interactive API docs are available at http://localhost:8000/docs (Swagger UI).
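
For clients that consume the progress stream directly, the sketch below builds the stream URL for a task and parses Server-Sent Events lines into data payloads. The helper names `stream_url` and `iter_sse_data` are hypothetical, and the sketch makes no assumption about the payload contents; consult the Swagger UI for the actual request and event schemas.

```python
from typing import Iterable, Iterator

BASE_URL = "http://localhost:8000"  # backend address from Quick Start


def stream_url(task_id: str) -> str:
    """Build the SSE progress URL for a submitted detection task."""
    return f"{BASE_URL}/api/detect/{task_id}/stream"


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from raw SSE lines.

    Events end at a blank line; multiple data lines in one event are
    joined with newlines. Comments and other fields are ignored.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # strip the single optional space
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
```

Any HTTP client that can iterate over response lines can feed them into `iter_sse_data`.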

Project Structure

CiteGuard/
├── backend/                    # Python FastAPI backend
│   ├── api/                    # REST endpoints (detect, llm)
│   ├── models/                 # Pydantic schemas
│   ├── services/
│   │   ├── academic_apis/      # CrossRef, Semantic Scholar, DBLP, OpenAlex clients
│   │   ├── matchers/           # String similarity, LLM matcher, scoring, BibTeX
│   │   └── parsers/            # BibTeX, numbered, APA, GB/T 7714 parsers
│   ├── tests/                  # pytest test suite
│   ├── config.py               # Configuration constants
│   └── main.py                 # FastAPI application
├── frontend/                   # React + TypeScript frontend
│   └── src/
│       ├── components/         # React components
│       ├── hooks/              # Custom hooks (useSSE, useDetect)
│       ├── services/           # API client
│       └── types/              # TypeScript definitions
└── docs/                       # PRD, technical spec, implementation plan

Tech Stack

| Layer | Technologies |
|---|---|
| Frontend | React 19, TypeScript, Vite, shadcn/ui, Tailwind CSS |
| Backend | Python 3.12+, FastAPI, httpx, Pydantic |
| Real-time | Server-Sent Events (SSE) |
| LLM | OpenAI-compatible API (via openai SDK) |
| Testing | pytest + pytest-asyncio (backend), Vitest (frontend) |

Constraints

- Maximum 50 references per detection
- Single reference timeout: 60 seconds
- Total task timeout: 10 minutes
- All files under 800 lines, functions under 50 lines
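
The runtime limits above (reference count, per-reference timeout, total task timeout) could be enforced with `asyncio.wait_for`, as in this sketch. The function names and the `"timed_out"` result are assumptions; the source does not say what CiteGuard reports when a reference times out.

```python
import asyncio

MAX_REFERENCES = 50
REFERENCE_TIMEOUT_S = 60
TASK_TIMEOUT_S = 600  # 10 minutes


async def check_with_timeout(check, reference, timeout):
    """Run one reference check, giving up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(check(reference), timeout)
    except asyncio.TimeoutError:
        # Hypothetical placeholder result for a timed-out reference.
        return {"reference": reference, "status": "timed_out"}


async def run_task(check, references,
                   ref_timeout=REFERENCE_TIMEOUT_S, task_timeout=TASK_TIMEOUT_S):
    """Check all references concurrently under both timeout limits."""
    if len(references) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} references per detection")
    jobs = [check_with_timeout(check, ref, ref_timeout) for ref in references]
    return await asyncio.wait_for(asyncio.gather(*jobs), task_timeout)
```

With this structure a slow reference yields a placeholder result instead of blocking the others, while the outer `wait_for` raises if the task as a whole exceeds its limit.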

License

This project is for educational and research purposes.
