yuanxiaoye1031/CiteGuard

CiteGuard

Chinese documentation (中文文档)

CiteGuard detects hallucinated citations in academic papers. Paste your paper fragments and reference entries, and CiteGuard will flag non-existent citations, metadata errors, and misrepresented claims, and suggest corrections.

Features

  • Multi-format reference parsing — BibTeX, numbered lists, APA, and GB/T 7714
  • Verification against four academic databases (CrossRef, Semantic Scholar, DBLP, OpenAlex), queried in parallel
  • Metadata validation — field-by-field comparison against authoritative records
  • Semantic relevance detection — optional LLM-powered analysis of whether citations support claims
  • Real-time progress — SSE streaming with live per-reference updates
  • Per-citation scoring — base 100 with deductions for issues found
  • Markdown export — one-click full detection report download
  • Multiple LLM providers — OpenAI, GLM (Zhipu), DeepSeek, Moonshot, MiniMax, or any OpenAI-compatible endpoint

Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 18+
  • uv (Python package manager)
  • pnpm (Node.js package manager)

Install & Run

# Backend
uv sync                                          # install Python dependencies
uv run uvicorn backend.main:app --port 8000 --reload

# Frontend (in a separate terminal)
pnpm --dir frontend install                      # install Node dependencies
pnpm --dir frontend dev                          # starts on http://localhost:5173

Open http://localhost:5173 in your browser.

Run Tests

# Backend
uv run pytest

# Frontend
pnpm --dir frontend test

How It Works

Input

  1. Paper fragment (optional) — paste paragraphs containing citation markers. Supported formats:

    • Numeric: [1], [2,3], [1-3]
    • Author-Year: (Smith, 2020), (Smith et al., 2020)
    • LaTeX: \cite{key}, \cite{key1,key2}
  2. References — paste reference entries and select the format from the dropdown:

    • BibTeX
    • Numbered list
    • APA
    • GB/T 7714 (Chinese national standard)
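
The three citation marker styles listed above can be recognized with regular expressions. The following is a minimal sketch, not CiteGuard's actual parser: the pattern names and the `extract_markers` helper are hypothetical, and the author-year pattern only covers the two shapes shown in the list.

```python
import re

# Illustrative patterns for the three marker styles (assumed, not CiteGuard's own).
NUMERIC = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")
AUTHOR_YEAR = re.compile(r"\(([A-Z][A-Za-z'-]+(?: et al\.)?),\s*(\d{4})\)")
LATEX = re.compile(r"\\cite\{([^}]+)\}")


def expand_numeric(body: str) -> list[int]:
    """Expand the inside of a numeric marker, e.g. '2,3' or '1-3'."""
    numbers: list[int] = []
    for part in body.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-"))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


def extract_markers(text: str) -> dict[str, list]:
    """Collect numeric, author-year, and LaTeX citation markers from text."""
    numeric = [n for m in NUMERIC.finditer(text) for n in expand_numeric(m.group(1))]
    author_year = [(m.group(1), int(m.group(2))) for m in AUTHOR_YEAR.finditer(text)]
    latex = [k.strip() for m in LATEX.finditer(text) for k in m.group(1).split(",")]
    return {"numeric": numeric, "author_year": author_year, "latex": latex}


sample = r"Prior work [2,3] and [1-3] agrees (Smith et al., 2020); see \cite{key1,key2}."
markers = extract_markers(sample)
```

Here `markers["numeric"]` is `[2, 3, 1, 2, 3]`, because the range `[1-3]` is expanded into its individual reference numbers.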

Detection Pipeline

Each reference is processed independently and passes through up to three phases:

| Phase | Description | Always runs? |
| --- | --- | --- |
| Existence verification | Queries 4 academic databases in parallel (CrossRef, Semantic Scholar, DBLP, OpenAlex). Any one confirming existence is sufficient. | Yes |
| Metadata validation | Compares title, authors, year, journal, DOI, volume, pages against API records. Uses string similarity first, LLM for uncertain cases. | If exists |
| Semantic relevance | LLM analyzes whether the citation actually supports the claim in the paper text. | If paper text provided & LLM configured |
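
The existence phase ("query in parallel, one confirmation is enough") can be sketched with `asyncio.gather`. This is an illustration under assumptions, not the project's code: the `Lookup` type, `exists_in_any`, and the stub lookups are hypothetical stand-ins for the real database clients, and whether CiteGuard waits for every database or returns on the first hit is not stated in this README.

```python
import asyncio
from typing import Awaitable, Callable

# A lookup takes a title and reports whether that database has a matching record.
Lookup = Callable[[str], Awaitable[bool]]


async def exists_in_any(title: str, lookups: dict[str, Lookup]) -> tuple[bool, list[str]]:
    """Query every database concurrently; a single confirmation is sufficient."""
    names = list(lookups)
    results = await asyncio.gather(
        *(lookups[name](title) for name in names),
        return_exceptions=True,  # one failing database should not abort the others
    )
    confirmed = [name for name, result in zip(names, results) if result is True]
    return bool(confirmed), confirmed


# Stub lookups standing in for real HTTP clients (hypothetical).
async def found(_title: str) -> bool:
    return True


async def missing(_title: str) -> bool:
    return False


async def broken(_title: str) -> bool:
    raise TimeoutError("database unreachable")


verdict = asyncio.run(
    exists_in_any("Some paper title", {"crossref": missing, "dblp": found, "openalex": broken})
)
```

With these stubs, `verdict` is `(True, ["dblp"])`: the unreachable database is treated as "no confirmation" rather than as a failure of the whole check.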

Scoring

Each citation starts at 100 points. Deductions:

| Issue | Deduction |
| --- | --- |
| Literature does not exist | -30 |
| Literature possibly exists | -15 |
| Metadata field mismatch | -5 per field |
| Metadata possible error | -2 per field |
| Citation does not support claim | -10 |
| Citation partially supports claim | -5 |

Status labels: Pass (>=80), Warning (50-79), Error (<50), Skipped
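
As a worked example of the deductions and thresholds above, here is a small sketch. The `Findings` class and its field names are hypothetical, the floor at 0 is an assumption, and the "Skipped" status is not modeled because this README does not say when it applies.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Issues found for one citation (hypothetical structure)."""
    existence: str = "exists"        # "exists" | "possibly_exists" | "not_found"
    field_mismatches: int = 0        # metadata fields that clearly differ
    field_possible_errors: int = 0   # metadata fields that might be wrong
    support: str = "supports"        # "supports" | "partial" | "unsupported"


EXISTENCE_PENALTY = {"exists": 0, "possibly_exists": 15, "not_found": 30}
SUPPORT_PENALTY = {"supports": 0, "partial": 5, "unsupported": 10}


def score(findings: Findings) -> int:
    """Start at 100 and subtract the documented deductions."""
    total = 100
    total -= EXISTENCE_PENALTY[findings.existence]
    total -= 5 * findings.field_mismatches
    total -= 2 * findings.field_possible_errors
    total -= SUPPORT_PENALTY[findings.support]
    return max(total, 0)  # assumption: scores do not go below zero


def status(points: int) -> str:
    """Map a score to its status label."""
    if points >= 80:
        return "Pass"
    if points >= 50:
        return "Warning"
    return "Error"


example = Findings(field_mismatches=2, field_possible_errors=1, support="partial")
points = score(example)  # 100 - 10 - 2 - 5
label = status(points)
```

For this example, `points` is 83 and `label` is "Pass". A reference that is not found and has no other issues scores 70, which falls in the Warning band.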

Output

  • Per-reference result cards with existence status, metadata comparison table, semantic analysis, and suggested BibTeX correction
  • Summary statistics across all references
  • One-click Markdown report export

LLM Configuration

Semantic relevance detection requires an LLM. Configure it in the Settings panel:

| Provider | Default Base URL |
| --- | --- |
| OpenAI | https://api.openai.com/v1 |
| GLM (Zhipu) | https://open.bigmodel.cn/api/paas/v4 |
| DeepSeek | https://api.deepseek.com/v1 |
| Moonshot | https://api.moonshot.cn/v1 |
| MiniMax | https://api.minimax.chat/v1 |
| Custom | User-provided URL |

The API key is stored in memory only and is lost on server restart. No key is ever logged in plaintext.
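
To illustrate how a provider choice might map to a base URL, here is a minimal sketch. The provider keys, `resolve_base_url`, and `mask_key` are hypothetical names, and the masking scheme is only one possible way to keep keys out of logs, not a description of what CiteGuard does.

```python
from typing import Optional

# Default base URLs from the table above, keyed by an assumed lowercase provider id.
DEFAULT_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "glm": "https://open.bigmodel.cn/api/paas/v4",
    "deepseek": "https://api.deepseek.com/v1",
    "moonshot": "https://api.moonshot.cn/v1",
    "minimax": "https://api.minimax.chat/v1",
}


def resolve_base_url(provider: str, custom_url: Optional[str] = None) -> str:
    """Return the base URL for a provider, or the user-provided URL for 'custom'."""
    key = provider.lower()
    if key == "custom":
        if not custom_url:
            raise ValueError("custom provider requires a base URL")
        return custom_url.rstrip("/")
    if key not in DEFAULT_BASE_URLS:
        raise ValueError(f"unknown provider: {provider}")
    return DEFAULT_BASE_URLS[key]


def mask_key(api_key: str) -> str:
    """Return a log-safe form of an API key (assumed masking scheme)."""
    if len(api_key) <= 8:
        return "*" * len(api_key)
    return f"{api_key[:3]}...{api_key[-4:]}"


deepseek_url = resolve_base_url("DeepSeek")
masked = mask_key("sk-abcdefghijkl")
```

The resolved URL is the kind of value that can be passed as `base_url` when constructing an `OpenAI` client from the `openai` SDK, which is how OpenAI-compatible endpoints are usually reached.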

API Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/health | GET | Health check |
| /api/detect | POST | Submit a detection task |
| /api/detect/{task_id}/stream | GET | SSE progress stream |
| /api/detect/{task_id}/report | GET | Get complete report |
| /api/llm/config | POST | Save LLM configuration |
| /api/llm/test | POST | Test LLM connection |

Interactive API docs are available at http://localhost:8000/docs (Swagger UI).
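
A client that consumes the SSE progress stream needs to split the response into events. The sketch below parses standard `data:` lines from any iterable of text lines. It assumes each event carries a JSON payload, which this README does not confirm, and the field names in the sample are made up; check the Swagger UI for the real schema.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from an iterable of text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield json.loads("\n".join(buffer))
                buffer = []
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


# Hypothetical stream content; the payload fields are assumptions.
sample_lines = [
    ": ping",
    'data: {"index": 1, "status": "done"}',
    "",
    "event: progress",
    'data: {"index": 2}',
    "",
]
events = list(iter_sse_data(sample_lines))
```

The same function can be fed from a streaming HTTP response, for example the line iterator returned by `iter_lines()` on an `httpx` streaming response for `/api/detect/{task_id}/stream`.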

Project Structure

CiteGuard/
├── backend/                    # Python FastAPI backend
│   ├── api/                    # REST endpoints (detect, llm)
│   ├── models/                 # Pydantic schemas
│   ├── services/
│   │   ├── academic_apis/      # CrossRef, Semantic Scholar, DBLP, OpenAlex clients
│   │   ├── matchers/           # String similarity, LLM matcher, scoring, BibTeX
│   │   └── parsers/            # BibTeX, numbered, APA, GB/T 7714 parsers
│   ├── tests/                  # pytest test suite
│   ├── config.py               # Configuration constants
│   └── main.py                 # FastAPI application
├── frontend/                   # React + TypeScript frontend
│   └── src/
│       ├── components/         # React components
│       ├── hooks/              # Custom hooks (useSSE, useDetect)
│       ├── services/           # API client
│       └── types/              # TypeScript definitions
└── docs/                       # PRD, technical spec, implementation plan

Tech Stack

| Layer | Technologies |
| --- | --- |
| Frontend | React 19, TypeScript, Vite, shadcn/ui, Tailwind CSS |
| Backend | Python 3.12+, FastAPI, httpx, Pydantic |
| Real-time | Server-Sent Events (SSE) |
| LLM | OpenAI-compatible API (via openai SDK) |
| Testing | pytest + pytest-asyncio (backend), Vitest (frontend) |

Constraints

  • Maximum 50 references per detection
  • Single reference timeout: 60 seconds
  • Total task timeout: 10 minutes
  • All files under 800 lines, functions under 50 lines
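
The reference cap and the two timeouts above could be enforced with `asyncio.wait_for`. This is a sketch under assumptions: `run_with_limits` and the `check` callback are hypothetical, references are assumed to run concurrently, and returning the string "timeout" for a slow reference is a placeholder for whatever the real pipeline reports.

```python
import asyncio
from typing import Awaitable, Callable

MAX_REFERENCES = 50
PER_REFERENCE_TIMEOUT = 60.0   # seconds
TOTAL_TIMEOUT = 600.0          # seconds (10 minutes)


async def run_with_limits(
    references: list[str],
    check: Callable[[str], Awaitable[str]],
    per_reference_timeout: float = PER_REFERENCE_TIMEOUT,
    total_timeout: float = TOTAL_TIMEOUT,
) -> list[str]:
    """Check all references, bounding each one and the task as a whole."""
    if len(references) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} references per detection")

    async def guarded(ref: str) -> str:
        try:
            return await asyncio.wait_for(check(ref), per_reference_timeout)
        except asyncio.TimeoutError:
            return "timeout"  # placeholder result for a reference that ran too long

    # The outer wait_for raises a timeout error if the whole task exceeds its budget.
    return await asyncio.wait_for(
        asyncio.gather(*(guarded(ref) for ref in references)),
        total_timeout,
    )


async def demo_check(ref: str) -> str:
    """Stand-in for the real per-reference pipeline."""
    if ref == "slow":
        await asyncio.sleep(0.5)
    return "ok"


results = asyncio.run(
    run_with_limits(["fast", "slow"], demo_check, per_reference_timeout=0.05)
)
```

The demo shrinks the per-reference timeout so that the slow reference is cut off quickly; the results come back in input order.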

License

This project is for educational and research purposes.
