Claude reads your code well but doesn't look at the rendered PDF. This skill makes it look.
You wrote the LaTeX. You ran pandoc. You hit "Save as PDF". The source is right. The PDF has SmartArt clipping off the page, a heading orphaned on page 7, and Lorem ipsum still on page 3 because the template variable never got filled in.
Claude can read your .tex source. Claude cannot see the rendered page. So Claude says "looks good" while the PDF looks broken.
doc-eye closes that gap. It renders every page of a Word, PowerPoint, or PDF document, feeds each page to Claude's vision model with a structured rubric, and emits a per-page report of formatting, typography, colour, content, layout, and visual-coherence defects.
It catches what a meticulous human reviewer would catch in a 5-second skim per page — except across 50 pages, in 30 seconds, for 10 cents.
- FORMATTING — margins, indents, alignment, line spacing, broken page breaks, missing/duplicate headers
- TYPOGRAPHY — font mixing, inconsistent sizes, orphan/widow lines, hyphenation
- COLOUR — accidental coloured runs, low-contrast text, palette drift
- CONTENT — typos, broken cross-references, leftover
TODO/Lorem ipsumplaceholders - LAYOUT — SmartArt overflow, blank pages, tables overflowing margins, disconnected captions
- VISUAL_COHERENCE — mixed numbering schemes, inconsistent heading styles, footnote/anchor mismatches
It does NOT:
- Review code or source — that's
/code-review - Check clinical / scientific correctness — that's
scientific-manuscript-reviewer - OCR scanned PDFs (it can still audit layout on them, but won't proofread the scanned text)
- Propose fixes — reports only; you decide what to do
git clone https://github.com/pedroreisper/doc-eye
cd doc-eye
less install.sh # 80 lines, no obfuscation
bash install.shOne-liner if you trust the source:
curl -fsSL https://raw.githubusercontent.com/pedroreisper/doc-eye/main/install.sh | bashSet your API key (the skill calls the Anthropic API directly):
export ANTHROPIC_API_KEY=sk-ant-...Verify:
bash ~/.claude/skills/doc-eye/scripts/doctor.shProject-scoped (team-shared via your repo):
bash install.sh --project- Python 3.10+
pymupdfandanthropicPython packages (auto-installed)soffice(LibreOffice) for.docx/.pptxrendering:brew install --cask libreofficeon macOS,apt install libreofficeon Debian/UbuntuANTHROPIC_API_KEYenv var- macOS or Linux
The doctor.sh self-diagnostic confirms each of these.
# basic
bash scripts/audit.sh path/to/document.pdf
bash scripts/audit.sh path/to/manual.docx
# use Opus instead of default Haiku (more expensive, better for nuance)
bash scripts/audit.sh manual.docx --opus
# audit only a subset of pages
bash scripts/audit.sh long.pdf --max-pages 10
bash scripts/audit.sh long.pdf --sample # first 10 + last 5 + every 5th in between
# restrict to specific defect categories
bash scripts/audit.sh manual.docx --only LAYOUT,TYPOGRAPHY
# JSON output (for piping into other tools)
bash scripts/audit.sh manual.docx --format json | jq '.summary_counts'
# keep the rendered PNGs for inspection
bash scripts/audit.sh manual.docx --keep-rendersThe audit writes .doc-eye/<basename>/audit-report.json next to your cwd.
| Model | Per page | 50-page doc | Best for |
|---|---|---|---|
claude-haiku-4-5 (default) |
~$0.006 | ~$0.30 | Typography, layout, typos |
claude-sonnet-4-6 |
~$0.018 | ~$0.90 | + nuanced visual coherence |
claude-opus-4-7 (--opus) |
~$0.029 | ~$1.45 | + argument-level visual analysis |
Haiku is usually enough. Reserve Opus for documents where the cost of a missed defect is high (regulatory submissions, journal manuscripts).
audit.sh prints an estimated cost before the audit phase. Estimates over $1 require either DOC_EYE_NO_CONFIRM=1 or an interactive y confirmation — protects against accidental --opus on huge docs.
A free placeholder pre-pass (PyMuPDF text extraction, no API cost) runs before the vision phase and catches the cheapest, highest-frequency defect class: leftover TODO, CITATION, Lorem ipsum, {{template_var}}, [insert], FALAR DE, SINTETIZAR, etc. Calibrated against real failure patterns.
DOC-EYE — visual audit report
────────────────────────────────────────────────────────────
Source: exports/Manual_NC_v4.docx
Model: claude-haiku-4-5
Pages: 42 audited / 42 rendered
Findings: ❌ 3 critical ⚠️ 11 warning • 4 info
Token usage: 201840 in / 4218 out
DOCUMENT-LEVEL FINDINGS (2)
────────────────────────────────────────────────────────────
❌ [LAYOUT] pp. 7, 12, 19
Diagram box text clips at the right edge of the page (consistent SmartArt overflow).
location: top-right quadrant
⚠️ [TYPOGRAPHY] pp. 3, 5, 11, 14, 22, 30 (6 occurrences)
Body text mixes Calibri 11pt with Liberation Serif 11pt across paragraphs.
PER-PAGE FINDINGS (5)
────────────────────────────────────────────────────────────
Page 8:
❌ [CONTENT] Placeholder text "TODO: insert reference" left in body.
text: "TODO: insert reference here"
Page 23:
⚠️ [TYPOGRAPHY] Widow line: last line of paragraph alone at bottom of page.
text: "...portanto a conclusão é"
...
────────────────────────────────────────────────────────────
VERDICT: FINDINGS_PRESENT
If you use did-it-actually for request-fidelity audits, the result of a doc-eye audit can be a must_review criterion in your contract — see references/did-it-actually-bridge.md.
- SmartArt fidelity — LibreOffice headless is not byte-identical to Word for SmartArt. If your doc has lots of SmartArt, prefer to export the PDF from Word directly (File → Save As → PDF) and feed that PDF.
- Not OCR-grade — Claude vision reads text from images well but isn't a substitute for spell-check on long body text. Use a text-based spell-checker alongside.
- One page at a time — no facing-page-spread awareness in v1.
- Stateless across pages — each page is audited independently. Multi-page patterns are caught by post-hoc fingerprint deduplication, not by passing context between calls.
See references/edge-cases.md for the full taxonomy.
doc-eye/
├── SKILL.md # entry point
├── README.md
├── LICENSE
├── install.sh
├── references/
│ ├── audit-categories.md # full defect taxonomy
│ ├── vision-prompt-template.md # per-page prompt + tool schema
│ ├── output-schema.json # JSON Schema for audit-report.json
│ ├── edge-cases.md # encrypted PDFs, scans, RTL, monorepos
│ └── did-it-actually-bridge.md # composability with did-it-actually
└── scripts/
├── audit.sh # main driver
├── render.py # DOCX/PPTX → PDF → PNG via PyMuPDF
├── vision_audit.py # per-page vision calls (forced tool use)
├── aggregate.py # dedup + scoring → audit-report.json
├── report.py # JSON → prose
└── doctor.sh # self-diagnostic
Two things especially valuable:
- Fixtures — add a
.docx,.pptxor.pdfwith a known defect toexamples/plus anexpected.mdfinding note. Seeexamples/README.mdfor the layout. Helps regression-test prompt changes. - New categories or sub-categories — propose via issue. Edits to the category set are a breaking change to the report schema (bump version).
MIT.
Designed via parallel-think-tank simulation (research + architecture + technical feasibility) following the same methodology as did-it-actually. The unifying insight across both skills: Claude can't see what it's producing without an explicit perceptual loop.