A web-based EPUB optimizer for e-ink readers. Drop in any EPUB and get back a clean, optimized file ready for your device.
Originally built for the Xteink X4 (800x480 e-ink display, 4-level grayscale, SSD1677 controller, ESP32-C3), it also works with any other Xteink reader or e-ink device that supports EPUB.
epubkit runs a 20-step pipeline on every EPUB:
| Step | What it does |
|---|---|
| 1 | DRM check — detects DRM-protected files and stops early with a clear message |
| 2 | Extract — unpacks the EPUB ZIP structure into a working directory |
| 3 | Parse structure — locates the OPF package file and parses the manifest |
| 4 | Read metadata — extracts title, author, series, language, cover reference |
| 5 | Apply metadata edits — overwrites title/author if the user edited them in the UI |
| 6 | Find content files — catalogs all XHTML, CSS, image, and font files in the EPUB |
| 7 | Process images — converts all images to baseline JPEG, resizes to 800x480 (max 1024x1024), applies 4-level grayscale quantization with Floyd-Steinberg dithering, autocontrast histogram stretching, and contrast boost. Light Novel mode rotates/splits landscape images |
| 8 | Fix SVG covers — unwraps SVG-wrapped cover images (common in Gutenberg/store EPUBs) |
| 9 | Generate cover — creates a title/author cover image if the book doesn't have one |
| 10 | Update references — rewrites all internal hrefs and srcs to match renamed image files |
| 11 | Repair HTML + strip attributes — fixes malformed XHTML with lxml recovery parser, strips unnecessary attributes (data-*, aria-*, role, tabindex, etc.) to reduce parsing overhead for the 380KB RAM device |
| 12 | Remove unused CSS — collects all used classes/IDs/elements across XHTML files, then strips CSS rules that don't match anything |
| 13 | Remove embedded fonts — deletes @font-face rules from CSS, removes font files (.ttf, .otf, .woff, .woff2), and cleans them from the OPF manifest |
| 14 | Normalize whitespace — strips excessive empty paragraphs/divs, adds CSS page-break-before to chapter headings (h1, h2) |
| 15 | Text cleanup — scans all text nodes (skipping script/style/pre/code) and fixes: double spaces, OCR ligature artifacts (fi/fl/ffi/ffl/ff), smart quotes → straight quotes, mojibake encoding errors, punctuation issues, Unicode NFC normalization |
| 16 | Clean metadata — strips store-specific tags (Calibre, iBooks, Kindle, Amazon, Google Play, Kobo) |
| 17 | Fix TOC — validates the Table of Contents, generates one from chapter headings if missing |
| 18 | Clean OS artifacts — removes .DS_Store, Thumbs.db, __MACOSX, desktop.ini, etc. |
| 19 | Repackage — rebuilds the EPUB ZIP with correct mimetype entry and deflate compression |
| 20 | Output filename — generates a clean Author - Title.epub filename from metadata |
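As an illustration of step 19, here is a minimal sketch of repackaging using only the standard library. It is not epubkit's actual code: the function name `repackage_epub` and its arguments are hypothetical. It assumes the "correct mimetype entry" means what the EPUB container format requires, namely that `mimetype` is the first entry in the archive and is stored uncompressed, with everything else deflated.

```python
import zipfile
from pathlib import Path


def repackage_epub(src_dir: str, out_path: str) -> None:
    """Zip src_dir into an EPUB (hypothetical helper, not epubkit's API)."""
    root = Path(src_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Every other file is added with deflate compression.
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            if rel == "mimetype":
                continue  # already written above
            zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Calling `repackage_epub("workdir", "Author - Title.epub")` would write the archive next to the working directory; the output path should sit outside `src_dir` so the archive does not try to include itself.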
- Drop one or more EPUB files onto the upload zone
- Edit title/author if needed (auto-detected from metadata)
- Pick a preset: Quick (images + text), Full (X4-optimized), or Custom
- Click Optimize and watch real-time progress via SSE streaming
- Download the optimized EPUB — ready to transfer to your reader
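The real-time progress in the steps above is delivered as Server-Sent Events. The sketch below shows what such a stream could look like on the wire. The event names (`progress`, `done`) and payload fields are assumptions made for illustration, not epubkit's actual protocol; only the `event:` / `data:` framing with a blank line after each message comes from the SSE format itself.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Event message (hypothetical helper)."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates each SSE message.
    return "\n".join(lines) + "\n\n"


def progress_stream(steps: Iterable[str]) -> Iterator[str]:
    """Yield one 'progress' event per step, then a final 'done' event."""
    names = list(steps)
    for i, name in enumerate(names, start=1):
        yield sse_event({"step": i, "total": len(names), "name": name}, "progress")
    yield sse_event({"status": "complete"}, "done")
```

A generator like `progress_stream` could be handed to a streaming HTTP response with the `text/event-stream` content type, and the browser would read it with `EventSource`.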
| Preset | Images | Text | Fonts | CSS | Cover | Metadata | Best for |
|---|---|---|---|---|---|---|---|
| Quick | Yes | Yes | No | No | No | No | Fast image + text pass |
| Full | Yes | Yes | Yes | Yes | Yes | Yes | Complete X4 optimization |
| Custom | Pick | Pick | Pick | Pick | Pick | Pick | Fine-grained control |
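One way to read the preset table is as a mapping from preset name to a set of on/off switches. The following sketch encodes it that way. The names `STAGES`, `PRESETS`, and `resolve_options` are hypothetical and are not taken from epubkit's source; the Yes/No values come straight from the table.

```python
STAGES = ("images", "text", "fonts", "css", "cover", "metadata")

PRESETS = {
    # Quick: images and text only.
    "quick": {stage: stage in ("images", "text") for stage in STAGES},
    # Full: every stage enabled.
    "full": {stage: True for stage in STAGES},
}


def resolve_options(preset: str, custom: dict = None) -> dict:
    """Return the stage switches for a preset (hypothetical helper)."""
    name = preset.lower()
    if name == "custom":
        chosen = custom or {}
        unknown = set(chosen) - set(STAGES)
        if unknown:
            raise ValueError(f"unknown stages: {sorted(unknown)}")
        # Stages the user did not pick default to off.
        return {stage: bool(chosen.get(stage, False)) for stage in STAGES}
    if name not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return dict(PRESETS[name])
```

For example, `resolve_options("Custom", {"images": True, "fonts": True})` enables only image processing and font removal.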
The optimizer is tuned for these hardware constraints:
| Spec | Value |
|---|---|
| Display | 800x480 e-ink panel |
| Grayscale | 4 levels (SSD1677 controller): black, dark gray, light gray, white |
| Processor | ESP32-C3, 160MHz |
| RAM | 380KB usable |
| Max image | 1024x1024 pixels |
| Formats | EPUB, XTC, XTCH, Markdown, TXT |
| Storage | 32GB + microSD |
- Format: All images converted to baseline JPEG (progressive breaks many e-ink readers)
- Resize: Fit within 800x480 screen, hard clamp at 1024x1024
- Grayscale: 4-level quantization matching SSD1677 palette (0, 85, 170, 255) with Floyd-Steinberg dithering
- Contrast: Auto-histogram stretching (ImageOps.autocontrast) followed by 1.5x contrast boost
- Subsampling: 4:2:0 for grayscale (all RGB channels identical, saves ~15-20%), 4:4:4 for color
- Transparency: Alpha composited onto white background
- Light Novel mode: Landscape images rotated 90°; double-page spreads (aspect > 1.8) split into two portrait pages
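To make the resize, quantization, and spread-detection rules above concrete, here is a dependency-free sketch. epubkit itself uses Pillow for this work; the pure-Python functions below (`fit_within`, `is_double_page_spread`, `dither_4level`) are hypothetical stand-ins that operate on a plain list of grayscale rows, and they assume images are only ever scaled down, never up.

```python
PALETTE = (0, 85, 170, 255)  # the four gray levels listed above


def fit_within(w, h, max_w=800, max_h=480):
    """Scale (w, h) down to fit the box, keeping aspect ratio; never upscale."""
    scale = min(max_w / w, max_h / h, 1.0)
    return max(1, round(w * scale)), max(1, round(h * scale))


def is_double_page_spread(w, h):
    """Light Novel mode treats aspect ratios above 1.8 as two-page spreads."""
    return w / h > 1.8


def nearest_level(value):
    """Return the palette level closest to a gray value."""
    return min(PALETTE, key=lambda level: abs(level - value))


def dither_4level(pixels):
    """Floyd-Steinberg dithering of rows of 0..255 gray values to PALETTE."""
    h = len(pixels)
    w = len(pixels[0]) if h else 0
    buf = [[float(v) for v in row] for row in pixels]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = nearest_level(old)
            buf[y][x] = new
            err = old - new
            # Push the quantization error onto unprocessed neighbours.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return [[int(v) for v in row] for row in buf]
```

Dithering spreads each pixel's rounding error to its neighbours, so a flat mid-gray area becomes a mix of adjacent palette levels instead of collapsing to a single one.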
Text cleanup scans all XHTML text nodes (skipping <script>, <style>, <pre>, <code>) and applies these fixes:
- Whitespace: Multiple spaces/tabs → single space, removes spaces before punctuation
- OCR ligatures: ﬁ (U+FB01), ﬂ (U+FB02), ﬃ (U+FB03), ﬄ (U+FB04), ﬀ (U+FB00) → plain ASCII
- Smart quotes: Typographic quotes/dashes → straight equivalents (configurable)
- Mojibake: Detects and repairs common UTF-8/Latin-1 double-encoding patterns
- Punctuation: 4+ dots → ellipsis, missing space after sentence-ending punctuation, duplicate commas
- Unicode: NFC normalization
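Several of the fixes above can be sketched with the standard library alone. The function below is an illustrative approximation, not epubkit's implementation: `clean_text` and its `straighten_quotes` flag are hypothetical names, the regular expressions are assumptions about how each rule might be written, and mojibake repair, dash conversion, and the missing-space fix are left out.

```python
import re
import unicodedata

LIGATURES = {
    "\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl",
    "\ufb03": "ffi", "\ufb04": "ffl",
}
SMART_QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def clean_text(text: str, straighten_quotes: bool = True) -> str:
    """Apply a subset of the cleanup rules to one text node."""
    # Unicode: NFC normalization.
    text = unicodedata.normalize("NFC", text)
    # OCR ligatures: expand to plain ASCII letters.
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    # Smart quotes: optional, mirroring the "configurable" note above.
    if straighten_quotes:
        for smart, straight in SMART_QUOTES.items():
            text = text.replace(smart, straight)
    # Whitespace: runs of spaces/tabs become one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Whitespace: drop spaces that precede punctuation.
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    # Punctuation: four or more dots become a single ellipsis character.
    text = re.sub(r"\.{4,}", "\u2026", text)
    # Punctuation: collapse duplicate commas.
    text = re.sub(r",{2,}", ",", text)
    return text
```

In a full pipeline this would be applied per text node, which is how the skipped elements (script, style, pre, code) stay untouched.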
- FastAPI — async web framework
- Pillow — image processing (4-level quantization, autocontrast)
- lxml — XML/HTML parsing and repair
- cssutils — CSS parsing and cleanup
- Server-Sent Events — real-time progress streaming
epubkit cannot process DRM-protected EPUBs. It detects DRM in the first pipeline step and stops early with a clear message. You'll need to remove DRM first using tools like DeDRM with Calibre.
Inspired by and built on ideas from:
- zgredex/baseline_jpg_converter — Calibre plugin for baseline JPEG conversion
- CrossPoint Reader PR #1224 — in-browser EPUB converter with Light Novel mode
- kxrz/calibre_workflow — Calibre plugin for HTML repair and CSS cleanup
- bigbag/papyrix-reader — Xteink device specifications and documentation
Built by @b1rdmania. Made because existing tools required too many steps — Calibre plugins, CLI scripts, manual image conversion. epubkit does it all in one pass through a simple web interface.
MIT