A small web page that converts a non-searchable PDF file into a searchable one.
OCR·FORGE is a fully self-contained HTML file that you open directly in your browser. No setup. No server. No uploads.
OCR·FORGE turns scanned or image-based PDFs into searchable PDFs with selectable text layered invisibly beneath the page image.
- PDF.js renders each page into a high-resolution `<canvas>` at 2× or 3×, depending on the quality setting.
- Tesseract.js runs OCR on that canvas and returns every word with its exact bounding box.
- jsPDF builds the output PDF in a specific order, which is what keeps the text hidden:
  - first, it writes the OCR text in white so it stays invisible,
  - then it places the page image on top.
- The text remains underneath the image, but it is still searchable and selectable in PDF viewers.
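
The layering order described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `writeSearchablePage` and the shape of `words` (`{ text, x, y, height }`, already in PDF points) are assumptions made for the example. It only uses jsPDF methods that exist in the public API (`setTextColor`, `setFontSize`, `text`, `addImage`), and it approximates the text baseline with the bottom edge of each word box.

```javascript
// Hypothetical helper: writes one page with white text under the page image.
// `doc` is assumed to be a jsPDF instance created with unit "pt".
// `words` is assumed to be [{ text, x, y, height }] in PDF points, top-left origin.
function writeSearchablePage(doc, words, imageDataUrl, pageWidthPt, pageHeightPt) {
  // Step 1: draw the OCR text first, in white, so it ends up beneath the image.
  doc.setTextColor(255, 255, 255);
  for (const word of words) {
    if (!word.text.trim()) continue; // skip empty or whitespace-only results
    // Approximate the font size from the box height (never below 1 pt).
    doc.setFontSize(Math.max(word.height, 1));
    // jsPDF positions text by its baseline, so use the bottom of the box.
    doc.text(word.text, word.x, word.y + word.height);
  }
  // Step 2: draw the rendered page image last, covering the whole page.
  doc.addImage(imageDataUrl, "JPEG", 0, 0, pageWidthPt, pageHeightPt);
}
```

Because PDF content is painted in the order it is written, the image drawn in step 2 covers the text from step 1, while the text stays in the page content for search and selection.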
- 9 available languages, including Spanish, English, Portuguese, and French
- 3 render quality levels
- Real-time per-page log with confidence percentages
- Thumbnail previews
- 100% local processing
- Nothing is uploaded to any server
- Word-level bounding boxes are scaled correctly from canvas pixels to PDF points
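
As a rough sketch of the bounding-box scaling mentioned in the last item: a PDF.js viewport at scale 1 is measured in PDF points, so a canvas rendered at scale 2 or 3 has 2 or 3 pixels per point, and dividing by the render scale converts back. The function name `bboxToPoints` is hypothetical, and the `{ x0, y0, x1, y1 }` input shape is assumed to match the word boxes Tesseract.js reports.

```javascript
// Hypothetical helper: converts a word box from canvas pixels to PDF points.
// `bbox` is assumed to be { x0, y0, x1, y1 } in canvas pixels (top-left origin).
// `renderScale` is the scale used when rendering the page (for example 2 or 3).
function bboxToPoints(bbox, renderScale) {
  return {
    x: bbox.x0 / renderScale,
    y: bbox.y0 / renderScale,
    width: (bbox.x1 - bbox.x0) / renderScale,
    height: (bbox.y1 - bbox.y0) / renderScale,
  };
}

// Example: a word found at 2× render scale.
const box = bboxToPoints({ x0: 100, y0: 200, x1: 300, y1: 260 }, 2);
console.log(box); // { x: 50, y: 100, width: 100, height: 30 }
```

No vertical flip is applied here because both the canvas and jsPDF measure from the top-left corner of the page.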
- HTML5
- CSS3
- JavaScript
- PDF.js
- Tesseract.js
- jsPDF
Everything runs locally in the browser. Your files stay on your device during the whole process.
This project was created with help from Claude, using the Sonnet 4.6 Adaptive model with Tool Access set to Always Available.
The following prompts were used (translated from Spanish):
Next, generate the HTML5, CSS3, and JavaScript code
for a web page that uses Tesseract.js and PDF.js to convert any PDF the user uploads into a selectable PDF
Continue
- Open the HTML file in your browser.
- Upload a PDF.
- Choose the language and quality level.
- Start OCR processing.
- Download the searchable PDF.
The generated PDF keeps the original page appearance while adding a hidden text layer for search and selection support.
- Best results come from clean scans and high-quality source PDFs.
- Multi-page documents are processed page by page.
- OCR confidence is shown in the live log so you can track recognition quality as it runs.
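
One way a per-page log line with a confidence percentage could be produced is sketched below. The function names and the log format are hypothetical and are not taken from the project. The sketch assumes each recognized word carries a numeric `confidence` between 0 and 100, and it averages those values per page.

```javascript
// Hypothetical helper: mean word confidence for one page, rounded to an integer.
// Returns null when the page has no scored words.
function summarizeConfidence(words) {
  const scored = words.filter((w) => Number.isFinite(w.confidence));
  if (scored.length === 0) return null;
  const sum = scored.reduce((acc, w) => acc + w.confidence, 0);
  return Math.round(sum / scored.length);
}

// Hypothetical helper: builds the text for one log entry.
function formatLogLine(pageNumber, pageCount, words) {
  const confidence = summarizeConfidence(words);
  const label = confidence === null ? "no text found" : `${confidence}% confidence`;
  return `Page ${pageNumber}/${pageCount}: ${words.length} words, ${label}`;
}

console.log(formatLogLine(2, 5, [{ confidence: 90 }, { confidence: 80 }]));
// Page 2/5: 2 words, 85% confidence
```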