Savior344/OCR-FORGE

OCR-FORGE

A small web page that converts a non-searchable PDF file into a searchable one.

OCR·FORGE is a fully self-contained HTML file that you open directly in your browser. No setup. No server. No uploads.

What it does

OCR·FORGE turns scanned or image-based PDFs into searchable PDFs with selectable text layered invisibly beneath the page image.

How the pipeline works

  1. PDF.js renders each page into a high-resolution <canvas> at 2× or 3× depending on the quality setting.
  2. Tesseract.js runs OCR on that canvas and returns every word with its exact bounding box.
  3. jsPDF builds the output PDF, and the drawing order matters:
    • first, it writes the OCR text in white so it stays invisible,
    • then it places the page image on top,
    • the text ends up underneath the image, but it remains searchable and selectable in PDF viewers that support text selection.
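
The drawing order in step 3 can be sketched as follows. This is a minimal illustration, not the project's actual code: `writeSearchablePage` and the shape of the `page` object are hypothetical, `doc` is assumed to be a jsPDF document created with `unit: "pt"`, and the JPEG image format is also an assumption. The only jsPDF methods used are `setTextColor`, `setFontSize`, `text`, and `addImage`.

```javascript
// Sketch: write one page with the hidden text first, then the image on top.
// `doc` is assumed to be a jsPDF instance; `page` is a hypothetical shape:
// { words: [{ text, x, y, fontSize }], imageData, widthPt, heightPt }
function writeSearchablePage(doc, page) {
  // 1. Text layer first: white text, one call per recognized word.
  doc.setTextColor(255, 255, 255);
  for (const word of page.words) {
    doc.setFontSize(word.fontSize);
    doc.text(word.text, word.x, word.y);
  }

  // 2. Page image last. PDF content is painted in order, so the image
  //    covers the text while the text stays in the content stream.
  doc.addImage(page.imageData, "JPEG", 0, 0, page.widthPt, page.heightPt);
}
```

Swapping the two steps would paint the white text over the scan instead of hiding it beneath the image, which is why the sequence is fixed.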

Features

  • 9 available languages, including Spanish, English, Portuguese, and French
  • 3 render quality levels
  • Real-time per-page log with confidence percentages
  • Thumbnail previews
  • 100% local processing: nothing is uploaded to any server
  • Word-level bounding boxes are scaled correctly from canvas pixels to PDF points
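
The pixel-to-point scaling in the last feature can be sketched like this. It assumes the page was rendered with a PDF.js viewport at `renderScale` (so one PDF point covers `renderScale` canvas pixels), that the bounding box has the `{ x0, y0, x1, y1 }` shape Tesseract.js reports for words, and that the output PDF uses points with a top-left origin. The function name `bboxToPoints` is hypothetical, not taken from the project.

```javascript
// Sketch: convert a word bounding box from canvas pixels to PDF points.
// A page rendered at scale 2 has 2 canvas pixels per point, so dividing
// every pixel coordinate by the render scale yields points.
function bboxToPoints(bbox, renderScale) {
  return {
    x: bbox.x0 / renderScale,
    y: bbox.y0 / renderScale,
    width: (bbox.x1 - bbox.x0) / renderScale,
    height: (bbox.y1 - bbox.y0) / renderScale,
  };
}

// Usage: a word found at pixels (200, 100)-(400, 140) on a 2× render.
const box = bboxToPoints({ x0: 200, y0: 100, x1: 400, y1: 140 }, 2);
console.log(box); // { x: 100, y: 50, width: 100, height: 20 }
```

Both the canvas and the assumed PDF coordinate system grow downward from the top-left corner, so no vertical flip is needed in this sketch. A real implementation would likely also derive a font size from `height` and place the text at the bottom of the box, since text is usually positioned by its baseline.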

Tech stack

  • HTML5
  • CSS3
  • JavaScript
  • PDF.js
  • Tesseract.js
  • jsPDF

Privacy

Everything runs locally in the browser. Your files stay on your device during the whole process.

AI assistance

This project was created with help from Claude, using the Sonnet 4.6 Adaptive model with Tool Access set to Always Available.

The following prompts were used:

Next, generate the HTML5, CSS3, and JavaScript code
for a web page that uses Tesseract.js and PDF.js to convert any PDF the user uploads into a selectable PDF
Continue

Usage

  1. Open the HTML file in your browser.
  2. Select a PDF from your device (it is loaded into the page, not sent anywhere).
  3. Choose the language and quality level.
  4. Start OCR processing.
  5. Download the searchable PDF.

Output behavior

The generated PDF keeps the original page appearance while adding a hidden text layer for search and selection support.

Notes

  • Best results come from clean scans and high-quality source PDFs.
  • Multi-page documents are processed page by page.
  • OCR confidence is shown in the live log so you can track recognition quality as it runs.
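
As an illustration of the per-page confidence log, the sketch below averages word confidences and formats one log line. It assumes each recognized word carries a `confidence` value from 0 to 100, as Tesseract.js results do. The function names and the log wording are hypothetical and may differ from what the page actually prints.

```javascript
// Sketch: mean confidence (0-100) over the words recognized on one page.
function meanConfidence(words) {
  if (words.length === 0) return 0; // blank page: nothing was recognized
  const total = words.reduce((sum, w) => sum + w.confidence, 0);
  return total / words.length;
}

// Sketch: one line of the live log, with the confidence to one decimal.
function formatPageLog(pageNumber, totalPages, confidence) {
  return "Page " + pageNumber + "/" + totalPages + ": " +
    confidence.toFixed(1) + "% confidence";
}

// Usage with two made-up words on page 2 of 5.
const words = [{ confidence: 90 }, { confidence: 80 }];
console.log(formatPageLog(2, 5, meanConfidence(words)));
// Page 2/5: 85.0% confidence
```

A low value on a page is a hint to retry it at a higher quality level or with a different language setting.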
