
mohan-gupta/custom-ocr


Receipts OCR


Performing OCR on Receipts.

In this project, I have implemented an OCR pipeline using YOLO for text detection and a custom CRNN model for text recognition.
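The two-stage flow described above (detect text regions, then recognize the text in each region) can be sketched as follows. This is a minimal illustration, not the repository's code: `run_ocr`, `crop`, the `(x1, y1, x2, y2)` box format, and the stub models are all hypothetical names and assumptions, and a nested list stands in for a real image array.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels; assumed box format
Image = Sequence[Sequence[int]]   # rows of pixel values; stand-in for an image array


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region inside `box` out of a row-major image."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def run_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: detect text boxes. Stage 2: recognize the text in each crop."""
    results = []
    for box in detect(image):
        results.append((box, recognize(crop(image, box))))
    return results


# Stub models so the sketch runs without trained weights.
def fake_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 1), (1, 1, 3, 2)]


def fake_recognize(patch: List[List[int]]) -> str:
    return "".join(str(value) for row in patch for value in row)


demo_image = [[1, 2, 3], [4, 5, 6]]
ocr_output = run_ocr(demo_image, fake_detect, fake_recognize)
```

In a real pipeline, `detect` would wrap the YOLO detector and `recognize` would wrap the CRNN; passing them in as callables keeps the two stages independently testable.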

Approach

Dataset

I have used the invoice and receipt dataset from Hugging Face. Dataset link

Model

For text detection, I have used a YOLO v8 model. For text recognition, I have used a CRNN model built on a 7-layer CNN with BatchNorm and MaxPool.
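To make the recognition architecture more concrete, here is a rough PyTorch sketch of a CRNN with a 7-layer CNN using BatchNorm and MaxPool. Only the layer count, BatchNorm, and MaxPool come from the description above. The class name `CRNNSketch`, the channel widths, the pooling positions, the bidirectional LSTM, and the 32x128 grayscale input size are assumptions chosen for illustration and may differ from the trained model.

```python
import torch
from torch import nn


class CRNNSketch(nn.Module):
    """Illustrative CRNN: 7 conv layers -> recurrent layer -> per-timestep classifier."""

    def __init__(self, num_classes: int, in_channels: int = 1, hidden: int = 128):
        super().__init__()
        channels = [64, 128, 256, 256, 512, 512, 512]  # assumed widths
        pool_after = {0, 1, 3, 5}                      # assumed pooling positions
        layers = []
        prev = in_channels
        for i, ch in enumerate(channels):
            layers += [
                nn.Conv2d(prev, ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
            ]
            if i in pool_after:
                # Early pools halve height and width; later pools halve height only,
                # which keeps more horizontal positions for the character sequence.
                layers.append(nn.MaxPool2d(2) if i < 2 else nn.MaxPool2d((2, 1)))
            prev = ch
        self.cnn = nn.Sequential(*layers)
        self.rnn = nn.LSTM(prev, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)             # (B, C, H', W')
        feats = feats.mean(dim=2)       # collapse height -> (B, C, W')
        feats = feats.permute(0, 2, 1)  # (B, W', C): one feature vector per column
        seq, _ = self.rnn(feats)
        return self.head(seq)           # (B, W', num_classes)


model = CRNNSketch(num_classes=40).eval()
with torch.no_grad():
    out = model(torch.zeros(2, 1, 32, 128))  # two 32x128 grayscale crops
```

Each of the `W'` output columns acts as one time step for the sequence loss, so the input width determines the longest text the model can emit.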

Training

I have approached text recognition at the character level: the model predicts a sequence of characters for each detected text region. I trained the recognition model with CTC loss and the AdamW optimizer, using a learning rate of 1e-3 and a decay rate of 0.03.
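The training setup above can be sketched as a single optimization step in PyTorch. This is a minimal example under stated assumptions: the character set `CHARS`, the stand-in linear model, and the random features are hypothetical, and the 0.03 decay rate is interpreted here as AdamW's `weight_decay`, which the text does not confirm. The `greedy_decode` helper shows the usual CTC decoding rule (collapse repeats, then drop blanks); it is a common approach, not necessarily the one used in this project.

```python
import torch
from torch import nn

# Hypothetical character set; index 0 is reserved for the CTC blank symbol.
CHARS = "0123456789.$"
char_to_idx = {c: i + 1 for i, c in enumerate(CHARS)}


def encode(text: str) -> torch.Tensor:
    """Map a string to a tensor of character indices (character-level labels)."""
    return torch.tensor([char_to_idx[c] for c in text], dtype=torch.long)


def greedy_decode(frame_ids) -> str:
    """Collapse repeated ids, then drop blanks (id 0)."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != 0:
            out.append(CHARS[i - 1])
        prev = i
    return "".join(out)


torch.manual_seed(0)
num_classes = len(CHARS) + 1
model = nn.Linear(16, num_classes)  # stand-in for the CRNN recognizer

# Assumption: "decay rate" is mapped to weight_decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.03)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

# Fake batch: 2 samples, 20 time steps, 16 features per step.
features = torch.randn(2, 20, 16)
labels = [encode("$12.50"), encode("3.99")]
targets = torch.cat(labels)                               # 1-D concatenated targets
target_lengths = torch.tensor([len(t) for t in labels])
input_lengths = torch.full((2,), 20, dtype=torch.long)

logits = model(features)                                  # (B, T, C)
log_probs = logits.log_softmax(dim=2).permute(1, 0, 2)    # CTCLoss expects (T, B, C)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

CTC needs no per-character alignment between image columns and labels, which is why only the target strings and their lengths are passed to the loss.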

Result

  • For Text Detection, YOLO v8: mAP50 = 99.4% and mAP50-95 = 81.1%
  • For Text Recognition: CTC train loss = 0.253, validation loss = 0.108 and test loss = 0.225
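For readers unfamiliar with the detection metrics: mAP50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The small example below computes IoU for two made-up boxes and shows which thresholds the match would pass; the box coordinates are invented for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds used by mAP50-95: 0.50, 0.55, ..., 0.95
thresholds = [0.5 + 0.05 * i for i in range(10)]

pred = (10, 10, 110, 60)    # hypothetical predicted box
truth = (20, 10, 120, 60)   # hypothetical ground-truth box, shifted 10 px right
score = iou(pred, truth)
passed = [t for t in thresholds if score >= t]
```

A 10-pixel horizontal shift on a 100-pixel-wide box already drops the IoU to about 0.82, so the match fails the strictest thresholds. This is why mAP50-95 is typically well below mAP50.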

To run the project

git clone https://github.com/mohan-gupta/custom-ocr.git  # clone
cd custom-ocr

uv venv --python=3.12 # set-up uv python environment
.venv/Scripts/activate # activate the environment (Windows; on Linux/macOS use: source .venv/bin/activate)
uv pip install -r requirements.txt  # install
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 # Installing PyTorch with cuda

cd app
streamlit run streamlit_app.py  # for running the streamlit app
# OR
uvicorn main:app # for running the fastapi webapp

About

Performing OCR on Receipts and Invoices using PyTorch and YOLO.
