CTPN detects text regions (green boxes) and CRNN recognizes the text within each region:
A full PyTorch Lightning rewrite of courao/ocr.pytorch
Text detection (CTPN) and text recognition (CRNN) — converted end-to-end from vanilla PyTorch to PyTorch Lightning.
The original repo uses hand-rolled training loops with raw PyTorch. This fork rewrites both models and data pipelines as PyTorch Lightning modules:
| Component | Original (courao) | This fork |
|---|---|---|
| CTPN model | ctpn_model.py | ctpn_model_PL.py (pl.LightningModule) |
| CTPN data | inline in train script | ctpn_data_PL.py (pl.LightningDataModule) |
| CTPN training | ctpn_train.py (manual loop) | ctpn_train_PL.py (pl.Trainer) |
| CRNN model | crnn.py | crnn_model_PL.py (pl.LightningModule) |
| CRNN data | mydataset.py | crnn_data_PL.py (pl.LightningDataModule) |
| CRNN training | train_pytorch_ctc.py (manual loop) | crnn_train_PL.py (pl.Trainer) |
This gives you automatic logging (TensorBoard), checkpointing, multi-GPU support, and cleaner separation of model/data/training logic — all out of the box.
Working on implementing CRAFT and Transformer_STR.
Pull requests welcome!
- python-3.6+
- pytorch-lightning-1.4.1
- opencv-4.5.2.52
- numpy-1.21.1
- Pillow-8.2.0
- pathed-1.1.00
Detection is based on CTPN; some code is borrowed from pytorch_ctpn.
Recognition is based on CRNN; some code is borrowed from crnn.pytorch.
Download the pretrained models from Baidu Netdisk (extract code: u2ff) or Google Drive and place the files in the checkpoints directory. Then run:
python3 demo.py
Text detection and recognition are run on every image file in ./test_images, and the results are stored in ./test_result.
To test a single image, run:
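The batch flow described here can be pictured roughly as follows. This is only a sketch of the directory loop, not the code in demo.py: `run_batch`, `IMAGE_SUFFIXES`, the `recognize` callback, and the one-.txt-per-image output format are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumed set of image extensions; the real demo may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def run_batch(
    input_dir: str,
    output_dir: str,
    recognize: Callable[[Path], Iterable[str]],
) -> List[Path]:
    """Call `recognize` on each image in input_dir and write one .txt per image.

    `recognize` is a hypothetical callback that takes an image path and
    returns the recognized text lines (detection + recognition combined).
    """
    src, dst = Path(input_dir), Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)

    written = []
    for image_path in sorted(src.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        lines = recognize(image_path)
        out_path = dst / (image_path.stem + ".txt")
        out_path.write_text("\n".join(lines), encoding="utf-8")
        written.append(out_path)
    return written
```

A call such as `run_batch("./test_images", "./test_result", my_ocr)` would mirror the behaviour described above, where `my_ocr` is whatever function wraps the CTPN and CRNN models.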
python3 test_one.py [filename]
The training code is located in the train_code directory.
Train CTPN
Train CRNN
