STRLite: MAE-Pretrained Scene Text Recognition

STRLite trains scene text recognition models in two stages: MAE pretraining for visual representation learning, followed by autoregressive decoder fine-tuning for text generation.

Repository: https://github.com/balaboom123/STRLite

1. Usage

Installation Guide

We provide installation instructions in INSTALLATION.md.

Data Preparation

We describe how to prepare the datasets in DATASET.md.

2. STRLite

2.1. Pre-training

  • ViT-Tiny pretrained on U14M-U.

    Variant    Embedding dim  Depth  Heads  Parameters  Download
    ViT-Tiny   192            12     12     6M          HuggingFace
  • To pre-train the ViT backbone on your own dataset, see §3.1 MAE Pretraining.

2.2. Fine-tuning

2.3. Results

Accuracy (%) of STRLite with and without MAE pretraining, on six common STR benchmarks and on the U14M benchmark subsets.

Common STR benchmarks

Subset          w/ pretrain  w/o pretrain
CUTE80          95.83        94.79
IC13            96.85        96.50
IC15            86.80        86.25
IIIT5k          96.97        96.47
SVT             95.36        94.90
SVTP            92.40        89.77
Weighted avg.   93.82        93.12
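
As a worked example of how the "Weighted avg." row can be read, the sketch below weights each subset's accuracy by its number of test images. The sample counts are an assumption: they are sizes commonly used for these benchmarks and are not stated in this README. Under that assumption, the calculation appears consistent with the reported 93.82 and 93.12.

```python
# Per-subset accuracy (%) copied from the table: (w/ pretrain, w/o pretrain).
ACCURACY = {
    "CUTE80": (95.83, 94.79),
    "IC13": (96.85, 96.50),
    "IC15": (86.80, 86.25),
    "IIIT5k": (96.97, 96.47),
    "SVT": (95.36, 94.90),
    "SVTP": (92.40, 89.77),
}

# ASSUMPTION: test-set sizes; these are not given in the README.
NUM_SAMPLES = {
    "CUTE80": 288,
    "IC13": 857,
    "IC15": 1811,
    "IIIT5k": 3000,
    "SVT": 647,
    "SVTP": 645,
}


def weighted_average(column: int) -> float:
    """Average one accuracy column, weighting each subset by its sample count."""
    total = sum(NUM_SAMPLES.values())
    weighted = sum(ACCURACY[name][column] * NUM_SAMPLES[name] for name in ACCURACY)
    return weighted / total


print(f"w/ pretrain:  {weighted_average(0):.2f}")
print(f"w/o pretrain: {weighted_average(1):.2f}")
```

If the repository uses different subset sizes, only `NUM_SAMPLES` needs to change.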

U14M benchmarks

Subset          w/ pretrain  w/o pretrain
artistic        67.78        62.11
contextless     78.95        77.43
curve           82.19        78.97
general         81.07        79.96
multi oriented  82.91        78.57
multi words     76.72        74.31
salient         78.17        75.33
Weighted avg.   81.03        79.88
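
To see where pretraining helps most on U14M, the per-subset gain can be computed directly from the table. This is plain arithmetic on the reported numbers; the function name is illustrative and not part of the repository.

```python
# Per-subset accuracy (%) copied from the U14M table: (w/ pretrain, w/o pretrain).
U14M = {
    "artistic": (67.78, 62.11),
    "contextless": (78.95, 77.43),
    "curve": (82.19, 78.97),
    "general": (81.07, 79.96),
    "multi oriented": (82.91, 78.57),
    "multi words": (76.72, 74.31),
    "salient": (78.17, 75.33),
}


def pretraining_gains(results):
    """Return (subset, gain in accuracy points) pairs, largest gain first."""
    gains = {name: round(with_pt - without_pt, 2)
             for name, (with_pt, without_pt) in results.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


for name, gain in pretraining_gains(U14M):
    print(f"{name:<15} +{gain:.2f}")
```

The largest gains fall on the artistic and multi oriented subsets, and the smallest on general.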

3. Quick Start

The end-to-end workflow is: pretrain an MAE encoder, fine-tune with an autoregressive decoder, then evaluate a checkpoint on validation or test benchmarks.
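
For readers new to the term, the sketch below illustrates what "autoregressive" means at inference time: the decoder emits one character per step, conditioned on everything produced so far, until it emits an end token. It is a minimal greedy-decoding illustration, not STRLite's decoder. The special tokens, `max_len`, and the `toy_step` stand-in for a real model are all hypothetical.

```python
from typing import Callable, Dict, List

BOS, EOS = "<bos>", "<eos>"  # hypothetical special tokens


def greedy_decode(step: Callable[[List[str]], Dict[str, float]],
                  max_len: int = 25) -> str:
    """Build a string one token at a time, always taking the top-scoring token."""
    tokens = [BOS]
    for _ in range(max_len):
        scores = step(tokens)                     # scores for the next token
        next_token = max(scores, key=scores.get)  # greedy choice
        if next_token == EOS:
            break
        tokens.append(next_token)
    return "".join(tokens[1:])                    # drop the BOS token


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a real decoder: deterministically spells the word 'text'."""
    target = "text"
    position = len(prefix) - 1                    # characters emitted so far
    expected = target[position] if position < len(target) else EOS
    return {tok: (1.0 if tok == expected else 0.0) for tok in ["e", "t", "x", EOS]}


print(greedy_decode(toy_step))
```

In a real model, `step` would run the decoder over the image features from the encoder and the current prefix.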

3.1 MAE Pretraining

python main_pretrain.py data_path='[/path/to/lmdb_pretrain]'

Distributed example:

torchrun --nproc_per_node=8 main_pretrain.py \
  data_path='[/path/to/lmdb_pretrain]'
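
As background on what this stage optimizes: MAE pretraining hides a random subset of image patches and trains the model to reconstruct them from the visible ones. The sketch below shows only the random patch selection, in plain Python. The mask ratio of 0.75 and the patch count of 64 are assumptions for illustration; the values STRLite uses are not stated in this README.

```python
import random
from typing import List, Optional, Tuple


def random_masking(num_patches: int,
                   mask_ratio: float = 0.75,   # ASSUMPTION: illustrative value
                   seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked) by uniform random sampling."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)                     # random permutation of patch indices
    visible = sorted(order[:num_keep])     # patches the encoder sees
    masked = sorted(order[num_keep:])      # patches to be reconstructed
    return visible, masked


# Hypothetical image split into 64 patches.
visible, masked = random_masking(num_patches=64, mask_ratio=0.75, seed=0)
print(len(visible), "visible patches,", len(masked), "masked patches")
```

Only the visible patches pass through the encoder, which is what keeps this kind of pretraining comparatively cheap.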

3.2 Fine-tuning

python main_finetune.py \
  train_data_path='[/path/to/lmdb_train]' \
  val_data_path='[/path/to/lmdb_val]' \
  pretrained_mae=/path/to/pretrain_checkpoint.pth
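
Conceptually, initializing from an MAE checkpoint means reusing the encoder weights and discarding the reconstruction decoder, which is only needed during pretraining. The sketch below shows that filtering step on a plain dictionary. The key names and prefixes are hypothetical placeholders; how `main_finetune.py` actually handles `pretrained_mae` is not described in this README.

```python
from typing import Any, Dict, Tuple


def select_encoder_weights(
    state_dict: Dict[str, Any],
    decoder_prefixes: Tuple[str, ...] = ("decoder", "mask_token"),  # hypothetical
) -> Dict[str, Any]:
    """Keep every entry whose key does not start with a decoder-only prefix."""
    return {key: value for key, value in state_dict.items()
            if not key.startswith(decoder_prefixes)}


# Toy checkpoint with made-up keys; real checkpoints map keys to tensors.
toy_checkpoint = {
    "patch_embed.proj.weight": "w0",
    "blocks.0.attn.qkv.weight": "w1",
    "mask_token": "w2",
    "decoder_blocks.0.attn.qkv.weight": "w3",
}

print(sorted(select_encoder_weights(toy_checkpoint)))
```

With PyTorch, the filtered dictionary would typically be passed to the model's `load_state_dict` with `strict=False`, so that the freshly initialized text decoder keeps its own weights.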

3.3 Evaluation

Evaluate via the fine-tuning script (runs on val_data_path):

python main_finetune.py \
  train_data_path='[/path/to/lmdb_train]' \
  val_data_path='[/path/to/lmdb_val]' \
  resume=/path/to/finetune_checkpoint.pth \
  eval=true

Standalone eval (recommended for benchmark reporting):

python eval.py \
  resume=/path/to/finetune_checkpoint.pth \
  test_data_path='[/path/to/lmdb_test]'
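
For context on the accuracy numbers reported in section 2.3: STR accuracy is usually word-level exact match, where a prediction counts as correct only if the whole string matches the label. The sketch below is one way to compute it. The normalization (lowercasing and keeping only letters and digits) is an assumption based on common STR practice; `eval.py` may normalize differently.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """ASSUMPTION: compare case-insensitively on letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def word_accuracy(predictions: Sequence[str],
                  labels: Sequence[str],
                  normalize_text: bool = True) -> float:
    """Percentage of predictions that exactly match their label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for pred, label in zip(predictions, labels):
        if normalize_text:
            pred, label = normalize(pred), normalize(label)
        correct += pred == label
    return 100.0 * correct / len(labels)


# Two of the three toy predictions match after normalization.
print(f"{word_accuracy(['Hello', 'w0rld', 'cafe'], ['hello', 'world', 'cafe']):.2f}")
```

Whether normalization is applied can shift reported accuracy, so it is worth checking before comparing numbers across papers.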

About

[ICCE 2026] Code base for STRLite: Lightweight Masked Autoencoders for Adaptable Scene Text Recognition
