MSFN is a pure Transformer model for scene text recognition. It uses a Vision Transformer (ViT) to extract image features and a multilingual Transformer decoder to generate text.
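As a rough illustration of the first step a Vision Transformer typically performs, the sketch below cuts an image into non-overlapping patches and flattens each one into a token vector. It is not taken from this repository: the function name `split_into_patches`, the 4 x 8 demo image, and the 4 x 4 patch size are all assumptions chosen for readability, and a real model would follow this with a learned linear projection and positional embeddings.

```python
from typing import List

# Grayscale image as a list of rows of pixel values (an assumption for this sketch).
Image = List[List[float]]


def split_into_patches(image: Image, patch_h: int, patch_w: int) -> List[List[float]]:
    """Cut an H x W image into non-overlapping patches, each flattened row-major."""
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be divisible by the patch size")

    patches = []
    # Walk the image top-to-bottom, left-to-right, one patch at a time.
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patch = [
                image[r][c]
                for r in range(top, top + patch_h)
                for c in range(left, left + patch_w)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4 x 8 "text line" image whose pixel value equals its column index.
demo_image = [[float(c) for c in range(8)] for _ in range(4)]
demo_patches = split_into_patches(demo_image, patch_h=4, patch_w=4)
```

Each entry of `demo_patches` plays the role of one input token for the encoder; the decoder would then attend over the encoded tokens while generating characters one at a time.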
lclee0577/MLViT