AlphaNet (Image to Text)

AlphaNet is an end-to-end pipeline that takes images containing text and outputs a textual representation of that content. The project is divided into five stage modules, each responsible for a specific part of the image-to-text process, and orchestrated by a single script, Project_Main/Main_Pipeline.py.

Overview

AlphaNet is designed to convert images into accurate text by breaking down the recognition task into multiple stages:

Segmenting images into words
Extracting characters from these words
Classifying each character
Converting those classifications into text
Correcting the recognized text using a Large Language Model (LLM)
Outputting the final result

This modular approach makes it easy to replace or improve individual components without affecting the rest of the pipeline. AlphaNet combines this modularity with LLM-powered correction, with the aim of improving accuracy on both structured and unstructured text.

Model Workflow Diagram

The following diagram illustrates the workflow of the model:

Segmentation Module:
- Takes a .png input, segments the characters in the image, and outputs them sequentially.
Classification Module:
- Processes the segmented characters and generates their corresponding numeric representations (e.g., [4, 23, 0, 11, 15, 11, 4]).
Conversion Module:
- Converts the numeric representations into their text equivalent, identifying and marking any errors.
Correction Module:
- Corrects errors in the text (e.g., replacing incorrect characters, as shown with the red N).
Output:
- Produces the corrected text, ensuring it matches the expected input.

Pipeline Stages

Stage 1: Character Segmentation

File/Module: Stage_1_Segmentation_Module
Purpose: Takes sentence images, breaks them into words, and further segments these into single characters for the next stage.
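
One common way to split a line of text into characters is a vertical projection profile: columns that contain no ink separate neighbouring glyphs. The sketch below illustrates that idea on a tiny binary image. It is not necessarily the method used in Stage 1, and the function name `segment_columns` and the 0/1 image encoding are assumptions made for illustration.

```python
from typing import List, Tuple


def segment_columns(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) that contain ink.

    `image` is a binary image given as rows of 0 (background) and 1 (ink).
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: amount of ink in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    spans = []
    start = None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                  # a glyph begins here
        elif not ink and start is not None:
            spans.append((start, x))   # the glyph ends at the first empty column
            start = None
    if start is not None:
        spans.append((start, width))   # glyph touching the right edge
    return spans


# Two glyphs separated by one empty column.
demo = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
]
print(segment_columns(demo))  # [(0, 2), (3, 4)]
```

Touching or overlapping characters defeat this simple approach, which is one reason real segmenters usually add further heuristics.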

Stage 2: Character Classification

File/Module: Stage_2_Classification_Module
Purpose: Classifies each character image (from Stage 1), assigning it the numeric class label of the corresponding character.
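
Whatever model produces the scores, the last step of classification is typically to pick the highest-scoring class for each character image. A minimal sketch of that step, assuming the classifier returns one list of per-class scores per character; the function name and the toy scores are hypothetical, and the actual Stage 2 model is not shown:

```python
from typing import List, Sequence


def scores_to_labels(scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest score for each character."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


toy_scores = [
    [0.1, 0.7, 0.2],    # class 1 has the highest score
    [0.9, 0.05, 0.05],  # class 0 has the highest score
]
labels = scores_to_labels(toy_scores)
print(labels)  # [1, 0]
```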

Stage 3: Vector-to-Text

File/Module: Stage_3_Conversion_Module
Purpose: Converts the classification outputs (which may be in vector/label format) into text strings.
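
As an illustration, the label sequence from the workflow description can be decoded with a simple lookup. The mapping 0 → 'a' through 25 → 'z' is an assumption; the label set actually used by AlphaNet may differ. The name `labels_to_text` and the `?` placeholder for unknown labels are likewise hypothetical.

```python
import string
from typing import Iterable

# Assumed label set: 0..25 map to 'a'..'z'.
ALPHABET = string.ascii_lowercase


def labels_to_text(labels: Iterable[int], unknown: str = "?") -> str:
    """Map integer class labels to characters, marking out-of-range labels."""
    return "".join(
        ALPHABET[i] if 0 <= i < len(ALPHABET) else unknown
        for i in labels
    )


text = labels_to_text([4, 23, 0, 11, 15, 11, 4])
print(text)  # exalple
```

Under the assumed mapping this yields "exalple", that is, "example" with one misclassified character, which is the kind of error the correction stage is meant to repair.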

Stage 4: Correction Module

File/Module: Stage_4_Correction_Module
Purpose: Refines and corrects the recognized text using a Large Language Model (LLM) to ensure higher accuracy and resolve ambiguities.
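
The sketch below shows one way such a correction request could be sent to a locally running Ollama server through its `/api/generate` endpoint. The prompt wording, the function names, and the injectable `send` parameter are assumptions for illustration; the project's actual prompt and client code are not shown in this README.

```python
import json
import urllib.request
from typing import Callable

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(raw_text: str) -> str:
    """Hypothetical prompt; the project's real prompt may differ."""
    return (
        "Correct any misrecognized characters in the following OCR output. "
        "Reply with the corrected text only.\n\n" + raw_text
    )


def post_json(url: str, payload: dict) -> dict:
    """Send a JSON POST request and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def correct_text(raw_text: str, send: Callable[[str, dict], dict] = post_json) -> str:
    """Ask the model for a corrected version of `raw_text`."""
    payload = {
        "model": "llama2",
        "prompt": build_prompt(raw_text),
        "stream": False,  # request a single JSON object instead of a stream
    }
    reply = send(OLLAMA_URL, payload)
    return reply["response"].strip()
```

With `ollama serve` running and the `llama2` model pulled, `correct_text("exalple")` would return the model's suggestion. Passing a custom `send` function makes it possible to exercise the logic without a running server.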

Stage 5: Output Module

File/Module: Stage_5_Output_Module
Purpose: Finalizes the text (e.g., formatting, post-processing) and provides it as output (console, file, or other desired format).
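
Because each stage consumes the previous stage's output, the orchestration can be pictured as a simple chain of functions. The following is a sketch of that idea only: the stage functions are hypothetical stand-ins that mimic the data flow, and none of this is taken from the project's main pipeline script.

```python
from typing import Callable, List, Sequence

Stage = Callable[[object], object]


def run_pipeline(data: object, stages: Sequence[Stage]) -> object:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        data = stage(data)
    return data


# Stand-in stages; none of this performs real OCR.
def segment(image_path: str) -> List[str]:
    return ["glyph_0", "glyph_1"]   # pretend character crops


def classify(glyphs: List[str]) -> List[int]:
    return [7, 8]                   # pretend class labels


def to_text(labels: List[int]) -> str:
    return "".join(chr(ord("a") + i) for i in labels)


def correct(text: str) -> str:
    return text                     # an LLM call would go here


def output(text: str) -> str:
    print(text)
    return text


result = run_pipeline("sentence.png", [segment, classify, to_text, correct, output])
```

Swapping one stage for another implementation only requires changing the list passed to `run_pipeline`, which is the practical benefit of the modular layout.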

Installation

Prerequisites

Python 3.12 or later.
pip (Python package manager).
Internet connection for downloading LLM models.

Steps

Clone this repository:

git clone https://github.com/your-username/AlphaNet.git
cd AlphaNet
git lfs pull  # if not already done automatically

Download the classification dataset (preprocessed example datasets are available here), then unpack it and move it into the Stage 2 module:

unzip Downloaded/path/DataSet.zip
mv Downloaded/path/Dataset Stage_2_Classification_Module/DataSet  # or path/to/AlphaNet/Stage_2_Classification_Module

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate    # Linux/MacOS  
venv\Scripts\activate       # Windows

Install the requirements (the repository stores this file as requirments.txt, so use that name if the command below fails):
```
pip install -r requirements.txt
```
Install Ollama:
```
curl -fsSL https://ollama.com/install.sh | sh  # Linux
```
On other platforms, download the installer from Ollama's website.
Start the Ollama server and pull the Llama 2 model:
```
ollama serve &
ollama pull llama2
```

Install the ViT model. The model checkpoint is available for download here. After downloading, move it into place:

mv Downloaded/path/best_model_ViT_2025-01-01.pth Stage_2_Classification_Module/models/vit_models/best_model_ViT_2025-01-01.pth # or path/to/AlphaNet/Stage_2_Classification_Module
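
Before launching the application, it can help to confirm that the dataset and the model checkpoint ended up where the commands above place them. The following is a minimal sketch of such a check; the script and the name `missing_assets` are not part of the repository, and the paths are simply those used in the installation commands.

```python
from pathlib import Path
from typing import List

# Paths taken from the installation commands, relative to the repository root.
EXPECTED_PATHS = [
    "Stage_2_Classification_Module/DataSet",
    "Stage_2_Classification_Module/models/vit_models/best_model_ViT_2025-01-01.pth",
]


def missing_assets(repo_root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Dataset and model checkpoint are in place.")
```

Run it from the repository root so that the relative paths resolve correctly.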

Usage

Launching the Application

To start the graphical user interface, execute the following command in your terminal:

python Project_Main/run_gui.py

This command starts a local server that you can open in your browser at the following URL:

http://127.0.0.1:7860

You can also run the pipeline without the GUI by using the following command:

python Project_Main/Main_Pipeline.py

Image to Text Processing System

Getting Started

To launch the application, run:

python Project_Main/run_gui.py

Roadmap📍

Create an OCR pipeline for converting full sentences to text
Support punctuation marks
Improve segmentation
Adapt the pipeline to the IAM Handwritten Forms dataset
Multi-language support
- Chinese
- Spanish

System Interfaces and Visual Representation

The AlphaNet system incorporates several user interfaces, designed to optimize user experience and operational efficiency. Below is a detailed description of each interface, accompanied by visual representations.

Upload and Process Interface

The Upload and Process Interface serves as the initial interaction point for users. It supports:

Drag-and-Drop Functionality: Enables seamless image upload without navigating file directories.
Real-Time Image Preview: Provides immediate feedback to verify uploaded content.
Instant Image-to-Text Processing: A single button runs the whole pipeline and displays the result.

This interface exemplifies usability by combining simplicity with efficiency.

Generate and Process Interface

The Generate and Process Interface creates images from user-supplied text and converts them back to text. Key features include:

Custom Text Generation: Users can input custom text to generate corresponding images.
Font Customization: A variety of font styles are available to enhance personalization.
Integrated Processing: Generated images can be processed instantly within the interface.
Clean Images: Removes images left over from previous runs so that each run starts from a clean state.

This component bridges creative input with functional output.

Directory Management Interface

The Directory Management Interface is designed for efficient system maintenance, providing tools to manage operational directories. It includes:

Reset Options: Enables users to restore directories to their default state.
Real-Time Feedback: Displays the current status of directories for enhanced oversight.

This interface ensures smooth backend operation, critical for system stability.
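
As a rough sketch of what a reset and a status check might involve, the code below assumes that the "default state" of a working directory is simply an empty directory. That assumption, and the function names, are illustrative only; the GUI's actual behavior is not documented here.

```python
import shutil
from pathlib import Path


def reset_directory(path: str) -> None:
    """Delete everything inside `path` and leave an empty directory behind."""
    target = Path(path)
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)


def directory_status(path: str) -> str:
    """Summarize a directory as 'missing', 'empty', or a file count."""
    target = Path(path)
    if not target.exists():
        return "missing"
    count = sum(1 for p in target.rglob("*") if p.is_file())
    return "empty" if count == 0 else f"{count} file(s)"
```

Because `reset_directory` deletes files irreversibly, it should only ever be pointed at directories that hold regenerable intermediate output.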

About Section

The About Section offers an overview of the system’s core functionalities in a structured and accessible manner. It highlights:

System Capabilities: Upload, text generation, processing, and directory management tools.
Key Features: Modular design, deep learning integration, and GUI accessibility.

This section functions as a comprehensive introduction to AlphaNet.

Conclusion

These visual interfaces exemplify the modular and user-centric design of AlphaNet. Each interface addresses specific stages of the image-to-text conversion pipeline, ensuring both ease of use and operational efficiency. Together, they form an integral part of the system's accessibility and functionality.

License

This project is licensed under the MIT License. See the LICENSE file for details.
