UmayOCR (Instant OCR Translator HUD)

UmayOCR is a high-performance, PySide6-based instant translation tool designed for Linux. It acts as a transparent HUD (Heads-Up Display) that provides real-time subtitle translation for games, videos, movies, and any non-selectable text on your screen.

By "sniping" a specific area of your screen, UmayOCR continuously monitors that region, extracts text using OCR (Optical Character Recognition), and overlays the translated text instantly, making it perfect for playing untranslated games or watching foreign media.

Note: Current version is optimized for KDE Plasma (Wayland) using xcb mode. Future versions will expand compatibility.

🚀 Key Features

Real-Time HUD Translation: Specifically designed to provide live translations over video players and game windows.
KDE & Wayland Support: Currently runs in xcb mode, specifically tailored for KDE Plasma environments using native tools for region selection and capture.
Sniper Mode Overlay: Modular region selection system (currently uses slurp) to target subtitle areas.
Modular OCR Engine Subsystem: Dynamic multi-backend support (Tesseract & EasyOCR) for accurate text extraction.
Dual-Engine Translation Subsystem: Context-aware translations via Google Translate and DeepL (Note: DeepL is currently untested).
Unix Socket IPC Broadcasting: Real-time JSON broadcast of every translation to a local Unix socket, enabling integration with other tools (e.g., custom subtitles, logging, or OBS).
Global Shortcuts & IPC Integration: Utilizes a lightweight single-instance architecture to handle background processes and external CLI-driven shortcuts.

📡 External Integration (IPC)

UmayOCR features a built-in Unix Domain Socket server that broadcasts every translation in real-time. This allows you to "pipe" translations into other applications, scripts, or overlays effortlessly.

This feature is designed for high extensibility, making it easy to integrate UmayOCR into your existing workflow:

Live Streaming: Send translations directly to OBS or other broadcast software as a text source.
Accessibility: Integrate with text-to-speech (TTS) engines for real-time audio translation.
Data Logging: Pipe the stream to a file to create a searchable history or transcripts of your sessions.
Custom Overlays: Build your own UI or specialized subtitles that react to UmayOCR data.

Socket Path: /tmp/umayocr.sock

How to use:

You can listen to the stream using standard tools like netcat (nc):

nc -U /tmp/umayocr.sock

JSON Output Format:

Every translation is sent as a single line in JSON format:

{
  "original": "Hello world",
  "translated": "Merhaba dünya",
  "source": "en",
  "target": "tr",
  "engine": "Google",
  "timestamp": 1717181234.56
}

🗺️ Roadmap & Future Support

While currently focused on KDE/Linux, support for the following is planned:

Native Linux Packaging: Distribution via .deb (Debian/Ubuntu), AUR (Arch Linux), and Flatpak.
GNOME & other Linux DEs: Full native Wayland support without xcb.
Windows Support: Integration with Windows-native capture and OCR APIs.
macOS Support: Support for macOS native APIs and capture systems.

🛠️ System Requirements

Before running the application, ensure the following system packages are installed:

Ubuntu / Debian

sudo apt update
sudo apt install tesseract-ocr tesseract-ocr-rus tesseract-ocr-ara tesseract-ocr-heb tesseract-ocr-tur tesseract-ocr-vie tesseract-ocr-tha tesseract-ocr-spa slurp spectacle

Arch Linux

sudo pacman -S tesseract tesseract-data-rus tesseract-data-ara tesseract-data-heb tesseract-data-tur tesseract-data-vie tesseract-data-tha tesseract-data-spa slurp spectacle

📦 Installation & Usage

Clone the repository:

git clone https://github.com/sedataym/umayocr.git
cd umayocr

Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python run.py
```

🛠️ Tech Stack

GUI: PySide6
OCR: pytesseract, easyocr
Translation: deep-translator (DeepL untested)
Platform: Linux (KDE optimized)

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
docs		docs
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
babel.cfg		babel.cfg
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UmayOCR (Instant OCR Translator HUD)

🚀 Key Features

📡 External Integration (IPC)

How to use:

JSON Output Format:

🗺️ Roadmap & Future Support

🛠️ System Requirements

Ubuntu / Debian

Arch Linux

📦 Installation & Usage

🛠️ Tech Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

UmayOCR (Instant OCR Translator HUD)

🚀 Key Features

📡 External Integration (IPC)

How to use:

JSON Output Format:

🗺️ Roadmap & Future Support

🛠️ System Requirements

Ubuntu / Debian

Arch Linux

📦 Installation & Usage

🛠️ Tech Stack

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages