DebAI is a cutting-edge, dual-theme AI assistant built with Streamlit. It combines powerful OCR (Optical Character Recognition) capabilities with a sophisticated chat interface, wrapped in a stunning "Ultimate Glassmorphism" UI.
Whether you need to extract text from scanned documents, analyze PDFs, or have a conversation with a local (Ollama) or cloud-based (Gemini) LLM, DebAI handles it with style and precision.
- Local Power: Seamless integration with Ollama for running privacy-focused local models (e.g., Gemma, Llama 3).
- Cloud Fallback: Automatic fallback to Google Gemini when local models are unavailable.
- Multilingual Support: Detects input in Hindi, English, and Bengali.
- Smart Response Logic: If a user writes text in English but desires a response in another language (Hindi or Bengali), the model identifies the intended language and responds accordingly.
- Future Roadmap: Further improvements are in progress, including support for more Indian languages.
- Image OCR: Extract text from images (.png, .jpg, .jpeg) using Tesseract.
- PDF Analysis: Read and extract text from multi-page PDF documents.
- Auto-Context: Extracted text is automatically fed into the chat context for immediate analysis.
- Dual Theme: Switch between a Cinematic Dark Mode and a Clean, Airy Light Mode.
- Visuals: Features frosted glass cards, animated backgrounds (orbFloat), and smooth transitions.
- Responsive: Layout adapts to various screen sizes.
- PDF Export: Download your entire chat session as a formatted PDF report.
- Hotkeys: Quick actions like "Send Last OCR" (Alt+S) for rapid workflows.
- Frontend: Streamlit
- OCR Engine: Tesseract OCR & PyTesseract
- PDF Processing: pdfplumber
- AI Models: Ollama (Local) & Google Gemini (Cloud)
- Report Generation: FPDF
Ensure you have the following installed:
- Python 3.8+
- Tesseract OCR:
  - Windows: Download and install the binary. Note the installation path (default: C:\Program Files\Tesseract-OCR\tesseract.exe).
- Ollama (Optional, for local models):
  - Install Ollama and pull a model: ollama pull gemma:2b (or your preferred model).
Clone the repository and install dependencies:
git clone https://github.com/DebasmitaBose0/Code-Genie-AI-Team-A-.git
cd Code-Genie-AI-Team-A-
pip install -r requirements.txt
DebAI works out-of-the-box with Ollama. To use Google Gemini as a fallback, set your API key:
Windows (PowerShell):
$env:GEMINI_API_KEY="your_api_key_here"
Linux/Mac:
export GEMINI_API_KEY="your_api_key_here"
(Optional) You can also configure the Tesseract path in AI.py if it differs from the default.
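One way to pick up both settings at startup is sketched below. This is a minimal illustration, not the code in AI.py: the function names and the TESSERACT_CMD override variable are hypothetical, and only GEMINI_API_KEY and the default Windows path come from this README.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Default Windows install path quoted in the prerequisites above.
DEFAULT_WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a path to the tesseract binary, or None if none is found.

    TESSERACT_CMD is a hypothetical override variable used for this sketch;
    the project itself does not define it.
    """
    override = env.get("TESSERACT_CMD")
    if override:
        return override
    found = shutil.which("tesseract")  # binary already on PATH
    if found:
        return found
    if Path(DEFAULT_WINDOWS_TESSERACT).is_file():
        return DEFAULT_WINDOWS_TESSERACT
    return None


def gemini_fallback_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """The Gemini fallback is usable only when a non-empty API key is set."""
    return bool(env.get("GEMINI_API_KEY", "").strip())


if __name__ == "__main__":
    cmd = resolve_tesseract_cmd()
    if cmd:
        try:
            import pytesseract

            # pytesseract reads the binary location from this attribute.
            pytesseract.pytesseract.tesseract_cmd = cmd
        except ImportError:
            pass  # pytesseract not installed; nothing to configure
    print("Tesseract:", cmd or "not found")
    print("Gemini fallback:", "enabled" if gemini_fallback_enabled() else "disabled")
```

Passing the environment in as a mapping keeps both helpers easy to check without touching real environment variables.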
Launch the application using Streamlit:
streamlit run AI.py
The app will open in your default browser at http://localhost:8501.
- Upload Documents: Use the sidebar or top tabs to upload Images or PDFs.
- Extract Text: The app will automatically extract text. You can choose to send it to the AI immediately or edit/review it.
- Chat: Type your queries in the chat bar. The AI has context of your uploaded documents.
- Switch Themes: Toggle between Light and Dark mode using the button in the top-right corner.
- Export: Click "Download Report (PDF)" in the sidebar to save your conversation.
The project is organized as a single-module Streamlit application with a clean separation of concerns for OCR, AI client management, and UI rendering.
.
├── AI.py # Main application entry point & logic
├── LICENSE # MIT License (2025)
├── README.md # Project documentation
├── requirements.txt # Python dependencies
└── venv/ # Virtual environment (ignored by git)
- OCR Engine: Utilizes easyocr (primary) and pytesseract (fallback) for multilingual text extraction.
- Preprocessing: Custom PIL filters (Sharpen, Contrast, Grayscale) to enhance document readability.
- AI Clients: Dynamic switching between Ollama (local) and Gemini (cloud API) using importlib.
- UI System: Streamlit-based custom CSS injection for "Ultimate Glassmorphism" styling.
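The importlib-based client switching could look roughly like the sketch below. It is an illustration under stated assumptions, not the logic in AI.py: the helper names are hypothetical, and the module names ollama and google.generativeai are assumed to be the installed client packages.

```python
import importlib
import importlib.util
from typing import Callable, Optional, Sequence, Tuple

# Backends in priority order: local first, cloud second.
# The module names are assumptions about which client packages are installed.
BACKENDS: Sequence[Tuple[str, str]] = (
    ("ollama", "ollama"),
    ("gemini", "google.generativeai"),
)


def module_available(module_name: str) -> bool:
    """Check whether a module can be imported, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "google") raises instead of returning None.
        return False


def pick_backend(
    backends: Sequence[Tuple[str, str]] = BACKENDS,
    available: Callable[[str], bool] = module_available,
) -> Optional[str]:
    """Return the label of the first backend whose client module is importable."""
    for label, module_name in backends:
        if available(module_name):
            return label
    return None


def load_backend(label: str, backends: Sequence[Tuple[str, str]] = BACKENDS):
    """Import and return the client module registered under the given label."""
    return importlib.import_module(dict(backends)[label])


if __name__ == "__main__":
    print("Selected backend:", pick_backend() or "none available")
```

An importable client package does not guarantee that the Ollama server is running, so a real implementation would likely also probe the local service before committing to it.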
DebAI follows a linear data processing pipeline to ensure high accuracy and contextual awareness:
- Input Layer: User uploads an Image (PNG/JPG) or PDF.
- Preprocessing:
  - Images: Grayscale conversion → 2.0x Contrast Enhancement → Double Sharpening → 1.1x Brightness.
  - PDFs: Text extraction via pdfplumber.
- OCR Pass:
  - Pass 1 (English): Initial scan to identify content.
  - Language Detection: langdetect analyzes the initial text.
  - Pass 2 (Target Language): If Hindi or Bengali is detected, the OCR engine re-runs with specialized models.
- Context Injection: The extracted text is added to the AI session state.
- Inference: The AI model (Ollama or Gemini) processes the text based on user prompts.
- Output: Response is rendered in the Glassmorphic UI with an option to Export to PDF.
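The two-pass OCR step in the pipeline above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name two_pass_ocr and the injected ocr and detect callables are hypothetical, and the mapping to Tesseract language codes (hin, ben) is an assumption about the engine in use.

```python
from typing import Any, Callable

# Detected ISO 639-1 codes mapped to OCR language codes.
# "hin" and "ben" are Tesseract's names for Hindi and Bengali; other engines
# may expect different codes (an assumption of this sketch).
TARGET_LANGS = {"hi": "hin", "bn": "ben"}


def two_pass_ocr(
    image: Any,
    ocr: Callable[[Any, str], str],
    detect: Callable[[str], str],
) -> str:
    """Run an English pass, detect the language, and re-run OCR if needed."""
    first = ocr(image, "eng")  # Pass 1: English scan
    if not first.strip():
        return first  # nothing to detect a language from
    try:
        detected = detect(first)  # language detection on the initial text
    except Exception:
        return first  # detection failed; keep the English pass
    target = TARGET_LANGS.get(detected)
    if target is None:
        return first  # English or an unsupported language
    second = ocr(image, target)  # Pass 2: target-language scan
    return second if second.strip() else first


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any OCR engine installed.
    def fake_ocr(image: Any, lang: str) -> str:
        return f"text recognized with {lang}"

    print(two_pass_ocr(None, fake_ocr, lambda text: "bn"))
```

With the real libraries, ocr could wrap pytesseract.image_to_string(image, lang=lang) and detect could be langdetect.detect. Injecting both keeps the control flow checkable without any OCR models.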
Q: Do I need an internet connection to use DebAI?
A: No! If you have Ollama installed and running locally with a model like gemma:2b, DebAI works completely offline. An internet connection is only required if you want to use the Google Gemini fallback.
Q: Which languages are supported for OCR?
A: DebAI currently has first-class support for English, Hindi, and Bengali. It uses automatic language detection to switch between these models seamlessly.
Q: My OCR results are blurry. How can I improve them?
A: DebAI includes built-in preprocessing, but for best results, ensure your source images are high-resolution (300 DPI+) and have good lighting.
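For reference, the image preprocessing steps listed in the pipeline (grayscale, 2.0x contrast, double sharpening, 1.1x brightness) can be approximated with Pillow as shown below. This is a sketch under the assumption that the standard Pillow filters are used; the function name preprocess_for_ocr is hypothetical and the code in AI.py may differ.

```python
from PIL import Image, ImageEnhance, ImageFilter, ImageOps


def preprocess_for_ocr(image: Image.Image) -> Image.Image:
    """Apply the four image steps from the pipeline, in order."""
    gray = ImageOps.grayscale(image)                        # grayscale conversion
    contrasted = ImageEnhance.Contrast(gray).enhance(2.0)   # 2.0x contrast
    sharpened = contrasted.filter(ImageFilter.SHARPEN)      # first sharpening pass
    sharpened = sharpened.filter(ImageFilter.SHARPEN)       # second sharpening pass
    return ImageEnhance.Brightness(sharpened).enhance(1.1)  # 1.1x brightness


if __name__ == "__main__":
    # Synthetic image so the sketch runs without any input file.
    demo = Image.new("RGB", (64, 64), (120, 120, 120))
    result = preprocess_for_ocr(demo)
    print(result.mode, result.size)
```

Filters of this kind cannot recover detail that a low-resolution scan never captured, which is why a sharp, well-lit source still matters.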
Q: How do I save my chat session?
A: Use the "Download Report" button in the sidebar. This generates a professionally formatted PDF containing your entire conversation history.
Q: Is my data private?
A: Yes. When using the local Ollama mode, your documents and chats never leave your machine.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.