Extract structured data from document images using multimodal LLMs.
Jaison is a platform that uses multimodal Large Language Models (LLMs) to extract structured information from document images. Unlike traditional OCR services, which only convert images to text, Jaison relies on the visual understanding capabilities of multimodal models to extract the specific data the user requests.
Users can upload images of documents (like receipts, invoices, tickets), specify what information they want to extract using natural language prompts, and receive structured JSON data in response.
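As a rough illustration of that flow, the sketch below builds (but does not send) an HTTP request carrying an image and a natural language prompt. Only the port comes from this README. The route `/extract`, the JSON field names `image` and `prompt`, and the `X-API-Key` header are hypothetical placeholders, so check the Swagger UI of the running service for the real contract.

```python
import base64
import json
import urllib.request

# Port 8420 is the OCR API port named in this README.
OCR_API_BASE = "http://localhost:8420"
# HYPOTHETICAL route: look up the real one in the service's Swagger UI.
EXTRACT_PATH = "/extract"


def build_extraction_request(image_bytes: bytes, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying an image and a prompt."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),  # hypothetical field name
        "prompt": prompt,  # hypothetical field name
    }
    return urllib.request.Request(
        url=OCR_API_BASE + EXTRACT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_extraction_request(b"fake-image-bytes", "Extract the total and the date", "my-key")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. The sketch stops short of that because the real route and response schema are not described here.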
Jaison uses a microservices architecture with two separate services:
- OCR API Service (Port 8420): Handles document processing and OCR functionality
- Admin API Service (Port 8421): Handles user authentication, API key management, and database access
- Upload document images via API
- Extract structured data using natural language prompts
- Receive standardized JSON responses
- Dashboard for API key management and usage tracking
- User authentication and account management
- Support for various document types (receipts, invoices, IDs, etc.)
- Python 3.9+
- Node.js 16+ and npm (for frontend)
- Supabase account
- OpenRouter API key
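The local tooling requirements above can be checked with a short script. This is a minimal sketch, not part of the project: `check_prerequisites` is a hypothetical helper that verifies the Python version and looks for `node` and `npm` on the PATH. It does not verify the Node.js version, the Supabase account, or the OpenRouter key.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # from the prerequisites list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # Only presence on PATH is checked here, not the Node.js 16+ version requirement.
    for tool in ("node", "npm"):
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH (needed for the frontend)")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "Local prerequisites look fine")
```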
1. Clone the repository:

       git clone https://github.com/zakantonio/jaison.git
       cd jaison

2. Set up the environment:

       # Create and activate virtual environment
       python -m venv venv
       source venv/bin/activate  # On Windows: venv\Scripts\activate

       # Install dependencies
       pip install -e ".[dev]"

3. Configure environment variables:

       cp .env.example .env
       # Edit .env with your actual values

4. Set up the database:
   - Create a new project in Supabase
   - Update your .env file with the Supabase URL and keys
   - Run the database migrations:

         python scripts/run_migrations.py

   - This will automatically bootstrap the database and create all necessary tables and indexes
   - If you encounter any issues, you can run the bootstrap script manually:

         python scripts/bootstrap_database.py

5. Create required directories:

       mkdir -p uploads results
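Before running the migrations it can help to confirm that `.env` is actually filled in. The sketch below parses simple `KEY=VALUE` lines and reports missing or empty entries. The variable names in `REQUIRED_KEYS` are assumptions made for illustration; the authoritative list is whatever `.env.example` contains.

```python
from pathlib import Path

# HYPOTHETICAL variable names; replace them with the keys listed in .env.example.
REQUIRED_KEYS = ("SUPABASE_URL", "SUPABASE_KEY", "OPENROUTER_API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; copy .env.example first")
    else:
        print("Missing or empty:", missing_keys(env_path.read_text()) or "none")
```

The parser is deliberately simple and ignores features such as `export` prefixes and multi-line values.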
1. Navigate to the frontend directory:

       cd frontend/jaison-dashboard

2. Install dependencies:

       npm install

3. Start the development server:

       npm start

You can use the provided Makefile commands to run the application:

    # Run the OCR API service
    make ocr-api

    # Run the Admin API service
    make admin-api

    # Run the frontend server
    make frontend

- The OCR API will be available at http://localhost:8420
- The Admin API will be available at http://localhost:8421
- The frontend will be available at http://localhost:3000

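To confirm that all three processes came up, a quick TCP check against the ports listed above is usually enough. This is a minimal sketch using only the standard library. It tests whether something is listening on each port, not whether the service behind it is healthy.

```python
import socket

# Hosts and ports as listed in this README.
SERVICES = {
    "OCR API": ("localhost", 8420),
    "Admin API": ("localhost", 8421),
    "Frontend": ("localhost", 3000),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {state}")
```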
The project includes a Makefile with various helpful commands:

    make help  # Show all available commands

    # Run all tests
    make test

    # Run tests with coverage report
    make test-cov

    # Format code
    make format

    # Lint code
    make lint

Project structure:

- jaison/ocr_api/ - OCR API Service
  - jaison/ocr_api/api/ - OCR API endpoints and models
  - jaison/ocr_api/services/ - OCR business logic and external services
  - jaison/ocr_api/config/ - OCR API configuration settings
  - jaison/ocr_api/utils/ - OCR API utility functions
- jaison/admin_api/ - Admin API Service
  - jaison/admin_api/api/ - Admin API endpoints and models
  - jaison/admin_api/database/ - Database models and repository
  - jaison/admin_api/services/ - Admin business logic
  - jaison/admin_api/config/ - Admin API configuration settings
  - jaison/admin_api/utils/ - Admin API utility functions
- frontend/jaison-dashboard/ - React frontend application
- tests/ - Test suite
- docs/ - Documentation files
  - docs/api.md - API documentation
  - docs/architecture.md - Architecture documentation
  - docs/setup.md - Setup guide

Once the services are running, you can access the API documentation at:
- OCR API Swagger UI: http://localhost:8420/docs
- Admin API Swagger UI: http://localhost:8421/docs
Detailed API documentation is also available in the docs/api.md file.
This project is licensed under the MIT License - see the LICENSE file for details.