Status: Beta - Under active development
Byte-Vision is a privacy-first document intelligence platform that transforms static documents into an interactive, searchable knowledge base. Built on Elasticsearch with RAG (Retrieval-Augmented Generation) capabilities, it offers document parsing, OCR processing, and conversational AI interfaces, all running locally to ensure complete data privacy.
- 📄 Universal Document Processing - Parse PDFs, text files, and CSVs with built-in OCR for image-based content
- 🔍 AI-Enhanced Search - Semantic search powered by Elasticsearch and vector embeddings
- 💬 Conversational AI - Document-specific Q&A and free-form chat with local LLM integration
- 📊 Research Management - Automatically save and organize insights from document analysis
- 🔒 Privacy-First - Runs entirely locally with no external data transmission
- 🖥️ Intuitive Interface - Full-featured UI that simplifies complex document operations
For detailed setup instructions, see Installation Guide.
- Interface Tour
- Installation
- Configuration
- Usage
- Troubleshooting
- Development
- Contributing
- Roadmap
- License
- Contact
The main "Document Search" screen allows you to locate and analyze documents after they have been parsed and indexed in Elasticsearch.
Click the "View" button to display the original parsed document.
View previously saved question-answer history items for the selected document.
Enter your questions about the document using this interface.
The system processes your question and searches through the document.
View the AI-generated answers based on your document content.
Export your question-answer sessions to PDF format for documentation.
Parse PDF, text, and CSV files for processing and analysis.
View the results of document parsing and chunking operations.
Configure OCR settings for processing scanned documents.
Review extracted text from image-based documents.
Primary inference screen for general AI conversations.
View previous conversations and responses.
Export inference conversations to PDF format.
| Component | Version | Purpose |
|---|---|---|
| Go | 1.23+ | Backend services |
| Node.js | 18+ | Frontend build system |
| Elasticsearch | 8.x | Document indexing and search |
| Wails | v2 | Desktop application framework |
- OS: Windows 10+, macOS 10.13+, or Linux
- RAM: 8GB minimum (16GB recommended)
- Storage: 5GB free space
- CPU: Multi-core processor recommended
- CUDA: Enables GPU acceleration for AI models
- Docker: Containerize Elasticsearch for easier deployment
git clone https://github.com/kbrisso/byte-vision.git
cd byte-vision
# Install Go dependencies
go mod download && go mod tidy
# Install Wails CLI
go install github.com/wailsapp/wails/v2/cmd/wails@latest
# Install frontend dependencies
cd frontend && npm install && cd ..

Option A: Docker (Recommended)
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

Option B: Local Installation
- Download from Elasticsearch Downloads
- Extract and run:
# Windows
bin\elasticsearch.bat
# macOS/Linux
bin/elasticsearch

Option A: Download Pre-built Binaries (Recommended)
- Visit LlamaCpp releases
- Download for your platform:
  - Windows: llama-*-bin-win-x64.zip (CPU) or llama-*-bin-win-cuda-cu*.zip (GPU)
  - Linux: llama-*-bin-ubuntu-x64.tar.gz
  - macOS: brew install llama.cpp
- Extract to the llamacpp/ directory
Option B: Build from Source
git clone https://github.com/ggerganov/llama.cpp.git temp-llama
cd temp-llama && mkdir build && cd build
cmake .. -DLLAMA_CUDA=ON  # -DLLAMA_CUDA=ON enables GPU support; omit it for a CPU-only build
cmake --build . --config Release
cp bin/llama-cli ../llamacpp/
cd ../.. && rm -rf temp-llama
mkdir -p models
curl -L -o models/llama-2-7b-chat.Q4_K_M.gguf \
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
curl -L -o models/all-MiniLM-L6-v2.gguf \
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2-gguf/resolve/main/all-MiniLM-L6-v2.gguf
Download and install xpdf-tools for PDF processing:
Option A: Download Pre-built Binaries (Recommended)
- Visit Xpdf downloads
- Download the appropriate version for your platform:
  - Windows: xpdf-tools-win-*-setup.exe
  - Linux: xpdf-tools-linux-*-static.tar.gz
  - macOS: xpdf-tools-mac-*-setup.dmg
- Extract or install to the xpdf-tools/ directory in your project root
Option B: Package Manager Installation
# macOS
brew install xpdf
# Ubuntu/Debian
sudo apt-get install xpdf-utils
# Windows (using Chocolatey)
choco install xpdf-utils

Install Tesseract-OCR for optical character recognition:
Windows:
- Download from Tesseract releases
- Install the executable
- Add Tesseract to your system PATH:
  - Add C:\Program Files\Tesseract-OCR to your PATH environment variable
  - Or set a custom path in byte-vision-cfg.env: TESSERACT_PATH=C:\path\to\tesseract.exe
macOS:
brew install tesseract

Linux (Ubuntu/Debian):
sudo apt-get install tesseract-ocr

Verify Installation:
tesseract --version

Create byte-vision-cfg.env:
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_USERNAME=elastic
ELASTICSEARCH_PASSWORD=your_password
LLAMA_CLI_PATH=./llamacpp/llama-cli
LLAMA_EMBEDDING_PATH=./llamacpp/llama-embedding
MODEL_PATH=./models
DEFAULT_INFERENCE_MODEL=llama-2-7b-chat.Q4_K_M.gguf
DEFAULT_EMBEDDING_MODEL=all-MiniLM-L6-v2.gguf
MAX_CHUNK_SIZE=1000
CHUNK_OVERLAP=200
LOG_LEVEL=INFO
wails dev

The application will launch with hot reload enabled.
wails build

The built application will be in the build/ directory.
The application uses environment variables defined in byte-vision-cfg.env:
| Variable | Description | Default |
|---|---|---|
| ELASTICSEARCH_URL | Elasticsearch server URL | http://localhost:9200 |
| ELASTICSEARCH_USERNAME | Elasticsearch username | elastic |
| ELASTICSEARCH_PASSWORD | Elasticsearch password | - |
| LLAMA_CLI_PATH | Path to llama-cli executable | ./llamacpp/llama-cli |
| LLAMA_EMBEDDING_PATH | Path to llama-embedding executable | ./llamacpp/llama-embedding |
| MODEL_PATH | Directory containing AI models | ./models |
| DEFAULT_INFERENCE_MODEL | Default model for inference | - |
| DEFAULT_EMBEDDING_MODEL | Default model for embeddings | - |
| MAX_CHUNK_SIZE | Maximum text chunk size | 1000 |
| CHUNK_OVERLAP | Overlap between chunks | 200 |
| LOG_LEVEL | Application log level | INFO |
- Start Elasticsearch: Ensure Elasticsearch is running
- Launch Byte-Vision: Run the application
- Configure Models: Go to Settings → LlamaCpp Settings and set paths
- Test Connection: Verify Elasticsearch connection in Settings
- Upload Documents: Use the document parser to upload and process files
- Configure Chunking: Adjust text chunking settings for optimal search
- Index Documents: Process documents for embedding and search
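The chunking step splits extracted text into overlapping windows so that context spanning a chunk boundary is not lost. A character-based sketch of how MAX_CHUNK_SIZE and CHUNK_OVERLAP interact (hypothetical; Byte-Vision's actual chunker may split on tokens or sentence boundaries):

```go
package main

import "fmt"

// chunkText splits text into windows of at most size bytes, each
// overlapping the previous window by overlap bytes. Illustrative
// sketch of the MAX_CHUNK_SIZE / CHUNK_OVERLAP settings only.
func chunkText(text string, size, overlap int) []string {
	if size <= overlap {
		return nil // invalid: the window would never advance
	}
	stride := size - overlap
	var chunks []string
	for start := 0; start < len(text); start += stride {
		end := start + size
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
		if end == len(text) {
			break
		}
	}
	return chunks
}

func main() {
	// size 4 with overlap 2: each chunk repeats the last 2 characters
	// of the previous one ("abcd", "cdef", "efgh", "ghij").
	for _, c := range chunkText("abcdefghij", 4, 2) {
		fmt.Println(c)
	}
}
```

With the defaults (size 1000, overlap 200), each chunk shares its last 200 characters with the start of the next, which helps queries that land near a boundary.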
- Select a document from the search results
- Click "Ask Questions" to open the Q&A interface
- Enter your questions and receive AI-generated answers
- View answer sources and confidence scores
- Export Q&A sessions to PDF
- Ask Questions: Use the document question modal to query your documents
- Export Results: Export chat history to PDF for documentation
- Compare Responses: Use the comparison feature to evaluate different model outputs
- Access the AI Inference screen for general conversations
- Chat with your local LLM models
- Export conversation history
- Compare different model responses
❌ Elasticsearch Connection Failed
Symptoms: Cannot connect to Elasticsearch service
Solutions:
- Verify Elasticsearch is running: curl http://localhost:9200
- Check whether port 9200 is in use: netstat -an | grep 9200
- Verify the configuration in byte-vision-cfg.env
- Check firewall settings
- For Docker: ensure the container is running: docker ps | grep elastic
❌ LlamaCpp Model Loading Error
Symptoms: Model fails to load or produces errors
Solutions:
- Verify the model file exists in the models/ directory
- Check the model format (must be .gguf)
- Ensure sufficient RAM for the model size
- Verify LLAMA_CLI_PATH in the configuration
- Test LlamaCpp directly: ./llamacpp/llama-cli --model ./models/your-model.gguf --prompt "Hello"
❌ Frontend Build Errors
Symptoms: npm install or build failures
Solutions:
- Clear the npm cache:
  cd frontend
  rm -rf node_modules package-lock.json
  npm cache clean --force
  npm install
- Check your Node.js version: node --version
- Update npm: npm install -g npm@latest
❌ Port Already in Use
Symptoms: Application fails to start due to port conflicts
Solutions:
- Find the process using the port:
  # Windows
  netstat -ano | findstr :3000
  # macOS/Linux
  lsof -ti:3000
- Kill the process:
  # Windows
  taskkill /PID <PID> /F
  # macOS/Linux
  kill -9 <PID>
- GPU Acceleration: Install CUDA/ROCm for faster model inference
- Model Selection: Use smaller quantized models for better performance
- Memory Management: Adjust Elasticsearch heap size for large document collections
- Chunking Optimization: Tune MAX_CHUNK_SIZE and CHUNK_OVERLAP for your use case
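When tuning the chunking settings, note that the number of chunks, and therefore the number of embeddings to compute and index, grows as the stride (MAX_CHUNK_SIZE minus CHUNK_OVERLAP) shrinks. A rough character-based estimate (hypothetical helper; actual counts depend on how the parser splits text):

```go
package main

import "fmt"

// estimateChunks approximates how many overlapping chunks a document
// of docLen characters yields. Back-of-envelope sketch only; the real
// chunk count depends on the parser's splitting rules.
func estimateChunks(docLen, chunkSize, overlap int) int {
	if docLen <= chunkSize {
		return 1
	}
	stride := chunkSize - overlap
	// Ceiling division: a new chunk starts every stride characters
	// until the remaining text fits in a single window.
	return (docLen - overlap + stride - 1) / stride
}

func main() {
	// Defaults: MAX_CHUNK_SIZE=1000, CHUNK_OVERLAP=200 -> stride 800.
	fmt.Println(estimateChunks(100000, 1000, 200))
	// Doubling the overlap to 400 shrinks the stride to 600.
	fmt.Println(estimateChunks(100000, 1000, 400))
}
```

With the defaults, a 100,000-character document produces about 125 chunks; doubling the overlap to 400 raises that to about 166, trading indexing time and storage for more boundary context.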
Enable debug logging:
wails dev -debug

Check logs in the ./logs/ directory for detailed error information.
- Wails - Desktop application framework
- Go - Backend services and APIs
- React - Frontend user interface
- Elasticsearch - Document indexing and search
- Llama.cpp - Local AI model inference
- React Bootstrap - UI components
- Bootstrap 5 - CSS framework
- React PDF - PDF generation and viewing
- Vite - Build tooling
byte-vision/
├── 📁 build/              # Built application files
├── 📁 document/           # Document storage
├── 📁 frontend/           # React frontend source
│   ├── 📁 src/
│   └── 📁 public/
├── 📁 llamacpp/           # LlamaCpp binaries
├── 📁 logs/               # Application logs
├── 📁 models/             # AI model files (.gguf)
├── 📁 prompt-cache/       # Cached prompts
├── 📁 prompt-temp/        # Prompt templates
├── 📁 xpdf-tools/         # PDF processing tools
├── 📄 byte-vision-cfg.env # Configuration file
├── 📄 wails.json          # Wails configuration
└── 📄 go.mod              # Go dependencies
- Application logs: ./logs/
- Elasticsearch logs: check the Elasticsearch installation directory
- Debug mode: wails dev -debug
- Frontend logs: browser developer console
- Backend logs: terminal output during development
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also open an issue with the tag "enhancement." Remember to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Follow Go formatting standards (go fmt)
- Write tests for new features
- Update documentation for API changes
- Use semantic commit messages
- Ensure all tests pass before submitting
- Settings persistence for llama-cli configuration
- Settings persistence for llama-embedding configuration
- Enhanced documentation and examples
- Additional document format support (DOCX, PPT, etc.)
- Advanced search filters and operators
- Batch document processing capabilities
- RESTful API for external integrations
- Docker deployment configuration
- User authentication and access control
- Cloud storage integration (S3, Google Drive, etc.)
- Multi-language support
- Advanced analytics and reporting
- Distributed processing for large document collections
- Plugin architecture for custom processors
- Integration with external AI services
- Mobile application companion
See open issues for detailed feature requests and bug reports.
This project is licensed under the terms of the MIT license.
Kevin Brisson - LinkedIn - kbrisso@gmail.com
Project Link: https://github.com/kbrisso/byte-vision
⭐ Star this project if you find it helpful!