Status: Beta - Under active development
Byte-Vision is a privacy-first document intelligence platform that transforms static documents into an interactive, searchable knowledge base. Built on Elasticsearch with RAG (Retrieval-Augmented Generation) capabilities, it offers document parsing, OCR processing, and conversational AI interfaces, all running locally to ensure complete data privacy.
- 📄 Universal Document Processing - Parse PDFs, text files, and CSVs with built-in OCR for image-based content
- 🔍 AI-Enhanced Search - Semantic search powered by Elasticsearch and vector embeddings
- 💬 Conversational AI - Document-specific Q&A and free-form chat with local LLM integration
- 📊 Research Management - Automatically save and organize insights from document analysis
- 🔒 Privacy-First - Runs entirely locally with no external data transmission
- 🖥️ Intuitive Interface - Full-featured UI that simplifies complex document operations
For detailed setup instructions, see Installation Guide.
- Interface Tour
- Installation
- Configuration
- Usage
- Troubleshooting
- Development
- Contributing
- Roadmap
- License
- Contact
The main "Document Search" screen allows you to locate and analyze documents after they have been parsed and indexed in Elasticsearch.
Click the "View" button to display the original parsed document.
View previously saved question-answer history items for the selected document.
Enter your questions about the document using this interface.
The system processes your question and searches through the document.
View the AI-generated answers based on your document content.
Export your question-answer sessions to PDF format for documentation.
Parse PDF, text, and CSV files for processing and analysis.
View the results of document parsing and chunking operations.
Configure OCR settings for processing scanned documents.
Review extracted text from image-based documents.
Primary inference screen for general AI conversations.
View previous conversations and responses.
Export inference conversations to PDF format.
| Component | Version | Purpose |
|---|---|---|
| Go | 1.23+ | Backend services |
| Node.js | 18+ | Frontend build system |
| Elasticsearch | 8.x | Document indexing and search |
| Wails | v2 | Desktop application framework |
- OS: Windows 10+, macOS 10.13+, or Linux
- RAM: 8GB minimum (16GB recommended)
- Storage: 5GB free space
- CPU: Multi-core processor recommended
- CUDA: Enables GPU acceleration for AI models
- Docker: Containerize Elasticsearch for easier deployment
git clone https://github.com/kbrisso/byte-vision.git
cd byte-vision
# Install Go dependencies
go mod download && go mod tidy
# Install Wails CLI
go install github.com/wailsapp/wails/v2/cmd/wails@latest
# Install frontend dependencies
cd frontend && npm install && cd ..

Option A: Docker (Recommended)
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

Option B: Local Installation
- Download from Elasticsearch Downloads
- Extract and run:
# Windows
bin\elasticsearch.bat
# macOS/Linux
bin/elasticsearch

Option A: Download Pre-built Binaries (Recommended)
- Visit LlamaCpp releases
- Download for your platform:
  - Windows: llama-*-bin-win-x64.zip (CPU) or llama-*-bin-win-cuda-cu*.zip (GPU)
  - Linux: llama-*-bin-ubuntu-x64.tar.gz
  - macOS: brew install llama.cpp
- Extract to the llamacpp/ directory
Option B: Build from Source
git clone https://github.com/ggerganov/llama.cpp.git temp-llama
cd temp-llama && mkdir build && cd build
cmake .. -DLLAMA_CUDA=ON  # -DLLAMA_CUDA=ON enables GPU support; omit it for a CPU-only build
cmake --build . --config Release
cp bin/llama-cli ../llamacpp/
cd ../.. && rm -rf temp-llama
mkdir -p models
curl -L -o models/llama-2-7b-chat.Q4_K_M.gguf \
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
curl -L -o models/all-MiniLM-L6-v2.gguf \
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2-gguf/resolve/main/all-MiniLM-L6-v2.gguf
Download and install xpdf-tools for PDF processing:
Option A: Download Pre-built Binaries (Recommended)
- Visit Xpdf downloads
- Download the appropriate version for your platform:
  - Windows: xpdf-tools-win-*-setup.exe
  - Linux: xpdf-tools-linux-*-static.tar.gz
  - macOS: xpdf-tools-mac-*-setup.dmg
- Extract or install to the xpdf-tools/ directory in your project root
Option B: Package Manager Installation
# macOS
brew install xpdf
# Ubuntu/Debian
sudo apt-get install xpdf-utils
# Windows (using Chocolatey)
choco install xpdf-utils

Install Tesseract-OCR for optical character recognition:
Windows:
- Download from Tesseract releases
- Install the executable
- Add Tesseract to your system PATH:
  - Add C:\Program Files\Tesseract-OCR to your PATH environment variable
  - Or set a custom path in byte-vision-cfg.env: TESSERACT_PATH=C:\path\to\tesseract.exe
macOS:
brew install tesseract

Linux (Ubuntu/Debian):
sudo apt-get install tesseract-ocr

Verify Installation:
tesseract --version

Create byte-vision-cfg.env:
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_USERNAME=elastic
ELASTICSEARCH_PASSWORD=your_password
LLAMA_CLI_PATH=./llamacpp/llama-cli
LLAMA_EMBEDDING_PATH=./llamacpp/llama-embedding
MODEL_PATH=./models
DEFAULT_INFERENCE_MODEL=llama-2-7b-chat.Q4_K_M.gguf
DEFAULT_EMBEDDING_MODEL=all-MiniLM-L6-v2.gguf
MAX_CHUNK_SIZE=1000
CHUNK_OVERLAP=200
LOG_LEVEL=INFO
wails dev

The application will launch with hot reload enabled.
wails build

The built application will be in the build/ directory.
The application uses environment variables defined in byte-vision-cfg.env:
| Variable | Description | Default |
|---|---|---|
| ELASTICSEARCH_URL | Elasticsearch server URL | http://localhost:9200 |
| ELASTICSEARCH_USERNAME | Elasticsearch username | elastic |
| ELASTICSEARCH_PASSWORD | Elasticsearch password | - |
| LLAMA_CLI_PATH | Path to llama-cli executable | ./llamacpp/llama-cli |
| LLAMA_EMBEDDING_PATH | Path to llama-embedding executable | ./llamacpp/llama-embedding |
| MODEL_PATH | Directory containing AI models | ./models |
| DEFAULT_INFERENCE_MODEL | Default model for inference | - |
| DEFAULT_EMBEDDING_MODEL | Default model for embeddings | - |
| MAX_CHUNK_SIZE | Maximum text chunk size | 1000 |
| CHUNK_OVERLAP | Overlap between chunks | 200 |
| LOG_LEVEL | Application log level | INFO |
- Start Elasticsearch: Ensure Elasticsearch is running
- Launch Byte-Vision: Run the application
- Configure Models: Go to Settings → LlamaCpp Settings and set paths
- Test Connection: Verify Elasticsearch connection in Settings
- Upload Documents: Use the document parser to upload and process files
- Configure Chunking: Adjust text chunking settings for optimal search
- Index Documents: Process documents for embedding and search
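The chunking step splits extracted text into overlapping windows so that context spanning a chunk boundary is not lost. A character-based sketch of how MAX_CHUNK_SIZE and CHUNK_OVERLAP interact (hypothetical; Byte-Vision's actual chunker may split on tokens or sentence boundaries):

```go
package main

import "fmt"

// chunkText splits text into windows of at most size bytes, each
// overlapping the previous window by overlap bytes. Illustrative
// sketch of the MAX_CHUNK_SIZE / CHUNK_OVERLAP settings only.
func chunkText(text string, size, overlap int) []string {
	if size <= overlap {
		return nil // invalid: the window would never advance
	}
	stride := size - overlap
	var chunks []string
	for start := 0; start < len(text); start += stride {
		end := start + size
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
		if end == len(text) {
			break
		}
	}
	return chunks
}

func main() {
	// size 4 with overlap 2: each chunk repeats the last 2 characters
	// of the previous one ("abcd", "cdef", "efgh", "ghij").
	for _, c := range chunkText("abcdefghij", 4, 2) {
		fmt.Println(c)
	}
}
```

With the defaults (size 1000, overlap 200), each chunk shares its last 200 characters with the start of the next, which helps queries that land near a boundary.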
- Select a document from the search results
- Click "Ask Questions" to open the Q&A interface
- Enter your questions and receive AI-generated answers
- View answer sources and confidence scores
- Export Q&A sessions to PDF
- Ask Questions: Use the document question modal to query your documents
- Export Results: Export chat history to PDF for documentation
- Compare Responses: Use the comparison feature to evaluate different model outputs
- Access the AI Inference screen for general conversations
- Chat with your local LLM models
- Export conversation history
- Compare different model responses
❌ Elasticsearch Connection Failed
Symptoms: Cannot connect to Elasticsearch service
Solutions:
- Verify Elasticsearch is running: curl http://localhost:9200
- Check whether port 9200 is in use: netstat -an | grep 9200
- Verify the configuration in byte-vision-cfg.env
- Check firewall settings
- For Docker: ensure the container is running: docker ps | grep elastic
❌ LlamaCpp Model Loading Error
Symptoms: Model fails to load or produces errors
Solutions:
- Verify the model file exists in the models/ directory
- Check the model format (must be .gguf)
- Ensure sufficient RAM for the model size
- Verify LLAMA_CLI_PATH in the configuration
- Test LlamaCpp directly: ./llamacpp/llama-cli --model ./models/your-model.gguf --prompt "Hello"
❌ Frontend Build Errors
Symptoms: npm install or build failures
Solutions:
- Clear the npm cache:
  cd frontend
  rm -rf node_modules package-lock.json
  npm cache clean --force
  npm install
- Check your Node.js version: node --version
- Update npm: npm install -g npm@latest
❌ Port Already in Use
Symptoms: Application fails to start due to port conflicts
Solutions:
- Find the process using the port:
  # Windows
  netstat -ano | findstr :3000
  # macOS/Linux
  lsof -ti:3000
- Kill the process:
  # Windows
  taskkill /PID <PID> /F
  # macOS/Linux
  kill -9 <PID>
- GPU Acceleration: Install CUDA/ROCm for faster model inference
- Model Selection: Use smaller quantized models for better performance
- Memory Management: Adjust Elasticsearch heap size for large document collections
- Chunking Optimization: Tune MAX_CHUNK_SIZE and CHUNK_OVERLAP for your use case
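When tuning the chunking settings, note that the number of chunks, and therefore the number of embeddings to compute and index, grows as the stride (MAX_CHUNK_SIZE minus CHUNK_OVERLAP) shrinks. A rough character-based estimate (hypothetical helper; actual counts depend on how the parser splits text):

```go
package main

import "fmt"

// estimateChunks approximates how many overlapping chunks a document
// of docLen characters yields. Back-of-envelope sketch only; the real
// chunk count depends on the parser's splitting rules.
func estimateChunks(docLen, chunkSize, overlap int) int {
	if docLen <= chunkSize {
		return 1
	}
	stride := chunkSize - overlap
	// Ceiling division: a new chunk starts every stride characters
	// until the remaining text fits in a single window.
	return (docLen - overlap + stride - 1) / stride
}

func main() {
	// Defaults: MAX_CHUNK_SIZE=1000, CHUNK_OVERLAP=200 -> stride 800.
	fmt.Println(estimateChunks(100000, 1000, 200))
	// Doubling the overlap to 400 shrinks the stride to 600.
	fmt.Println(estimateChunks(100000, 1000, 400))
}
```

With the defaults, a 100,000-character document produces about 125 chunks; doubling the overlap to 400 raises that to about 166, trading indexing time and storage for more boundary context.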
Enable debug logging:
wails dev -debug

Check logs in the ./logs/ directory for detailed error information.
- Wails - Desktop application framework
- Go - Backend services and APIs
- React - Frontend user interface
- Elasticsearch - Document indexing and search
- Llama.cpp - Local AI model inference
- React Bootstrap - UI components
- Bootstrap 5 - CSS framework
- React PDF - PDF generation and viewing
- Vite - Build tooling
byte-vision/
├── 📁 build/              # Built application files
├── 📁 document/           # Document storage
├── 📁 frontend/           # React frontend source
│   ├── 📁 src/
│   └── 📁 public/
├── 📁 llamacpp/           # LlamaCpp binaries
├── 📁 logs/               # Application logs
├── 📁 models/             # AI model files (.gguf)
├── 📁 prompt-cache/       # Cached prompts
├── 📁 prompt-temp/        # Prompt templates
├── 📁 xpdf-tools/         # PDF processing tools
├── 📄 byte-vision-cfg.env # Configuration file
├── 📄 wails.json          # Wails configuration
└── 📄 go.mod              # Go dependencies
- Application logs: ./logs/
- Elasticsearch logs: check the Elasticsearch installation directory
- Debug mode: wails dev -debug
- Frontend logs: browser developer console
- Backend logs: terminal output during development
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also open an issue with the tag "enhancement." Remember to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Follow Go formatting standards (go fmt)
- Write tests for new features
- Update documentation for API changes
- Use semantic commit messages
- Ensure all tests pass before submitting
- Settings persistence for llama-cli configuration
- Settings persistence for llama-embedding configuration
- Enhanced documentation and examples
- Additional document format support (DOCX, PPT, etc.)
- Advanced search filters and operators
- Batch document processing capabilities
- RESTful API for external integrations
- Docker deployment configuration
- User authentication and access control
- Cloud storage integration (S3, Google Drive, etc.)
- Multi-language support
- Advanced analytics and reporting
- Distributed processing for large document collections
- Plugin architecture for custom processors
- Integration with external AI services
- Mobile application companion
See open issues for detailed feature requests and bug reports.
This project is licensed under the terms of the MIT license.
Kevin Brisson - LinkedIn - kbrisso@gmail.com
Project Link: https://github.com/kbrisso/byte-vision
⭐ Star this project if you find it helpful!