A user-friendly web-based interface for training and fine-tuning Large Language Models (LLMs) on local consumer hardware. Built specifically for dual RTX 3060 GPUs with Unsloth - delivering 2-5x faster training with superior multi-GPU support.
- Model Selection: Support for popular Hugging Face models with organized categories (General Chat, Coding, Reasoning, etc.)
- Dataset Management: Upload and process datasets in JSONL, CSV, and TXT formats
- Training Methods: LoRA, QLoRA, and full fine-tuning support
- Real-time Monitoring: Live training logs, GPU statistics, and loss visualization
- Checkpoint Management: Save, resume, and manage training checkpoints
- Inference Sandbox: Test your fine-tuned models with an interactive interface
- ⚡ Unsloth Powered: 2-5x faster training than traditional methods
- 🎯 Dual GPU Optimized: Automatic multi-GPU distribution without complex setup
- 💾 Memory Efficient: QLoRA support with optimized memory usage for consumer GPUs
- 🔄 Fast Iteration: Sub-10 second feedback loop from configuration to training start
- 📊 Resumable Training: Automatic checkpoint saving every epoch
- 🖥️ Background Processing: Non-blocking UI during training operations
- 🧩 Auto-Optimization: Unsloth handles GPU placement and optimization automatically
- 🐳 Docker Ready: Complete containerization with CUDA support for easy deployment
- 🌐 CORS Optimized: Seamless frontend-backend communication via Vite proxy
- ⏱️ Extended Inference: 2-minute timeout support for large model loading and generation
- 🔧 CORS Issues Resolved: Fixed all frontend-backend connectivity problems using Vite proxy configuration
- ⏱️ Extended Inference Timeout: Increased from 30 seconds to 2 minutes for large model loading and generation
- 📊 Optimized Monitoring: Reduced polling interval from 2 seconds to 5 seconds for better performance
- 🐳 Complete Docker Setup: Added comprehensive containerization with CUDA support, multi-stage builds, and production deployment
- 🔧 GPU Memory Management: Improved model loading/unloading for better dual GPU utilization
- 🌐 Enhanced API Communication: Optimized WebSocket configuration and proxy settings
- Faster Startup: Streamlined backend initialization and model loading
- Better Resource Usage: Optimized GPU memory allocation and monitoring overhead
- Improved Stability: Enhanced error handling and recovery mechanisms
- Seamless Development: Hot reloading and auto-restart capabilities in Docker environment
Choose from curated models organized by use case - General Chat, Coding, Reasoning & Math, Small & Efficient, and Multilingual support.
Upload your training data in multiple formats with real-time validation and preview capabilities.
Start, monitor, and control your training jobs with real-time feedback and comprehensive logging.
- GPUs: Dual RTX 3060 (12GB VRAM per GPU) or equivalent
- RAM: 32GB+ recommended
- Storage: 100GB+ free space for models and checkpoints
- OS: Ubuntu Linux 20.04+ with CUDA 12.0+
- Python: 3.10 or higher
- Node.js: 18.0+ (for frontend development)
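
The software prerequisites above can be checked before installation with a short script. This is only a sketch: `check_prerequisites` and its helper names are hypothetical and not part of the repository, and it only confirms that the tools are on `PATH`, not that their versions match.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version from the requirements list


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def check_prerequisites() -> dict:
    """Collect one pass/fail flag per prerequisite (hypothetical helper)."""
    return {
        "python>=3.10": python_ok(),
        "nvidia-smi": tool_available("nvidia-smi"),
        "node": tool_available("node"),
    }
```

Calling `check_prerequisites()` returns a dictionary of booleans, so a failing entry points at the component that still needs installing.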
git clone git@github.com:ddunford/LLMTune.git
cd LLMTune

cd backend/
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Install Unsloth for 2-5x faster training
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

cd frontend/
npm install

Ensure CUDA is properly installed and accessible:
nvidia-smi # Should show your GPUs
python -c "import torch; print(torch.cuda.is_available())" # Should return True
python -c "import torch; print(f'GPUs: {torch.cuda.device_count()}')" # Should show 2 for dual GPU

For easier deployment, environment management, and guaranteed compatibility, Docker is the recommended installation method:
- Docker (v20.10+)
- Docker Compose (v2.0+)
- NVIDIA Docker Runtime
# Clone repository
git clone git@github.com:ddunford/LLMTune.git
cd LLMTune
# Start development environment
./docker-setup.sh dev
# Access the application
# Frontend: http://localhost:55155
# Backend: http://localhost:8001

# Development environment
./docker-setup.sh dev
# Production environment
./docker-setup.sh prod
# Stop services
./docker-setup.sh stop
# View logs
./docker-setup.sh logs
# Check status
./docker-setup.sh status

For detailed Docker setup instructions, see Docker-README.md.
cd backend/
source venv/bin/activate
python main.py

cd frontend/
npm run dev

Open your browser and navigate to http://localhost:3000
- Choose from curated models organized by use case:
- 💬 General & Chat: Conversation, Q&A, content generation (Llama 3.3, Mistral 7B)
- 💻 Coding: Programming assistance, code generation (DeepSeek Coder, Code Llama)
- 🧮 Reasoning & Math: Complex problem solving (DeepSeek Math, Phi-3)
- ⚡ Small & Efficient: Fast inference, resource-friendly (Phi-3 Mini, Gemma 2)
- 🌍 Multilingual: International language support (Qwen 2.5, BLOOM)
- Or enter a custom Hugging Face model ID
- Verify tokenizer compatibility
- Navigate to the Dataset Management section
- Upload your training data (.jsonl, .csv, or .txt)
- Preview and validate the dataset format
- Select training method (LoRA/QLoRA/Full)
- Enable dual GPU for faster training (automatically handles multi-GPU distribution)
- Adjust parameters:
- Learning rate, epochs, batch size
- LoRA rank, alpha, dropout (for LoRA/QLoRA)
- Precision settings (fp16/bf16 auto-detected)
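
The settings listed above can be modeled as a small validated structure. This is a minimal sketch: `TrainingConfig`, its field names, and its default values are illustrative assumptions and do not reflect the project's actual schema.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container for the UI settings (not the project's schema)."""

    method: str = "qlora"        # "lora", "qlora", or "full"
    learning_rate: float = 2e-4  # illustrative default
    epochs: int = 3
    batch_size: int = 2
    lora_rank: int = 16
    lora_alpha: int = 16
    lora_dropout: float = 0.0
    use_dual_gpu: bool = True

    def validate(self) -> list:
        """Return a list of human-readable problems; empty means valid."""
        errors = []
        if self.method not in {"lora", "qlora", "full"}:
            errors.append(f"unknown training method: {self.method}")
        if self.learning_rate <= 0:
            errors.append("learning rate must be positive")
        if self.epochs < 1 or self.batch_size < 1:
            errors.append("epochs and batch size must be at least 1")
        # LoRA parameters only matter for the adapter-based methods.
        if self.method in {"lora", "qlora"}:
            if self.lora_rank < 1 or self.lora_alpha < 1:
                errors.append("LoRA rank and alpha must be at least 1")
            if not 0.0 <= self.lora_dropout < 1.0:
                errors.append("LoRA dropout must be in [0, 1)")
        return errors
```

Returning a list of messages instead of raising on the first problem lets a UI show every invalid field at once.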
- View real-time logs and metrics
- Monitor GPU utilization across both GPUs
- Track loss curves and training progress
- Unsloth optimizations show in logs (2-5x speedup notifications)
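
Per-GPU utilization like that shown in the monitoring view can be read from `nvidia-smi`. The sketch below assumes the `--query-gpu` CSV output; the function names are hypothetical, and the project's own monitoring code may collect these numbers differently.

```python
import subprocess

# Fields requested from nvidia-smi, in output order.
QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> list:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Assumes every field is numeric; a driver that reports "[N/A]" for a
    field would raise ValueError here.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = (field.strip() for field in line.split(","))
        stats.append({
            "index": int(index),
            "utilization_pct": int(util),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return stats


def read_gpu_stats() -> list:
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)
```

On a dual-GPU machine `read_gpu_stats()` should return two entries; if only one shows meaningful utilization during training, the job is probably not using both GPUs.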
- Browse completed training runs
- Load checkpoints for inference
- Resume interrupted training sessions
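
Resuming a run usually starts by locating the most recent checkpoint. The sketch below assumes checkpoints are stored as `checkpoint-<step>` directories, the naming used by the Hugging Face `Trainer`; the actual layout under `backend/checkpoints/` may differ, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed directory naming: checkpoint-500, checkpoint-1000, ...
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(run_dir: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None."""
    best_step, best_path = -1, None
    for child in run_dir.iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Comparing the step numbers as integers matters: a plain string sort would rank `checkpoint-500` above `checkpoint-1500`.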
| Feature | Axolotl (Old) | Unsloth (New) |
|---|---|---|
| Dual GPU Support | ❌ Complex setup, device mismatch errors | ✅ Automatic, works out of the box |
| Training Speed | | ✅ 2-5x faster |
| Memory Usage | ❌ Often out-of-memory | ✅ Optimized, fits larger models |
| Setup Complexity | ❌ YAML configs, distributed training | ✅ Simple Python scripts |
| Error Handling | ❌ Cryptic distributed errors | ✅ Clear, actionable errors |
| GPU Utilization | ❌ Single GPU or complex multi-GPU | ✅ Automatic optimal distribution |
| Model Support | | ✅ Excellent + optimized versions |
- Training Speed: 2-5x faster than standard implementations
- Memory Efficiency: Reduced memory usage through optimizations
- GPU Utilization: Both RTX 3060s fully utilized automatically
- Model Loading: Faster model initialization and checkpoint loading
- Gradient Processing: Optimized gradient computation and accumulation
The application accepts training data in three formats:
JSON Lines format following the Alpaca/Instruction structure. Each line must be a valid JSON object with:
Required fields:
- instruction: The task or question to be performed
- input: Additional context (can be an empty string, "")
- output: The expected response/answer
Example:
{"instruction": "What is machine learning?", "input": "", "output": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed for every task."}
{"instruction": "Translate the following text to French", "input": "Hello, how are you?", "output": "Bonjour, comment allez-vous?"}
{"instruction": "Explain neural networks", "input": "", "output": "Neural networks are computing systems inspired by biological neural networks. They consist of interconnected nodes (neurons) that process information and learn patterns from data."}

Comma-separated values with column headers. You can specify which columns contain the instruction and response data during upload.
Expected structure:
instruction,input,output
"What is AI?","","Artificial intelligence is..."
"Explain deep learning","","Deep learning is a subset of machine learning..."
"Summarize this text","Climate change is a pressing issue...","Climate change poses significant challenges..."

Plain text format where each line is treated as a separate training example. Best for simple completion tasks.
Example:
The capital of France is Paris.
Python is a programming language.
Machine learning requires data.

- Automatic Conversion: All formats are automatically converted to Unsloth-compatible format
- Metadata Extraction: File size, row count, estimated token count
- Sample Preview: First 5 rows displayed for validation
- Format Validation: Real-time validation during upload
- Column Detection: Automatic detection of available columns in CSV files
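
The conversion step can be pictured as mapping each input format onto a common record shape. The sketch below is an assumption about what that might look like: it maps CSV rows to the instruction/input/output fields used by the JSONL format and wraps each TXT line in a `text` field. The function names and the `text` key are hypothetical; the application's real target format is not documented here.

```python
import csv
import io
from typing import Optional


def csv_to_records(text: str, instruction_col: str = "instruction",
                   input_col: Optional[str] = "input",
                   output_col: str = "output") -> list:
    """Map CSV rows to instruction records using the chosen columns."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "instruction": row[instruction_col],
            # The input column is optional; fall back to an empty string.
            "input": (row.get(input_col) or "") if input_col else "",
            "output": row[output_col],
        })
    return records


def txt_to_records(text: str) -> list:
    """Treat every non-empty line as one completion-style example."""
    return [{"text": line.strip()} for line in text.splitlines() if line.strip()]
```

Passing the column names as parameters mirrors the upload option for choosing which CSV columns hold the instruction and response.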
A sample dataset is included at backend/uploads/sample_dataset.jsonl demonstrating the correct JSONL format. Use this as a reference for structuring your training data.
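
A JSONL file can also be checked locally before upload. The following is a minimal sketch of such a check, based only on the three required fields described above; `validate_jsonl` is a hypothetical helper, not the validator the application runs.

```python
import json

REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_jsonl(text: str) -> list:
    """Return one error message per bad line; an empty list means valid."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {number}: expected a JSON object")
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            errors.append(f"line {number}: missing {', '.join(missing)}")
    return errors
```

For example, `validate_jsonl(open("backend/uploads/sample_dataset.jsonl").read())` should return an empty list if the sample file follows the format described here.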
- Use JSONL format for best compatibility and control
- Keep instructions clear and specific
- Ensure consistent formatting across all examples
- Include diverse examples to improve model generalization
- Validate data quality using the preview feature before training
LLMTune/
├── backend/ # FastAPI backend
│ ├── main.py # Application entry point
│ ├── unsloth_runner.py # Unsloth training orchestration
│ ├── models/ # Data models and schemas
│ ├── routes/ # API endpoints
│ ├── services/ # Business logic
│ ├── uploads/ # User datasets
│ ├── logs/ # Training logs
│ ├── checkpoints/ # Model checkpoints
│ ├── scripts/ # Generated training scripts
│ ├── Dockerfile # Backend container definition
│ └── .dockerignore # Docker build optimization
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── pages/ # Page components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── services/ # API services
│ │ └── utils/ # Utility functions
│ ├── public/ # Static assets
│ ├── Dockerfile # Frontend container definition
│ ├── nginx.conf # Production nginx configuration
│ └── .dockerignore # Docker build optimization
├── docs/ # Documentation and screenshots
│ ├── train_models.png # Model selection interface
│ ├── train_dataset.png # Dataset upload interface
│ ├── train_start.png # Training control interface
│ └── PRD.md # Product Requirements Document
├── .cursor/ # Cursor IDE rules
│ └── rules/ # Development guidelines
├── docker-compose.yml # Main Docker orchestration
├── docker-compose.dev.yml # Development overrides
├── docker-setup.sh # Docker management script
├── Docker-README.md # Docker setup documentation
├── .env.example # Environment variables template
└── README.md # This file
- Project setup and structure
- Dataset upload functionality
- Base model selection interface with use case categories
- LoRA training configuration
- Training launch via UI
- Real-time logs and GPU statistics

- QLoRA and full fine-tune support
- Checkpoint management system
- Inference preview/sandbox
- Unsloth migration for 2-5x faster training
- Dual GPU optimization
- Docker containerization with GPU support
- CORS fixes and API optimization
- Performance improvements and monitoring
- Multi-user authentication

- CORS Resolution: Fixed frontend-backend connectivity issues
- Extended Timeouts: Inference requests now support 2-minute timeout for model loading
- Optimized Polling: Reduced monitoring refresh rate from 2s to 5s for better performance
- Docker Environment: Complete containerization with CUDA support and production-ready setup
- GPU Memory Management: Improved model loading and unloading for dual GPU setups
- WebSocket Configuration: Enhanced real-time communication for monitoring
- Backend: FastAPI with Unsloth integration
- Frontend: React with Tailwind CSS
- Training: Dynamic Python script generation (no YAML configs)
- API Design: RESTful endpoints with WebSocket streaming
- Testing: Unit, integration, and hardware tests
- Performance: Optimized for dual RTX 3060 with Unsloth
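
Dynamic script generation, as mentioned in the list above, can be done by filling a template with the chosen settings. The sketch below is an assumption about the general approach, not the contents of `unsloth_runner.py`: `render_training_script` and the template are hypothetical, the model ID is a placeholder, and the generated script only loads the model and attaches LoRA adapters through Unsloth's `FastLanguageModel`.

```python
from string import Template

# Template for the generated script; $-placeholders are filled per run.
SCRIPT_TEMPLATE = Template('''\
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=$model_id,
    max_seq_length=$max_seq_length,
    load_in_4bit=$load_in_4bit,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=$lora_rank,
    lora_alpha=$lora_alpha,
    lora_dropout=$lora_dropout,
)
''')


def render_training_script(model_id: str, method: str = "qlora",
                           max_seq_length: int = 2048, lora_rank: int = 16,
                           lora_alpha: int = 16, lora_dropout: float = 0.0) -> str:
    """Return Python source for a training script (hypothetical helper)."""
    return SCRIPT_TEMPLATE.substitute(
        model_id=repr(model_id),             # repr() quotes and escapes the ID
        max_seq_length=max_seq_length,
        load_in_4bit=(method == "qlora"),    # 4-bit loading only for QLoRA
        lora_rank=lora_rank,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
    )
```

Using `repr()` for the model ID keeps a stray quote in user input from breaking the generated source, which is one reason to prefer a template with explicit substitution over string concatenation.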
# Backend tests
cd backend/
pytest
# Test Unsloth dual GPU functionality
python test_unsloth_simple.py
# Frontend tests
cd frontend/
npm test

- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Follow the development guidelines in .cursor/rules/
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Python: Follow PEP 8, use Black formatter
- JavaScript: ESLint + Prettier configuration
- Documentation: Update README and docstrings
This project is licensed under the MIT License - see the LICENSE file for details.
- CUDA not detected: Ensure NVIDIA drivers and CUDA toolkit are properly installed
- Docker GPU support: Install nvidia-docker2 for containerized GPU access
- Port conflicts: Default ports are 8001 (backend) and 55155 (frontend dev)
- Out of memory: Unsloth's optimizations should prevent this, but try reducing batch size if needed
- Training not using both GPUs: Enable "Use Dual GPU" in training configuration
- Slow training: Ensure Unsloth is properly installed and check logs for "2x faster" messages
- Inference timeouts: Now supports 2-minute timeout for large model loading
- CORS errors: ✅ Fixed - Frontend now uses Vite proxy for seamless API communication
- API connection issues: ✅ Resolved - Backend-frontend connectivity fully optimized
- Monitoring performance: ✅ Improved - Polling reduced to 5-second intervals
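
For the port conflicts mentioned above, a quick way to see whether something is already listening on the default ports is a local connection test. This is a sketch with hypothetical names; it checks only `127.0.0.1` and does not identify which process holds the port.

```python
import socket

# Default ports from the troubleshooting list above.
DEFAULT_PORTS = {"backend": 8001, "frontend (dev)": 55155}


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def report_ports() -> dict:
    """Map each default service name to whether its port is occupied."""
    return {name: port_in_use(port) for name, port in DEFAULT_PORTS.items()}
```

Running `report_ports()` before starting the services shows which default port, if any, is already taken.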
- Check the Issues for common problems
- Review the PRD.md for detailed specifications
- Consult the development guidelines in .cursor/rules/
- Unsloth for the blazing-fast training backend
- Hugging Face for model and dataset ecosystem
- LoRA and QLoRA research papers
- TRL for the training framework
Note: This project is optimized for dual RTX 3060 GPUs with Unsloth and delivers 2-5x faster training than traditional methods. See the PRD for detailed hardware requirements and optimization strategies.


