A comprehensive web application for managing vLLM (Virtual Large Language Model) instances with HuggingFace integration. This application allows you to easily deploy, manage, and monitor multiple vLLM instances through a modern web interface.
- 🚀 Easy Instance Management: Create, start, stop, restart, and remove vLLM instances
- 🔍 HuggingFace Integration: Search and browse models directly from HuggingFace
- 🔐 Authentication Support: Full support for gated and private models with API keys
- 📊 Real-time Monitoring: Live status updates and container logs
- 🌐 Modern UI: Clean, responsive interface built with React and Tailwind CSS
- 🐳 Docker-based: Containerized deployment for easy setup and scaling
- 📱 Mobile Friendly: Responsive design works on all devices
Search and browse HuggingFace models with detailed information and popularity metrics
View detailed instance information, logs, and manage running containers
Test your vLLM instances with an interactive chat interface
- Frontend: React.js with Tailwind CSS for styling
- Backend: Node.js with Express.js API
- Database: SQLite for instance configuration storage
- Container Management: Docker API integration for vLLM instances
- Model Discovery: HuggingFace API integration
- Docker and Docker Compose v2
- Node.js 18+ (for development)
- At least 4GB RAM for running models
- Clone the repository: `git clone git@github.com:ddunford/vLLMManager.git`, then `cd vLLMManager`
- Install dependencies: `npm run install:all`
- Set up environment variables: `cp .env.example .env`, then edit `.env` with your configuration
- Start the development servers:
  - Terminal 1 (backend): `npm run dev`
  - Terminal 2 (frontend): `npm run dev:frontend`
- Access the application:
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:3001
- Configure environment variables: `cp .env.example .env`, then edit `.env` with your production settings
- Build and start with Docker Compose:
  - Production: `docker compose -f docker-compose.prod.yml up -d`
  - Development: `docker compose up -d`
- Access the application:
  - Application: http://localhost:3001
For detailed production deployment instructions, see DEPLOYMENT.md.
# Build the image
docker build -t vllm-manager .
# Run the container
docker run -d \
-p 3001:3001 \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ./server/data:/app/server/data \
--name vllm-manager \
vllm-manager
- Navigate to the Create Instance page
- Enter an instance name and model name (e.g., `microsoft/DialoGPT-medium`)
- Optionally provide a HuggingFace API key for gated models
- Click Create Instance
Use the model discovery interface to search and browse available models:
- Search: Find models by name, description, or tags
- Popular Models: Browse trending and most downloaded models
- Model Details: View comprehensive information including parameters, license, and usage examples
- Direct Integration: Click any model to use it for creating a new instance
- Dashboard: View all instances with their status and basic controls
- Instance Details: Click on any instance to view logs, detailed information, and API usage examples
- Actions: Start, stop, restart, or remove instances directly from the dashboard
The instance detail page provides:
- Real-time Logs: Monitor container output and debug issues
- Status Information: Current state, port assignments, and resource usage
- API Examples: Copy-paste ready code examples for different programming languages
- Configuration Details: View model parameters and container settings
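As an illustration of the kind of API example the detail page refers to, here is a minimal JavaScript sketch (Node.js 18+, which provides a global `fetch`) that calls an instance's OpenAI-compatible chat endpoint. The base URL `http://localhost:8001`, the `localkey` bearer token, and the helper names `buildChatRequest` and `chat` are assumptions for illustration; substitute the port and model name shown for your own instance.

```javascript
// Build the fetch() arguments for an OpenAI-compatible chat completion call.
// Kept separate from the network call so it can be inspected offline.
function buildChatRequest(baseUrl, model, userMessage, apiKey = "localkey") {
  return {
    url: new URL("/v1/chat/completions", baseUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userMessage }],
        max_tokens: 100,
      }),
    },
  };
}

// Send the request and return the assistant's reply text.
async function chat(baseUrl, model, userMessage) {
  const { url, options } = buildChatRequest(baseUrl, model, userMessage);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

// Example (requires a running instance on the assumed port):
// chat("http://localhost:8001", "your-model-name", "Hello!").then(console.log);
```

Splitting request construction from the network call makes it easy to log or adjust the payload before sending it.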
Use the built-in testing interface to verify your vLLM instances:
- Interactive Chat: Test conversational models with a chat interface
- API Testing: Send custom requests and view responses
- Response Analysis: Examine model outputs and performance metrics
- Error Diagnosis: Debug connection and model issues
Once an instance is running, you can access the OpenAI-compatible API:
curl -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer localkey" \
-d '{
"model": "your-model-name",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 100
}'
- `GET /api/containers` - List all instances
- `POST /api/containers` - Create new instance
- `POST /api/containers/:id/start` - Start instance
- `POST /api/containers/:id/stop` - Stop instance
- `POST /api/containers/:id/restart` - Restart instance
- `DELETE /api/containers/:id` - Remove instance
- `GET /api/containers/:id/logs` - Get container logs
- `GET /api/models/search?query=<query>` - Search HuggingFace models
- `GET /api/models/popular` - Get popular models
- `GET /api/models/:modelId` - Get model details
- `POST /api/models/validate` - Validate model access
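The endpoints above can be wrapped in a small client. The following JavaScript sketch assumes the manager is reachable at `http://localhost:3001` and that responses are JSON. The `managerRequest` and `callManager` helpers and the action names are hypothetical, and the request body for creating an instance is not documented here, so it is passed through unchanged.

```javascript
// Hypothetical action names mapped to the HTTP method and path template
// taken from the endpoint list.
const ROUTES = {
  list: ["GET", "/api/containers"],
  create: ["POST", "/api/containers"],
  start: ["POST", "/api/containers/:id/start"],
  stop: ["POST", "/api/containers/:id/stop"],
  restart: ["POST", "/api/containers/:id/restart"],
  remove: ["DELETE", "/api/containers/:id"],
  logs: ["GET", "/api/containers/:id/logs"],
  searchModels: ["GET", "/api/models/search"],
  popularModels: ["GET", "/api/models/popular"],
};

// Resolve an action into a concrete HTTP method and URL.
function managerRequest(action, { id, query, baseUrl = "http://localhost:3001" } = {}) {
  const route = ROUTES[action];
  if (!route) throw new Error(`Unknown action: ${action}`);
  const [method, template] = route;
  if (template.includes(":id") && id === undefined) {
    throw new Error(`Action "${action}" needs an instance id`);
  }
  const path = template.replace(":id", encodeURIComponent(String(id)));
  const url = new URL(path, baseUrl);
  if (query !== undefined) url.searchParams.set("query", query);
  return { method, url: url.toString() };
}

// Perform the call; `body` is sent as JSON when provided.
async function callManager(action, params = {}, body) {
  const { method, url } = managerRequest(action, params);
  const response = await fetch(url, {
    method,
    headers: body ? { "Content-Type": "application/json" } : undefined,
    body: body ? JSON.stringify(body) : undefined,
  });
  if (!response.ok) throw new Error(`${method} ${url} failed: ${response.status}`);
  return response.json();
}

// Example (requires a running manager):
// callManager("list").then(console.log);
// callManager("restart", { id: "my-instance-id" }).then(console.log);
```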
| Variable | Description | Default |
|---|---|---|
| `PORT` | Server port | 3001 |
| `NODE_ENV` | Environment | development |
| `HF_TOKEN` | HuggingFace API token | - |
| `MIN_PORT` | Minimum port for instances | 8001 |
| `MAX_PORT` | Maximum port for instances | 9000 |
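To make the defaults concrete, here is a sketch of how these variables could be read in Node.js. It is not the project's actual configuration code: the `loadConfig` and `readPort` helpers and the validation rules are assumptions, and only the variable names and default values come from the table.

```javascript
// Defaults mirror the table; HF_TOKEN has no default.
const DEFAULTS = { PORT: 3001, NODE_ENV: "development", MIN_PORT: 8001, MAX_PORT: 9000 };

// Parse an integer port setting, falling back to the default when unset.
function readPort(env, name) {
  const raw = env[name];
  if (raw === undefined || raw === "") return DEFAULTS[name];
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 1 || value > 65535) {
    throw new Error(`${name} must be an integer between 1 and 65535, got "${raw}"`);
  }
  return value;
}

// Build a config object from an environment map (process.env by default).
function loadConfig(env = process.env) {
  const config = {
    port: readPort(env, "PORT"),
    nodeEnv: env.NODE_ENV || DEFAULTS.NODE_ENV,
    hfToken: env.HF_TOKEN || null,
    minPort: readPort(env, "MIN_PORT"),
    maxPort: readPort(env, "MAX_PORT"),
  };
  if (config.minPort > config.maxPort) {
    throw new Error("MIN_PORT must not be greater than MAX_PORT");
  }
  return config;
}
```

Failing fast on an invalid port range surfaces a misconfigured `.env` at startup rather than when the first instance is created.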
The application supports any HuggingFace model compatible with vLLM:
- Text Generation: GPT-style models, LLaMA, Mistral, etc.
- Conversational: ChatGPT-style models
- Code Generation: CodeLLaMA, CodeT5, etc.
- Minimum: 4GB RAM, 2 CPU cores
- Recommended: 8GB+ RAM, 4+ CPU cores
- Storage: 10GB+ for model caching
- Port already in use
  - The application automatically assigns available ports
  - Check if other services are using the port range (8001-9000)
- Model download fails
  - Ensure internet connectivity
  - Check if the model requires authentication
  - Verify HuggingFace API key for gated models
- Container creation fails
  - Ensure Docker daemon is running
  - Check Docker socket permissions
  - Verify available disk space
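For the port conflict case, it can help to picture how a free port might be chosen from the configured range. The sketch below is one simple strategy, not necessarily the application's actual algorithm; `pickFreePort` is a hypothetical helper, and it only considers ports the caller already knows are taken, so ports held by unrelated services still need to be checked separately.

```javascript
// Return the lowest port in [minPort, maxPort] that is not already taken,
// or null when the whole range is exhausted.
function pickFreePort(usedPorts, minPort = 8001, maxPort = 9000) {
  const used = new Set(usedPorts);
  for (let port = minPort; port <= maxPort; port += 1) {
    if (!used.has(port)) return port;
  }
  return null;
}

// Example: with two instances already running on 8001 and 8002,
// the next instance would be given 8003.
const nextPort = pickFreePort([8001, 8002]);
```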
- Application logs: `docker compose logs vllm-manager`
- Instance logs: Available through the web interface
- Container logs: `docker logs <container-name>`
vllm-manager/
├── server/ # Backend API
│ ├── routes/ # API routes
│ ├── services/ # Business logic
│ ├── middleware/ # Security and logging middleware
│ ├── database/ # Database management
│ └── tests/ # Backend tests
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # Reusable components
│ │ ├── pages/ # Page components
│ │ └── services/ # API clients
│ └── public/
├── .github/ # GitHub Actions CI/CD
├── docker-compose.yml # Development Docker configuration
├── docker-compose.prod.yml # Production Docker configuration
└── Dockerfile # Container build instructions
# Start development servers
npm run dev # Backend with auto-reload
npm run dev:frontend # Frontend development server
# Testing
npm test # Run backend tests
npm run test:coverage # Run tests with coverage
npm run test:watch # Watch mode for tests
# Code quality
npm run lint # Run ESLint
npm run lint:fix # Fix linting issues
npm run format # Format code with Prettier
# Docker
npm run docker:up # Start development containers
npm run docker:down # Stop containers
npm run docker:prod # Start production containers
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Run linting and formatting: `npm run lint:fix && npm run format`
- Ensure all tests pass: `npm test`
- Submit a pull request
The project includes:
- ✅ Automated Testing: Jest test suite with coverage reporting
- ✅ Code Quality: ESLint and Prettier for consistent code style
- ✅ CI/CD Pipeline: GitHub Actions for automated testing and deployment
- ✅ Security Scanning: Automated vulnerability scanning in CI
- ✅ Docker Support: Multi-stage builds with security best practices
- Backend routes: Add to `server/routes/`
- Frontend pages: Add to `frontend/src/pages/`
- UI components: Add to `frontend/src/components/`
- API services: Add to `server/services/`
- ✅ Production-Ready Security: Comprehensive security middleware with Helmet.js
- ✅ Rate Limiting: Protection against DoS attacks and API abuse
- ✅ Environment Variables: All sensitive data configured via environment variables
- ✅ Container Security: Non-root user and security options enabled
- ✅ Security Headers: CORS, CSP, HSTS, and other security headers configured
- ✅ Input Validation: Server-side validation of all user inputs
- ✅ Security Logging: Monitoring and logging of suspicious activities
- ✅ Container Isolation: Docker networks and security options prevent interference
- ✅ Regular Updates: Automated dependency scanning and security updates
See SECURITY.md for detailed security documentation.
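As a small illustration of the input validation item, the sketch below checks that a model name looks like a plausible HuggingFace model ID (`name` or `namespace/name`) before it is used. The pattern is a deliberately conservative assumption; it is neither the project's actual validation nor an authoritative statement of the Hub's naming rules, and `isPlausibleModelId` is a hypothetical helper.

```javascript
// One path segment: alphanumeric at both ends, with '.', '_' or '-' allowed inside.
const SEGMENT = /^[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?$/;

// Accept "name" or "namespace/name"; reject anything that could be read as a path.
function isPlausibleModelId(value) {
  if (typeof value !== "string" || value.length === 0 || value.length > 96) {
    return false;
  }
  const parts = value.split("/");
  if (parts.length > 2) return false;
  return parts.every(
    (part) => SEGMENT.test(part) && !part.includes("..") && !part.includes("--")
  );
}
```

Rejecting unexpected characters early keeps values such as `../etc/passwd` out of any later path or command construction.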
- Use smaller models for testing and development
- Monitor resource usage through the dashboard
- Scale horizontally by running multiple instances
- Use SSD storage for better model loading performance
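When several instances serve the same model, requests have to be spread across them by the caller; nothing in this document states that the manager load-balances for you. The sketch below shows simple client-side round-robin selection. The `createRoundRobin` helper and the instance URLs on ports 8001 and 8002 are assumptions for illustration.

```javascript
// Create a selector that cycles through instance base URLs in order.
function createRoundRobin(baseUrls) {
  if (!Array.isArray(baseUrls) || baseUrls.length === 0) {
    throw new Error("At least one instance URL is required");
  }
  let index = 0;
  return function next() {
    const url = baseUrls[index];
    index = (index + 1) % baseUrls.length;
    return url;
  };
}

// Example: alternate between two instances of the same model.
const nextInstance = createRoundRobin([
  "http://localhost:8001",
  "http://localhost:8002",
]);
```

Each call to `nextInstance()` returns the base URL to use for the next request.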
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section
- Review container logs
- Open an issue on GitHub