Hexllama is a fast, native desktop interface designed to streamline managing and running local Large Language Models using llama.cpp. It strips away the friction of command-line execution and manual file management, providing a unified workspace to discover, download, configure, and serve models.
Built by and for local AI enthusiasts, Hexllama ensures you spend less time wrestling with terminal arguments and more time interacting with models.
Integrated Model Hub: Search Hugging Face directly within the application. Browse repositories, view file details, and download GGUF models with a single click, without ever opening a browser.
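As a rough illustration of what a hub search and a one-click download involve, the sketch below builds a Hugging Face search URL and direct download URLs for the GGUF files in a repository. It is not taken from Hexllama's source: the helper names `searchUrl` and `ggufDownloadUrls` are hypothetical, and the sketch assumes the public Hub `/api/models` endpoint, the `resolve/main` file URL pattern, and a model-info response that lists files under `siblings[].rfilename`.

```typescript
// Hypothetical helpers for illustration; not Hexllama's actual implementation.
const HUB = "https://huggingface.co";

// Assumed subset of one file entry in the Hub's model-info response.
interface Sibling {
  rfilename: string;
}

// Build a search URL restricted to repositories tagged "gguf".
function searchUrl(query: string, limit = 20): string {
  return `${HUB}/api/models?search=${encodeURIComponent(query)}&filter=gguf&limit=${limit}`;
}

// Keep only .gguf files and map each one to its direct download URL.
function ggufDownloadUrls(repoId: string, siblings: Sibling[]): string[] {
  return siblings
    .map((s) => s.rfilename)
    .filter((name) => /\.gguf$/i.test(name))
    .map((name) => {
      // Encode each path segment separately so sub-folders keep their slashes.
      const path = name.split("/").map((part) => encodeURIComponent(part)).join("/");
      return `${HUB}/${repoId}/resolve/main/${path}`;
    });
}
```

Filtering on the file extension is the simplest way to hide READMEs and config files from a download list; a real client would likely also show file sizes before downloading.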
Smart Download Manager: Pause, resume, or cancel large model downloads reliably. You can also paste direct GGUF links. When a download completes, Hexllama automatically generates an execution template with recommended threads, batch sizes, and context windows tailored to the model's parameters and quantization level.
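Pausing and resuming a multi-gigabyte download is commonly done with HTTP range requests. The sketch below shows that general technique, not Hexllama's confirmed implementation; `ResumeState`, `resumeHeaders`, and `onResponse` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch of resumable downloads using HTTP range requests.
interface ResumeState {
  bytesOnDisk: number; // size of the partial file already saved
  etag?: string;       // validator from the first response, if the server sent one
}

// Headers for the follow-up request that continues a paused download.
function resumeHeaders(state: ResumeState): Record<string, string> {
  if (state.bytesOnDisk <= 0) return {}; // nothing saved yet: plain GET
  const headers: Record<string, string> = { Range: `bytes=${state.bytesOnDisk}-` };
  // If-Range asks the server to resend the whole file if it changed in the meantime.
  if (state.etag) headers["If-Range"] = state.etag;
  return headers;
}

// Decide what to do with the partial file based on the response status.
function onResponse(status: number): "append" | "restart" | "fail" {
  if (status === 206) return "append";  // server honoured the range
  if (status === 200) return "restart"; // full body sent: overwrite the partial file
  return "fail";
}
```

Sending `If-Range` alongside `Range` guards against stitching together two different versions of a file when the remote copy is replaced between pause and resume.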
Template-Based Execution: Save your configurations as reusable templates. Run multiple models simultaneously on different ports without conflict. Launch them in "Chat UI" mode to automatically open the built-in llama.cpp web interface, or "API Only" mode to serve them silently in the background.
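To make the idea of a template concrete, here is a minimal sketch of how a saved configuration could be turned into `llama-server` arguments, plus a check that no two templates claim the same port. The `Template` shape and both function names are assumptions for illustration; only the `--model`, `--port`, `--ctx-size`, and `--threads` flags come from llama.cpp itself.

```typescript
// Hypothetical template shape; field names are illustrative only.
interface Template {
  name: string;
  modelPath: string;
  port: number;
  ctxSize: number;
  threads: number;
}

// Translate a template into llama-server command-line arguments.
function toArgs(t: Template): string[] {
  return [
    "--model", t.modelPath,
    "--port", String(t.port),
    "--ctx-size", String(t.ctxSize),
    "--threads", String(t.threads),
  ];
}

// Return the names of templates whose port is already claimed by an earlier one.
function portConflicts(templates: Template[]): string[] {
  const seen: number[] = [];
  const clashes: string[] = [];
  for (const t of templates) {
    if (seen.indexOf(t.port) !== -1) clashes.push(t.name);
    else seen.push(t.port);
  }
  return clashes;
}
```

Checking ports before launch gives a clear error up front instead of a second server process failing to bind at startup.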
Version and Backend Management: Running cutting-edge models sometimes requires different builds of llama.cpp. Hexllama lets you maintain and seamlessly switch between multiple backend binaries. It automatically checks the ggml-org repository for new releases and lets you download and extract them straight from the settings panel.
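A release check of this kind can be sketched as a comparison between the installed build tag and the latest release reported by GitHub. The code below assumes the response of `GET https://api.github.com/repos/ggml-org/llama.cpp/releases/latest` has already been fetched and parsed, and that tags follow the `b<number>` pattern llama.cpp uses for its builds. The function names are hypothetical and this is not Hexllama's actual update logic.

```typescript
// Subset of the GitHub "latest release" response used by this sketch.
interface Release {
  tag_name: string;
  assets: { name: string; browser_download_url: string }[];
}

// Extract the build number from a tag such as "b4567"; null if the tag has another shape.
function buildNumber(tag: string): number | null {
  return /^b\d+$/.test(tag) ? parseInt(tag.slice(1), 10) : null;
}

// True when the latest release is a newer build than the installed one.
function hasUpdate(installedTag: string, latest: Release): boolean {
  const current = buildNumber(installedTag);
  const remote = buildNumber(latest.tag_name);
  return current !== null && remote !== null && remote > current;
}

// Download URLs of assets whose file name contains a platform keyword (assumed naming).
function assetsFor(latest: Release, keyword: string): string[] {
  const needle = keyword.toLowerCase();
  return latest.assets
    .filter((asset) => asset.name.toLowerCase().indexOf(needle) !== -1)
    .map((asset) => asset.browser_download_url);
}
```

Comparing build numbers numerically avoids the classic string-comparison trap where `b999` would sort after `b1000`.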
Visual Command Editor: Stop memorizing execution flags. Edit backend-specific commands through a structured user interface. Toggle booleans, set limits on numerical inputs, and define default parameter values for the llama.cpp server.
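One way to picture the editor is as a small schema that describes each server option, with form values validated against it before the command line is assembled. The sketch below is an assumption about how such a schema might look, not Hexllama's real data model; the `Param` type, the limits, and the defaults are illustrative, while `--mlock`, `--ctx-size`, and `--threads` are existing llama.cpp server options.

```typescript
// Hypothetical parameter schema; limits and defaults are illustrative values.
type Param =
  | { kind: "bool"; flag: string; default: boolean }
  | { kind: "number"; flag: string; default: number; min: number; max: number };

const schema: Param[] = [
  { kind: "bool", flag: "--mlock", default: false },
  { kind: "number", flag: "--ctx-size", default: 4096, min: 512, max: 131072 },
  { kind: "number", flag: "--threads", default: 4, min: 1, max: 64 },
];

// Turn form values into command-line arguments, clamping numbers to their limits.
function buildArgs(
  params: Param[],
  values: Record<string, boolean | number | undefined>
): string[] {
  const args: string[] = [];
  for (const p of params) {
    const v = values[p.flag];
    if (p.kind === "bool") {
      // Boolean options are emitted as bare flags only when switched on.
      const on = typeof v === "boolean" ? v : p.default;
      if (on) args.push(p.flag);
    } else {
      // Missing or invalid numbers fall back to the default before clamping.
      const n = typeof v === "number" && isFinite(v) ? v : p.default;
      args.push(p.flag, String(Math.min(p.max, Math.max(p.min, n))));
    }
  }
  return args;
}
```

For example, `buildArgs(schema, { "--mlock": true, "--ctx-size": 999999 })` clamps the context size to the schema's maximum and fills in the default thread count.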
The fastest way to get started is to use the pre-compiled installer.
- Navigate to the Releases page.
- Download the installer for your operating system.
- Run the installer and launch Hexllama.
To build from source or modify the project, run the development environment locally.
Prerequisites:
- Node.js 18 or higher
- npm
# Clone the repository
git clone https://github.com/andersondanieln/hexllama.git
# Enter the project directory
cd hexllama
# Install dependencies
npm install
# Start the development server
npm run dev

To compile the application into an executable for your current OS:

npm run build

- Integrated Model Hub: Hugging Face search and download directly from the app.
- Smart Download Manager: Pause/resume/cancel, auto-template generation based on hardware & quant level.
- Template-Based Execution: Run multiple models on different ports, reusable configuration templates.
- Version and Backend Management: Download and switch between different versions of llama.cpp binaries directly.
- Visual Command Editor: Graphical UI for configuring server parameters instead of terminal flags.
- Built-in Chat Interface: Native chat client to interact with models directly within Hexllama without launching external browser tabs.
- MTP (Multi-Token Prediction) Support: Enable faster generation speeds using speculative decoding / MTP.
- TurboQuant Support: Support optimized quantizations and execution configurations.
- Multi-Language Support: Complete internationalization (i18n) to support Portuguese, English, Spanish, etc.
- Alternative Backend Integration: Expand support beyond llama.cpp to include:
  - MLX: Native backend for Apple Silicon optimized performance.
  - vLLM / ExLlamaV2: Support for high-throughput and GPU-optimized engines.
This project exists because of the incredible foundational work of Georgi Gerganov and the ggml-org community. Please consider supporting the development of llama.cpp.
Hexllama is provided as is, without warranty of any kind. The developers assume no liability for damages or issues arising from the use of this software.
This application is strictly local. It does not collect, store, or transmit any telemetry or personal data. Note that downloading models relies on third-party services like Hugging Face, and executing backends relies on the downloaded binaries, both of which are subject to their own respective privacy policies.





