GitHub - Buckeyes1995/hexllama: A beautifully crafted desktop client for running and managing local LLMs via llama.cpp.

Hexllama is a fast, native desktop interface designed to streamline managing and running local Large Language Models using llama.cpp. It strips away the friction of command-line execution and manual file management, providing a unified workspace to discover, download, configure, and serve models.

Built by and for local AI enthusiasts, Hexllama lets you spend less time wrestling with terminal arguments and more time interacting with models.

Features

Integrated Model Hub: Search Hugging Face directly within the application. Browse repositories, view file details, and download GGUF models with a single click, without opening a browser.
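As a rough illustration of what an in-app Hugging Face search involves, the sketch below builds the two URLs such a feature would typically need: a Hub API search query restricted to the `gguf` tag, and a direct file download link. The function names are hypothetical and are not taken from Hexllama's source; the use of the `search`, `filter`, and `limit` query parameters is an assumption about how the app queries the Hub.

```typescript
// Hypothetical helpers: names and defaults are illustrative, not Hexllama's actual code.

/** Build a Hugging Face Hub API URL that searches for repositories tagged "gguf". */
function buildSearchUrl(query: string, limit: number = 20): string {
  const params = [
    `search=${encodeURIComponent(query)}`,
    "filter=gguf",
    `limit=${limit}`,
  ];
  return `https://huggingface.co/api/models?${params.join("&")}`;
}

/** Build the direct download URL for one file in a repository. */
function buildDownloadUrl(repoId: string, fileName: string, revision: string = "main"): string {
  // Encode each path segment separately so sub-folders in fileName keep their slashes.
  const encodedFile = fileName.split("/").map(encodeURIComponent).join("/");
  return `https://huggingface.co/${repoId}/resolve/${encodeURIComponent(revision)}/${encodedFile}`;
}
```

For example, `buildDownloadUrl("org/repo", "model-Q4_K_M.gguf")` yields a `/resolve/main/` link of the same kind you can paste into the download manager as a direct GGUF link.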

Smart Download Manager: Pause, resume, or cancel large model downloads reliably. You can also paste direct GGUF links. When a download completes, Hexllama automatically generates an execution template with recommended threads, batch sizes, and context windows tailored to the model's parameters and quantization level.
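To make the idea of an auto-generated template concrete, here is a minimal sketch that reads the quantization label from a GGUF file name and derives some starting values. Everything in it is an assumption: the `ExecutionTemplate` fields, the function names, and especially the heuristics (one core left free, a larger context for 4-bit and smaller quantizations, a fixed batch size) are placeholders and do not describe Hexllama's real rules.

```typescript
// Hypothetical template shape; field names are assumptions.
interface ExecutionTemplate {
  threads: number;
  batchSize: number;
  contextSize: number;
}

/** Extract a quantization label such as "Q4_K_M" from a GGUF file name, if present. */
function parseQuantization(fileName: string): string | null {
  const pattern = /(?:^|[-._])((?:IQ|Q)\d+(?:_[A-Z0-9]+)*|BF16|F16|F32)(?=[-.]|$)/i;
  const match = pattern.exec(fileName);
  return match && match[1] ? match[1].toUpperCase() : null;
}

/** Derive starting values from the core count and quantization (placeholder heuristics). */
function recommendTemplate(cpuCores: number, quantization: string | null): ExecutionTemplate {
  // Assumed heuristic: leave one core free, but never go below one thread.
  const threads = Math.max(1, cpuCores - 1);
  // Assumed heuristic: read the bit width from the label, e.g. "Q4_K_M" gives 4.
  const bits = quantization ? parseInt(quantization.replace(/^\D+/, ""), 10) : NaN;
  // Assumed heuristic: 4-bit and smaller models get a longer default context.
  const contextSize = !isNaN(bits) && bits <= 4 ? 8192 : 4096;
  return { threads, batchSize: 512, contextSize };
}
```

A real implementation would likely also consider available RAM or VRAM and the model's parameter count; this sketch only shows how the file name and core count could feed a template.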

Template-Based Execution: Save your configurations as reusable templates. Run multiple models simultaneously on different ports without conflict. Launch them in "Chat UI" mode to automatically open the built-in llama.cpp web interface, or "API Only" mode to serve them silently in the background.
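The sketch below shows one way a saved template could be turned into a llama.cpp server command line, and how a port could be picked so that two running models do not collide. The `RunTemplate` shape and both function names are hypothetical; the flags (`--model`, `--port`, `--ctx-size`, `--threads`) are standard llama-server options, but whether Hexllama passes them in this form is an assumption.

```typescript
// Hypothetical template shape; Hexllama's real template format may differ.
interface RunTemplate {
  modelPath: string;
  port: number;
  contextSize: number;
  threads: number;
}

/** Return the first port at or above `start` that no tracked template is using. */
function nextFreePort(usedPorts: number[], start: number = 8080): number {
  let port = start;
  while (usedPorts.indexOf(port) !== -1) {
    port += 1;
  }
  return port;
}

/** Translate a template into llama-server command-line arguments. */
function buildServerArgs(template: RunTemplate): string[] {
  return [
    "--model", template.modelPath,
    "--port", String(template.port),
    "--ctx-size", String(template.contextSize),
    "--threads", String(template.threads),
  ];
}
```

Note that `nextFreePort` only avoids ports the application already knows about. It does not check whether another program on the machine has bound the port, which a production implementation would need to handle.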

Version and Backend Management: Running cutting-edge models sometimes requires different builds of llama.cpp. Hexllama lets you maintain and seamlessly switch between multiple backend binaries. It automatically checks the ggml-org repository for new releases and lets you download and extract them straight from the settings panel.
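A release check of this kind boils down to comparing the newest release tag with the tag of the installed binary. The sketch below assumes llama.cpp release tags follow the `b<number>` pattern (for example `b4567`) and that the latest tag has already been fetched, for instance from the `tag_name` field returned by GitHub's "latest release" REST endpoint. The function names are hypothetical and not taken from Hexllama.

```typescript
/** Parse a build tag such as "b4567" into its build number, or null if it does not fit the pattern. */
function parseBuildTag(tag: string): number | null {
  const match = /^b(\d+)$/.exec(tag.trim());
  return match && match[1] ? parseInt(match[1], 10) : null;
}

/** True when `latestTag` is a strictly higher build than `installedTag`. */
function isNewerBuild(latestTag: string, installedTag: string): boolean {
  const latest = parseBuildTag(latestTag);
  const installed = parseBuildTag(installedTag);
  // If either tag cannot be parsed, report "no update" rather than guess.
  if (latest === null || installed === null) return false;
  return latest > installed;
}
```

Comparing the parsed numbers rather than the raw strings matters once build numbers change length: as strings, `"b10000"` sorts before `"b9999"`.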

Visual Command Editor: Stop memorizing execution flags. Edit backend-specific commands through a structured user interface. Toggle booleans, set limits on numerical inputs, and define default parameter values for the llama.cpp server.
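One way to model such an editor is a small schema that describes each flag as either a boolean toggle or a bounded number with a default, plus a function that turns the current form values into arguments. The sketch below is an assumption about how this could work, not Hexllama's actual schema; `ParamSpec`, `ParamValues`, `toArgs`, and `exampleSpecs` are hypothetical names, and the limits shown for `--ctx-size` are arbitrary.

```typescript
// Hypothetical schema for editor fields; the real Hexllama schema is not shown in this README.
type ParamSpec =
  | { kind: "boolean"; flag: string; defaultValue: boolean }
  | { kind: "number"; flag: string; defaultValue: number; min: number; max: number };

type ParamValues = { [flag: string]: boolean | number | undefined };

/** Convert form values into command-line arguments, applying defaults and numeric limits. */
function toArgs(specs: ParamSpec[], values: ParamValues): string[] {
  const args: string[] = [];
  for (const spec of specs) {
    const raw = values[spec.flag];
    if (spec.kind === "boolean") {
      // A boolean flag is emitted only when enabled; it takes no value.
      const enabled = typeof raw === "boolean" ? raw : spec.defaultValue;
      if (enabled) args.push(spec.flag);
    } else {
      // Fall back to the default for missing or non-finite input, then clamp to [min, max].
      const value = typeof raw === "number" && isFinite(raw) ? raw : spec.defaultValue;
      const clamped = Math.min(spec.max, Math.max(spec.min, value));
      args.push(spec.flag, String(clamped));
    }
  }
  return args;
}

// Example schema using two llama.cpp server flags; the bounds are illustrative only.
const exampleSpecs: ParamSpec[] = [
  { kind: "boolean", flag: "--mlock", defaultValue: false },
  { kind: "number", flag: "--ctx-size", defaultValue: 4096, min: 512, max: 32768 },
];
```

With this schema, `toArgs(exampleSpecs, { "--mlock": true, "--ctx-size": 999999 })` clamps the context size to the declared maximum, while `toArgs(exampleSpecs, {})` falls back to the defaults and omits the disabled toggle.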

Installation

Download the Release

The fastest way to get started is to use the pre-compiled installer.

1. Navigate to the Releases page.
2. Download the installer for your operating system.
3. Run the installer and launch Hexllama.

Run Locally

If you want to build from source or modify the project, you can run the development environment locally.

Prerequisites:

Node.js 18 or higher
npm

# Clone the repository
git clone https://github.com/andersondanieln/hexllama.git

# Enter the project directory
cd hexllama

# Install dependencies
npm install

# Start the development server
npm run dev

To compile the application into an executable for your current OS:

npm run build

Roadmap

Phase 1: Core Foundation (Completed)

Integrated Model Hub: Hugging Face search & download direct from the app.
Smart Download Manager: Pause/resume/cancel, auto-template generation based on hardware & quant level.
Template-Based Execution: Run multiple models on different ports, reusable configuration templates.
Version and Backend Management: Download and switch between different versions of llama.cpp binaries directly.
Visual Command Editor: Graphical UI for configuring server parameters instead of terminal flags.

Phase 2: Enhanced Inference & Native UI (Short to Mid-Term)

Built-in Chat Interface: Native chat client to interact with models directly within Hexllama without launching external browser tabs.
MTP (Multi-Token Prediction) Support: Enable faster generation speeds using speculative decoding / MTP.
TurboQuant Support: Support optimized quantizations and execution configurations.
Multi-Language Support: Complete internationalization (i18n) to support Portuguese, English, Spanish, etc.

Phase 3: Multi-Backend & Advanced Engines (Long-Term)

Alternative Backend Integration: Expand support beyond llama.cpp to include:
- MLX: Native backend for Apple Silicon optimized performance.
- vLLM / ExLlamaV2: Support for high-throughput and GPU-optimized engines.

Acknowledgements

This project exists because of the incredible foundational work of Georgi Gerganov and the ggml-org community. Please consider supporting the development of llama.cpp.

Privacy and Terms

Hexllama is provided as is, without warranty of any kind. The developers assume no liability for damages or issues arising from the use of this software.

This application is strictly local. It does not collect, store, or transmit any telemetry or personal data. Note that downloading models relies on third-party services like Hugging Face, and executing backends relies on the downloaded binaries, both of which are subject to their own respective privacy policies.

