Buckeyes1995/hexllama

 
 

Built with Electron, React, TypeScript, and Vite.


Hexllama is a fast, native desktop interface for managing and running local large language models (LLMs) with llama.cpp. It removes the friction of command-line execution and manual file management, and provides a single workspace to discover, download, configure, and serve models.

Built by and for local AI enthusiasts, Hexllama ensures you spend less time wrestling with terminal arguments and more time interacting with models.

Features

Integrated Model Hub

Search Hugging Face directly within the application. Browse repositories, view file details, and download GGUF models with a single click, without opening a browser.
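As a rough illustration of what a Hub search involves, the helpers below build a model-search URL and a direct file URL against the public Hugging Face endpoints. The function names and the choice of the `gguf` filter are assumptions made for this sketch; Hexllama's own implementation may look different.

```typescript
// Hypothetical helpers; the names are illustrative, not Hexllama's API.
const HF_BASE = "https://huggingface.co";

// Build a URL for the Hub's model listing endpoint, limited to GGUF-tagged repos.
function buildSearchUrl(query: string, limit: number = 20): string {
  const params = [
    "search=" + encodeURIComponent(query),
    "filter=gguf",
    "limit=" + String(limit),
  ];
  return HF_BASE + "/api/models?" + params.join("&");
}

// Encode each path segment but keep the slashes between them.
function encodePath(path: string): string {
  return path.split("/").map(encodeURIComponent).join("/");
}

// Build a direct download URL for one file in a repository ("owner/name").
function buildFileUrl(repoId: string, fileName: string, revision: string = "main"): string {
  return (
    HF_BASE + "/" + encodePath(repoId) +
    "/resolve/" + encodeURIComponent(revision) +
    "/" + encodePath(fileName)
  );
}
```

The search URL can be passed to `fetch`, and the file URL is the kind of direct GGUF link that a download manager can consume.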

Smart Download Manager

Pause, resume, or cancel large model downloads reliably. You can also paste direct GGUF links. When a download completes, Hexllama automatically generates an execution template with recommended threads, batch sizes, and context windows tailored to the model's parameters and quantization level.
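One common way to implement pause and resume is with HTTP range requests: the client asks only for the bytes it does not yet have and appends them to the partial file. The sketch below shows that idea. It is an assumption about how such a feature could work, not a description of Hexllama's internals, and the `PartialDownload` type and function names are hypothetical.

```typescript
// Hypothetical record of a partially completed download.
interface PartialDownload {
  url: string;
  bytesOnDisk: number; // size of the partial file already written
}

// Request headers asking the server to send only the missing bytes.
function resumeHeaders(d: PartialDownload): Record<string, string> {
  const headers: Record<string, string> = {};
  if (d.bytesOnDisk > 0) {
    headers["Range"] = "bytes=" + String(d.bytesOnDisk) + "-";
  }
  return headers;
}

interface ContentRange {
  start: number;
  end: number;
  total: number | null; // null when the server reports "*"
}

// Parse a header value such as "bytes 1024-4095/4096"; null if missing or malformed.
function parseContentRange(value: string | null): ContentRange | null {
  if (!value) return null;
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!m) return null;
  return {
    start: Number(m[1]),
    end: Number(m[2]),
    total: m[3] === "*" ? null : Number(m[3]),
  };
}

// Appending is only safe if the server answered 206 Partial Content
// and the returned range starts exactly where the partial file ends.
function canAppend(status: number, contentRange: string | null, d: PartialDownload): boolean {
  if (status !== 206) return false;
  const range = parseContentRange(contentRange);
  return range !== null && range.start === d.bytesOnDisk;
}
```

If `canAppend` returns false (for example, the server ignored the range and replied with 200), the safe fallback is to discard the partial file and restart the download.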

Template-Based Execution

Save your configurations as reusable templates. Run multiple models simultaneously on different ports without conflict. Launch them in "Chat UI" mode to automatically open the built-in llama.cpp web interface, or "API Only" mode to serve them silently in the background.
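To make the idea concrete, here is a minimal sketch of how templates could map to non-conflicting ports and to arguments for llama.cpp's `llama-server`. The `Template` shape and function names are hypothetical, and the flags shown (`-m`, `--port`, `-c`, `-t`) are a small subset of what the server accepts; Hexllama's real templates may carry more fields.

```typescript
// Hypothetical template shape; field names are illustrative.
interface Template {
  name: string;
  modelPath: string;
  port: number; // preferred port
  ctxSize: number;
  threads: number;
  mode: "chat-ui" | "api-only";
}

// Give each template a distinct port, bumping past ports that are already taken.
function assignPorts(templates: Template[], inUse: number[] = []): Record<string, number> {
  const taken = inUse.slice();
  const result: Record<string, number> = {};
  for (const t of templates) {
    let port = t.port;
    while (taken.indexOf(port) !== -1) port += 1;
    taken.push(port);
    result[t.name] = port;
  }
  return result;
}

// Command-line arguments for llama-server built from a template.
function serverArgs(t: Template, port: number): string[] {
  return ["-m", t.modelPath, "--port", String(port), "-c", String(t.ctxSize), "-t", String(t.threads)];
}

// In "chat-ui" mode the app would open this URL; in "api-only" mode nothing is opened.
// Assumes the server listens on the loopback address.
function urlToOpen(t: Template, port: number): string | null {
  return t.mode === "chat-ui" ? "http://127.0.0.1:" + String(port) : null;
}
```

Two templates that both prefer port 8080 would end up on 8080 and 8081, so they can run side by side.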

Version and Backend Management

Running cutting-edge models sometimes requires different builds of llama.cpp. Hexllama lets you maintain and switch between multiple backend binaries. It automatically checks the ggml-org repository for new releases and lets you download and extract them straight from the settings panel.
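A release check of this kind can be sketched against the GitHub REST API, whose "latest release" endpoint returns a `tag_name` and a list of `assets`. The helper names and the keyword-matching strategy below are assumptions for illustration; how Hexllama actually selects a build is not documented here.

```typescript
// Subset of the GitHub REST API release object used in this sketch.
interface ReleaseAsset {
  name: string;
  browser_download_url: string;
}
interface Release {
  tag_name: string;
  assets: ReleaseAsset[];
}

// Endpoint for the newest llama.cpp release.
const LATEST_RELEASE_URL = "https://api.github.com/repos/ggml-org/llama.cpp/releases/latest";

// True when the latest tag is not among the locally installed tags.
function isNewRelease(latest: Release, installedTags: string[]): boolean {
  return installedTags.indexOf(latest.tag_name) === -1;
}

// Pick the first asset whose file name contains every keyword (case-insensitive).
// The keywords are whatever the caller uses to describe the platform, e.g. OS and CPU.
function pickAsset(release: Release, keywords: string[]): ReleaseAsset | null {
  for (const asset of release.assets) {
    const name = asset.name.toLowerCase();
    const matches = keywords.every((k) => name.indexOf(k.toLowerCase()) !== -1);
    if (matches) return asset;
  }
  return null;
}
```

A caller would fetch `LATEST_RELEASE_URL`, parse the JSON as a `Release`, and offer a download only when `isNewRelease` is true and `pickAsset` finds a matching archive.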

Visual Command Editor

Stop memorizing execution flags. Edit backend-specific commands through a structured user interface. Toggle booleans, set limits on numerical inputs, and define default parameter values for the llama.cpp server.
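One way to back such an editor is a small schema that describes each flag, from which the UI renders a toggle or a bounded number input and the launcher derives command-line arguments. The `FlagSpec` type and `toArgs` function below are hypothetical and only sketch the idea.

```typescript
// Hypothetical flag descriptors that a form UI could render.
type FlagSpec =
  | { kind: "boolean"; flag: string; default: boolean }
  | { kind: "number"; flag: string; default: number; min: number; max: number };

// User-entered values keyed by flag name; missing entries fall back to defaults.
type FlagValues = Record<string, boolean | number | undefined>;

// Keep a number inside the [min, max] range declared by its spec.
function clamp(n: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, n));
}

// Turn specs plus user values into command-line arguments.
function toArgs(specs: FlagSpec[], values: FlagValues): string[] {
  const args: string[] = [];
  for (const spec of specs) {
    const raw = values[spec.flag];
    if (spec.kind === "boolean") {
      // Boolean flags are emitted only when switched on.
      const on = typeof raw === "boolean" ? raw : spec.default;
      if (on) args.push(spec.flag);
    } else {
      // Numeric flags always carry a value, clamped to the declared limits.
      const n = typeof raw === "number" && !isNaN(raw) ? raw : spec.default;
      args.push(spec.flag, String(clamp(n, spec.min, spec.max)));
    }
  }
  return args;
}
```

With a `--ctx-size` spec limited to 32768, a user value of 999999 is clamped to 32768, and an unset numeric field falls back to its declared default.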

Installation

Download the Release

The fastest way to get started is to use the pre-compiled installer.

  1. Navigate to the Releases page.
  2. Download the installer for your operating system.
  3. Run the installer and launch Hexllama.

Run Locally

If you want to build from source or modify the project, run the development environment locally.

Prerequisites:

  • Node.js 18 or higher
  • npm

# Clone the repository
git clone https://github.com/andersondanieln/hexllama.git

# Enter the project directory
cd hexllama

# Install dependencies
npm install

# Start the development server
npm run dev

To compile the application into an executable for your current OS:

npm run build

Roadmap

Phase 1: Core Foundation (Completed)

  • Integrated Model Hub: Hugging Face search and download directly from the app.
  • Smart Download Manager: Pause/resume/cancel, auto-template generation based on hardware & quant level.
  • Template-Based Execution: Run multiple models on different ports, reusable configuration templates.
  • Version and Backend Management: Download and switch between different versions of llama.cpp binaries directly.
  • Visual Command Editor: Graphical UI for configuring server parameters instead of terminal flags.

Phase 2: Enhanced Inference & Native UI (Short to Mid-Term)

  • Built-in Chat Interface: Native chat client to interact with models directly within Hexllama without launching external browser tabs.
  • MTP (Multi-Token Prediction) Support: Enable faster generation speeds using speculative decoding / MTP.
  • TurboQuant Support: Support optimized quantizations and execution configurations.
  • Multi-Language Support: Complete internationalization (i18n) to support Portuguese, English, Spanish, etc.

Phase 3: Multi-Backend & Advanced Engines (Long-Term)

  • Alternative Backend Integration: Expand support beyond llama.cpp to include:
    • MLX: Native backend optimized for Apple Silicon.
    • vLLM / ExLlamaV2: Support for high-throughput and GPU-optimized engines.

Acknowledgements

This project exists because of the incredible foundational work of Georgi Gerganov and the ggml-org community. Please consider supporting the development of llama.cpp.

Privacy and Terms

Hexllama is provided as is, without warranty of any kind. The developers assume no liability for damages or issues arising from the use of this software.

This application is strictly local. It does not collect, store, or transmit any telemetry or personal data. Note that downloading models relies on third-party services like Hugging Face, and executing backends relies on the downloaded binaries, both of which are subject to their own respective privacy policies.

About

A beautifully crafted desktop client for running and managing local LLMs via llama.cpp.


Languages

  • TypeScript 85.2%
  • CSS 14.4%
  • HTML 0.4%