awesome-local-ai

Curated list of AI tools that run 100% on your machine — no cloud, no telemetry, no "local-ish" setups that secretly phone home.

Maintained by Brethof AI. Companion to awesome-llms-txt and awesome-private-ai.

Why this list exists

The phrase "local-first" has been stretched to mean almost anything in 2026. Lots of tools advertise on-device AI but then call out to remote APIs for "complex" prompts, sync your data to a "private" cloud bucket, or ship telemetry that re-introduces every privacy problem the local mode was meant to solve.

This list applies a strict rule: the tool must perform inference on hardware you own and not transmit prompts, embeddings, or audio off that hardware during normal use. Optional cloud features (like model download or update checks) are fine; mandatory ones disqualify.

Where a tool is partially-local (e.g. ships a fully-local mode but defaults to cloud), we say so explicitly. Every link is checked — a CI-style sweep cuts entries whose URL 404s.

Inclusion rules

To be listed:

Inference happens on the user's CPU, GPU, NPU, or local accelerator.
No mandatory account creation just to use the offline mode.
Source code or signed binaries available — verifiable provenance.
Maintained: a release, commit, or PR merge in the last 6 months.
Real artefact (not a "coming soon" landing page).

Legend

🐧 Linux · 🪟 Windows · 🍎 macOS · 📱 iOS / Android · 🌐 Web (in-browser)
🔓 open source · 🔒 closed source
🆓 free for personal · 💰 paid · 🆓💰 free + paid tiers
🐍 Python · 🦀 Rust · 🐹 Go · ⚙️ C/C++ · 🟦 TypeScript / JS

Inference Runtimes (13)
Desktop Chat Apps (4)
Voice — Speech-to-Text (7)
Voice — Text-to-Speech (6)
Image Generation (7)
Video Generation (2)
Code Assistants (5)
Local Agents (3)
Vector Databases (6)
Embeddings (3)
Training & Fine-tuning (5)
Local Search & RAG (5)
Operating Systems Tuned for AI (5)
Hardware-Specific Runtimes (4)

Inference Runtimes

Engines that load LLMs, vision models, and other neural networks for inference on your hardware.

ExLlamaV2 — 🐧 🪟 🔓 🆓 🐍
Fast inference for quantised LLMs on consumer NVIDIA GPUs. EXL2 format outperforms GGUF on the same hardware in many benchmarks.
GPT4All — 🐧 🪟 🍎 🔓 🆓 ⚙️
Privacy-first desktop chat with curated quantised models. Strong CPU performance.
Jan — 🐧 🪟 🍎 🔓 🆓 🟦
Open-source ChatGPT alternative. Bundles llama.cpp + a clean UI.
KoboldCpp — 🐧 🪟 🍎 🔓 🆓 ⚙️
Single-binary llama.cpp wrapper with KoboldAI UI for chat, story-writing, RP.
llama.cpp — 🐧 🪟 🍎 🔓 ⚙️
Reference C++ implementation for running LLaMA-family and other transformer models with GGUF quantization. Powers most of the others in this section.
LM Studio — 🐧 🪟 🍎 🔒 🆓 ⚙️
Polished desktop app for discovering, downloading, and running local LLMs. OpenAI-compatible server mode. Free for personal + commercial.
LocalAI — 🐧 🪟 🍎 🔓 🆓 🐹
Self-hosted, OpenAI-compatible inference server. Text, image, audio, embeddings — all on your machine.
Mistral.rs — 🐧 🪟 🍎 🔓 🆓 🦀
Rust LLM inference platform with quantization, vision, MoE, and speculative decoding.
MLC LLM — 🐧 🪟 🍎 📱 🌐 🔓 🆓 🐍
Compile-once, deploy-anywhere LLM runtime. Targets WebGPU, Vulkan, CUDA, Metal, iOS, and Android from a single source.
Ollama — 🐧 🪟 🍎 🔓 🆓 🐹
Single-binary server with a built-in model library. Pull, run, and swap models with one command.
SGLang — 🐧 🔓 🆓 🐍
Fast LLM and VLM serving runtime with RadixAttention cache and structured-output support.
Text Generation WebUI — 🐧 🪟 🍎 🔓 🆓 🐍
Gradio-based web UI for local LLMs. Supports GGUF, GPTQ, AWQ, EXL2.
vLLM — 🐧 🔓 🆓 🐍
High-throughput inference engine with PagedAttention. Designed for serving, not desktop chat — pair with Open WebUI or LiteLLM.

Desktop Chat Apps

GUI applications wrapping a local runtime in a chat interface.

Anything LLM — 🐧 🪟 🍎 🔓 🆓 🟦
Workspace-style chat with built-in RAG. Works fully offline with a local LLM provider.
Faraday — 🐧 🪟 🍎 🔒 🆓 ⚙️
Local-only character / role-play chat. Bundles inference, no API key needed.
Msty — 🐧 🪟 🍎 🔒 🆓 🟦
Fast desktop chat with branching conversations and parallel-model comparison. Free tier covers personal local use.
Open WebUI — 🐧 🪟 🍎 🌐 🔓 🆓 🐍
Self-hosted "ChatGPT clone" of the open-source world. Pair with Ollama or any OpenAI-compatible local server.

Voice — Speech-to-Text

Brethof Voice Pro — 🐧 🪟 🔒 🆓 💰 ⚙️
Desktop dictation app built on Qwen3-ASR + GGUF + llama.cpp. 36 languages, hotkey-anywhere transcription, file/microphone/system-audio input, LoRA personal voice training. 100% offline, no account required to transcribe. Disclosure: maintained by us.
faster-whisper — 🐧 🪟 🍎 🔓 🆓 🐍
CTranslate2-based reimplementation. ~4× faster than reference Whisper at the same accuracy.
OpenAI Whisper — 🐧 🪟 🍎 🔓 🆓 🐍
Reference Python implementation. Accurate but slower than the C++ ports; useful when you need the exact research behaviour.
RealtimeSTT — 🐧 🪟 🍎 🔓 🆓 🐍
Low-latency streaming wrapper around faster-whisper for live dictation pipelines.
Vosk — 🐧 🪟 🍎 📱 🔓 🆓 🐍
Lightweight offline speech recognizer with 20+ language models. Real-time on CPU.
Whisper.cpp — 🐧 🪟 🍎 📱 🔓 🆓 ⚙️
C++ port of OpenAI Whisper with GGUF quantization. Runs on CPU, Metal, CUDA, Vulkan.
WhisperX — 🐧 🪟 🍎 🔓 🆓 🐍
faster-whisper plus forced alignment, voice-activity detection, and speaker diarization.

Voice — Text-to-Speech

Bark — 🐧 🪟 🍎 🔓 🆓 🐍
Multilingual generative audio. Speech, sound effects, and music cues from text prompts.
Coqui TTS — 🐧 🪟 🍎 🔓 🆓 🐍
Comprehensive TTS toolkit. Multiple architectures (Tacotron, VITS, XTTS) and voice cloning.
Kokoro — 🐧 🪟 🍎 🔓 🆓 🐍
Tiny ~80M-param TTS model, surprisingly natural for the size. Suitable for low-end hardware.
Mimic 3 — 🐧 🪟 🍎 📱 🔓 🆓 🐍
Mycroft's neural TTS engine. Lightweight, multilingual.
Piper — 🐧 🪟 🍎 📱 🔓 🆓 ⚙️
Fast neural TTS. ONNX runtime, dozens of voices and languages. Designed for Raspberry Pi-class hardware.
StyleTTS 2 — 🐧 🪟 🍎 🔓 🆓 🐍
High-fidelity expressive TTS with style transfer. Strong reference voice cloning.

Image Generation

AUTOMATIC1111 / Stable Diffusion WebUI — 🐧 🪟 🍎 🔓 🆓 🐍
The original ergonomic SD UI. Heavy plugin ecosystem.
ComfyUI — 🐧 🪟 🍎 🔓 🆓 🐍
Node-graph workflow editor for diffusion models. Powers most modern local image and video pipelines.
Fooocus — 🐧 🪟 🍎 🔓 🆓 🐍
Image generator with sane defaults — minimal knobs for great results. Built on top of Stable Diffusion.
Forge — 🐧 🪟 🍎 🔓 🆓 🐍
Performance-tuned A1111 fork by lllyasviel. Lower VRAM, faster on modern GPUs.
InvokeAI — 🐧 🪟 🍎 🔓 🔒 🆓 💰 🐍
Pro-grade SD UI with strong canvas / inpainting tools. Enterprise tier; free local install remains open source.
SD.Next — 🐧 🪟 🍎 🔓 🆓 🐍
All-in-one fork of A1111 with broader backend support (Diffusers, ONNX, ROCm).
SwarmUI — 🐧 🪟 🍎 🔓 🆓 🟦
Modular UI built on top of ComfyUI. User-friendly mode out of the box, full node-graph available when you need it.

Video Generation

ComfyUI + LTX Video — 🐧 🪟 🍎 🔓 🆓 🐍
ComfyUI nodes drive Lightricks LTX video models for text-to-video and image-to-video generation. The chunked-loop pattern (released in our comfyui-workflows) produces longer outputs than vanilla LTX allows.
Wan2GP — 🐧 🪟 🍎 🔓 🆓 🐍
Stripped-down Wan2.2 video pipeline for low-VRAM consumer GPUs.

Code Assistants

Aider — 🐧 🪟 🍎 🔓 🆓 🐍
Terminal pair-programming. Bring-your-own-LLM via LiteLLM — run with Ollama or any OpenAI-compatible local endpoint.
Continue — 🐧 🪟 🍎 🔓 🆓 🟦
IDE assistant with first-class local-LLM support. Defaults can be set to Ollama / LM Studio. VS Code + JetBrains.
Llama.vim — 🐧 🪟 🍎 🔓 🆓 ⚙️
Vim plugin that streams llama.cpp completions inline. No cloud.
Tabby — 🐧 🪟 🍎 🔓 🆓 🦀
Self-hosted GitHub Copilot alternative. Local model serving with IDE plugins.
twinny — 🐧 🪟 🍎 🔓 🆓 🟦
Free local AI extension for VS Code. Chat + autocomplete via Ollama.

Local Agents

Aider in /architect mode — 🐧 🪟 🍎 🔓 🆓 🐍
Aider's planning mode separates "decide" and "edit" steps; works well with strong local reasoning models.
Continue Agent mode — 🐧 🪟 🍎 🔓 🆓 🟦
Agentic editing flow inside Continue. Pair with a local model for fully-offline coding agents.
Open Interpreter — 🐧 🪟 🍎 🔓 🆓 🐍
Code-execution agent that runs Python/shell on your machine. Local-LLM friendly.

Vector Databases

Chroma — 🐧 🪟 🍎 🔓 🆓 🐍
Embedding database designed for local-first usage. SQLite-style single-file or client/server.
Faiss — 🐧 🪟 🍎 🔓 🆓 ⚙️
Library for similarity search. The retrieval engine inside many of the others.
LanceDB — 🐧 🪟 🍎 🔓 🆓 🦀
Embedded, columnar vector DB. Single-file, no server.
Marqo — 🐧 🪟 🍎 🔓 🔒 🆓 💰 🐍
End-to-end vector search; OSS core, paid hosted version.
Qdrant — 🐧 🪟 🍎 🔓 🆓 🦀
High-performance vector DB. Self-host the open-source binary.
Weaviate — 🐧 🪟 🍎 🔓 🆓 🐹
Hybrid (vector + keyword) DB. Self-host the OSS distribution; cloud is optional.

Embeddings

BGE — 🐧 🪟 🍎 🔓 🆓 🐍
BAAI's BGE family. Strong English + multilingual variants. Run via llama.cpp, sentence-transformers, or fastembed.
fastembed — 🐧 🪟 🍎 🔓 🆓 🐍
Lightweight CPU-friendly embedding library by Qdrant.
Sentence Transformers — 🐧 🪟 🍎 🔓 🆓 🐍
Reference Python library for sentence + paragraph embeddings.

Training & Fine-tuning

Axolotl — 🐧 🔓 🆓 🐍
Config-driven fine-tuning framework. LoRA, QLoRA, full fine-tunes.
diffusion-pipe — 🐧 🔓 🆓 🐍
Pipeline-parallel trainer for diffusion models. Multi-GPU LoRA on large image / video models.
MLX — 🍎 🔓 🆓 🐍
Apple's native ML framework for Apple Silicon. Train and infer on M-series Macs without CUDA workarounds.
Ostris ai-toolkit — 🐧 🪟 🔓 🆓 🐍
LoRA training UI for Flux, SD3, SDXL, LTX. Works on consumer hardware.
Unsloth — 🐧 🪟 🍎 🔓 🆓 🐍
Fine-tune LLMs 2× faster with 70% less VRAM than reference HuggingFace pipelines.

Local Search & RAG

Anything LLM — 🐧 🪟 🍎 🔓 🆓 🟦
Self-hosted workspace tool with integrated RAG. Listed twice intentionally — strong both as a chat app and a RAG layer.
LlamaIndex — 🐧 🪟 🍎 🔓 🆓 🐍
Toolkit for building RAG pipelines. Works fully offline with local models + vector DBs.
Perplexica — 🐧 🪟 🍎 🔓 🆓 🟦
Open-source AI search powered by SearXNG + your local LLM.
PrivateGPT — 🐧 🪟 🍎 🔓 🆓 🐍
Ingest documents locally and query them with an offline LLM.
SearXNG — 🐧 🪟 🍎 🔓 🆓 🐍
Self-hosted meta-search engine. Pair with a local LLM for an offline Perplexity-style assistant.

Operating Systems Tuned for AI

Bazzite — 🐧 🔓 🆓
Container-native gaming and AI distro. Steam Deck-friendly, latest drivers, easy CUDA.
Bluefin — 🐧 🔓 🆓
Fedora-based, atomic, container-first. Good "drop you in a known state" workstation for AI work.
CachyOS — 🐧 🔓 🆓
Arch-based desktop distro with a tuned kernel and recent NVIDIA / AMD drivers. Sane out-of-the-box for new GPUs (Blackwell, RDNA 4).
NixOS — 🐧 🔓 🆓
Reproducible system config. Best when you need identical CUDA + ML toolchain across machines.
Pop!_OS — 🐧 🔓 🆓
System76's NVIDIA-friendly desktop distro. ISO ships with proprietary drivers for plug-and-play GPU work.

Hardware-Specific Runtimes

MLX — 🍎 🔓 🆓 🐍
Apple Silicon-native ML library. Already listed under training; it also ships an inference runtime competitive with llama.cpp on M-series.
NVIDIA TensorRT-LLM — 🐧 🪟 🔒 🆓 🐍
NVIDIA's optimised LLM runtime for their data-center and consumer GPUs. Closed-weights binary; fastest CUDA path for many models.
OpenVINO — 🐧 🪟 🍎 🔓 🆓 ⚙️
Intel's inference toolkit. CPU, iGPU, dGPU (Arc), and NPU support for Intel laptops.
ROCm + llama.cpp HIP — 🐧 🪟 🍎 🔓 ⚙️
AMD GPU inference path. Llama.cpp's HIP backend now reaches CUDA parity on RDNA 3/4 in many workloads.

Related work

awesome-llms-txt — Tools that publish llms.txt for agent discovery.
awesome-private-ai — Privacy-first AI more broadly (some on this list, plus privacy-respecting cloud).
awesome-mcp-servers — MCP servers, many of which sit happily next to a local LLM.
awesome-ai-minefield — License + ToS analysis for the models you'll run locally.
awesome-linux-for-ai — Linux distros tuned for the AI workstations these tools live on.
comfyui-workflows — Curated, working ComfyUI workflows for local image / video generation.

Contributing

Open an issue with the tool name, repo or homepage URL, the category it should land in, and one paragraph on why it's worth listing. Entries live as one YAML file each under entries/; this README is generated from them, so edit the YAML, not the list above. We do not list tools whose offline mode is gated behind a paid plan.

License

MIT.

Maintained by Brethof AI — AI tools built for people who take their data seriously.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
entries		entries
AUTHORS.md		AUTHORS.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.categories.yaml		README.categories.yaml
README.foot.md		README.foot.md
README.head.md		README.head.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

awesome-local-ai

Why this list exists

Inclusion rules

Legend

Contents

Inference Runtimes

Desktop Chat Apps

Voice — Speech-to-Text

Voice — Text-to-Speech

Image Generation

Video Generation

Code Assistants

Local Agents

Vector Databases

Embeddings

Training & Fine-tuning

Local Search & RAG

Operating Systems Tuned for AI

Hardware-Specific Runtimes

Related work

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

awesome-local-ai

Why this list exists

Inclusion rules

Legend

Contents

Inference Runtimes

Desktop Chat Apps

Voice — Speech-to-Text

Voice — Text-to-Speech

Image Generation

Video Generation

Code Assistants

Local Agents

Vector Databases

Embeddings

Training & Fine-tuning

Local Search & RAG

Operating Systems Tuned for AI

Hardware-Specific Runtimes

Related work

Contributing

License

About

Topics

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages