Esh is a local-first LLM tool for Apple Silicon.
It gives you:
- local model install and management
- passive engine detection for MLX and llama.cpp
- interactive terminal chat
- stable JSON commands for external callers
- saved sessions
- backend-native execution cache export/import
- TurboQuant cache compression for MLX
- self-contained release packaging
Today, Esh is a macOS-focused local model orchestrator with MLX and GGUF/llama.cpp backends. It manages, validates, selects, and routes existing runtimes rather than implementing model kernels itself.
Durable engineering notes live in:
Esh is designed for people who want a local chat tool that is:
- fast to run from terminal
- practical for repeated conversations
- honest about model and cache compatibility
- ready to grow into more backends later
This is a text-chat tool in v1.
It does not yet do:
- document ingestion
- codebase indexing
- embeddings or RAG
- multimodal chat
- in-tool model install from arbitrary search results; installs are limited to local models and Hugging Face repos
For end users on macOS, the one-line install and run command is:

```bash
brew tap fil-technology/tap && brew install --cask esh && esh
```

If brew is not installed yet, install Homebrew first:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Then install and run Esh:

```bash
brew tap fil-technology/tap && brew install --cask esh && esh
```

If you prefer the steps split out:

```bash
brew tap fil-technology/tap
brew install --cask esh
esh
```

Upgrade later with:

```bash
brew upgrade --cask esh
```

If you previously tried the older formula-based install, remove it first:

```bash
brew uninstall esh
brew install --cask esh
```

From a source checkout, bootstrap once:

```bash
./scripts/bootstrap.sh
```

Then use the stable launcher:

```bash
./esh
./esh doctor
./esh engines list
./esh model list
./esh chat
```

Running `./esh` with no command opens a default interactive launcher menu with the most common actions.
Esh reads `~/.esh/config.toml` when present. Create or inspect it with:

```bash
./esh config init
./esh config show
./esh config path
```

Inspect required and optional engines:

```bash
./esh doctor
./esh engines list
./esh engines doctor llama.cpp
./esh engines doctor mlx
```

`llama-cli` is detected passively from `ESH_LLAMA_CPP_CLI`, `LLAMA_CPP_CLI`, Homebrew paths, or `PATH`; Esh does not install llama.cpp automatically.
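Since `ESH_LLAMA_CPP_CLI` is listed first in that detection order, it can be used to pin a specific binary when autodetection picks the wrong one. A minimal sketch; the path below is illustrative, use your actual build location:

```bash
# Point Esh at a specific llama-cli binary instead of PATH/Homebrew discovery.
ESH_LLAMA_CPP_CLI=/opt/llama.cpp/bin/llama-cli ./esh engines doctor llama.cpp
```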
Validate local model files before routing them:

```bash
./esh validate /path/to/model.gguf --engine llama.cpp
./esh validate /path/to/mlx-model --engine mlx --json
```

Use `esh capabilities` to get a JSON map of supported backends, installed models, and whether each path supports direct inference, cache build, and cache load. Internally, backends also expose capability reports for runtime readiness and feature support: MLX currently reports direct inference, token streaming, and prompt cache build/load, while llama.cpp reports direct inference and token streaming with GGUF cache features marked unavailable.
Use `esh infer` for machine-friendly inference. It returns JSON for both MLX and GGUF models, and MLX cache load stays optional rather than being the only supported integration path.

```bash
./esh capabilities
cat <<'JSON' | ./esh infer --input -
{
  "schemaVersion": "esh.infer.request.v1",
  "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
  "messages": [
    { "role": "user", "text": "Say hello in one sentence." }
  ],
  "generation": {
    "maxTokens": 64,
    "temperature": 0.2
  }
}
JSON
```
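Because both commands emit JSON, they compose with ordinary shell tooling such as `jq`. A minimal sketch; the `.text` field path is an assumption about the response schema, so inspect the real output before scripting against it:

```bash
# Pretty-print the capabilities report.
./esh capabilities | jq '.'

# One-shot inference from a saved request file, then pull out the reply.
# The .text path is illustrative, not a documented response field.
cat request.json | ./esh infer --input - | jq -r '.text'
```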
Use `esh serve` to expose a local OpenAI-compatible HTTP surface for editors, scripts, and desktop apps.

```bash
./esh serve --host 127.0.0.1 --port 11435
curl http://127.0.0.1:11435/v1/models
curl http://127.0.0.1:11435/v1/audio/models
curl http://127.0.0.1:11435/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "pocket-tts",
    "input": "Hello from esh",
    "voice": "alba"
  }' \
  --output hello.wav
curl http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
    "messages": [
      { "role": "user", "content": "Say hello in one sentence." }
    ]
  }'
```

Supported routes in v1:
- `GET /health`
- `GET /v1/models`
- `GET /v1/tools`
- `GET /v1/audio/models`
- `GET /api/tags`
- `POST /v1/audio/speech`
- `POST /v1/chat/completions`
- `POST /v1/responses`
Notes:
- unsupported request fields are ignored when safe
- `stream: true` is supported for OpenAI-compatible chat/responses and Anthropic-compatible messages; backend token streaming remains runtime-dependent (see the streaming sketch after these notes)
- text inputs are supported for chat/responses in v1
- `/v1/models` includes installed text models only, for strict OpenAI-compatible clients such as Xcode
- `/v1/audio/models` returns the reusable MLX TTS model catalog with voices, languages, output formats, and capabilities so external agents can present and reuse voice choices
- `/v1/audio/speech` generates WAV audio and returns the bytes directly so terminal-driven agents can save or forward the file without shared filesystem access
- `/v1/tools` advertises request-side tool support, and `/api/tags` provides an Ollama-compatible model list for local-provider probes
- pass `--api-key <token>` to require `Authorization: Bearer <token>`
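A streamed chat completion only needs one extra request field. A sketch against the server started above; the server-sent-events framing is the standard OpenAI-compatible convention, and actual chunk contents depend on the backend:

```bash
# -N disables curl buffering so streamed chunks print as they arrive.
curl -N http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Count to five." }
    ]
  }'
```

If the server was started with `--api-key <token>`, add `-H 'Authorization: Bearer <token>'` to each request.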
In the interactive TUI (`./esh`), select *OpenAI server* to toggle the same local API while the TUI process stays open. In chat, use `/serve toggle`, `/serve start`, `/serve stop`, or `/serve status`; the header shows whether the local API is on.
Esh can now launch external coding CLIs against local models, similar to Ollama’s tool integrations.
Inspect what is available:

```bash
./esh integrations list
./esh integrations show codex
./esh integrations show claude
./esh integrations configure codex --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh integrations configure claude --model mlx-community--qwen2.5-0.5b-instruct-4bit
```

Launch Codex CLI against Esh’s OpenAI-compatible local server:

```bash
./esh serve --host 127.0.0.1 --port 11435
codex --profile esh-launch
./esh launch codex --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh launch codex --model mlx-community--qwen2.5-0.5b-instruct-4bit -- exec --ephemeral "Summarize this repository"
```

Launch Claude Code against Esh’s Anthropic-compatible local server:

```bash
./esh launch claude --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh launch claude --model mlx-community--qwen2.5-0.5b-instruct-4bit -- -p "Explain the cache pipeline" --output-format text
```

Notes:
- `codex` is wired through Esh’s local `Responses` API surface
- `claude` is wired through Esh’s local Anthropic `Messages` API surface
- Codex profiles omit `env_key` by default so `codex --profile esh-launch` works without an `OPENAI_API_KEY`; pass `--api-key <token>` only when you also run Codex with `OPENAI_API_KEY=<token>` (sketched below)
- `esh launch claude` starts a local Anthropic-compatible server and injects the matching Claude Code auth env automatically; persistent `configure` writes Codex/Claude settings for manual launches
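Putting those token pieces together, a manual Codex launch against a key-protected local server might look like this. A sketch only; the token value is illustrative:

```bash
# Start the local server with an API key, then run Codex with the matching token.
./esh serve --host 127.0.0.1 --port 11435 --api-key local-secret &
OPENAI_API_KEY=local-secret codex --profile esh-launch
```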
Build a self-contained release bundle:

```bash
./scripts/package-release.sh
./dist/esh-macos-<version>/share/esh/scripts/smoke-test-package.sh ./dist/esh-macos-<version>
```

Run the packaged tool:

```bash
./dist/esh-macos-<version>/esh doctor
./dist/esh-macos-<version>/esh chat
```

The package smoke test verifies the launcher, packaged runtime paths, recommended model catalog, and an empty install store. If the current macOS session cannot expose a Metal GPU to MLX, the smoke test reports that condition and continues as long as the rest of the package checks pass.
Esh includes GitHub Actions workflows for continuous integration and release packaging.
Both the CI workflow and the release workflow live under `.github/workflows/`.
What they do:
- CI runs on pushes to `main` and on pull requests
- release packaging runs for tags like `v0.1.0`
- release packaging can also be started manually from GitHub Actions
- macOS release builds upload the package as an artifact, publish both a notarized `.zip` and a `.tar.gz` plus SHA-256 checksums on the GitHub release (see the verification sketch below), and push the same bundle to GitHub Packages through GHCR
- tagged releases can also update the Homebrew tap cask automatically when `HOMEBREW_TAP_TOKEN` is configured in repo secrets
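Because checksums ship with each release, a downloaded bundle can be verified before use. A sketch; the asset name is illustrative, match it to the actual release assets:

```bash
# Compute the archive's SHA-256 and compare it with the value published
# on the GitHub release page.
shasum -a 256 esh-macos-0.1.14.zip
```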
Esh uses semantic versions stored in the repository's `VERSION` file.
Helpful commands:

```bash
./scripts/release-version.sh show
./scripts/release-version.sh tag
./scripts/release-version.sh verify-tag v0.1.0
```

Suggested release flow:

```bash
./scripts/release-version.sh show
git tag "$(./scripts/release-version.sh tag)"
git push origin "$(./scripts/release-version.sh tag)"
```

The GitHub release workflow verifies that the pushed tag matches `VERSION`.
GitHub surfaces:
- *Releases* shows downloadable end-user artifacts like `esh-macos-0.1.14.zip`
- *Packages* shows the same packaged bundle published to GHCR as `ghcr.io/fil-technology/esh/esh-macos:<version>`
Esh now has a built-in shortlist of recommended stable models for fast first-time setup.
Start there:

```bash
./esh model recommended
./esh model recommended --profile chat
./esh model install fast-chat
```

You can still install directly from a Hugging Face repo id too.
You can search first:

```bash
./esh model search qwen
./esh model search qwen --source local
./esh model search qwen --source hf --limit 5
```

Example:

```bash
./esh model install mlx-community/Qwen2.5-0.5B-Instruct-4bit
```

Check a model before downloading it:

```bash
./esh model check mlx-community/Qwen2.5-7B-Instruct-4bit --backend mlx
./esh model check bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF --backend gguf --context 8192
./esh model check bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF --backend gguf --variant Q4_K_M
./esh model check mlx-community/gemma-4-27b-it-4bit --json
```

Then inspect what is installed:

```bash
./esh model list
./esh model list --task audio
./esh model list --capability tts
./esh model inspect mlx-community--qwen2.5-0.5b-instruct-4bit
```

Notes:
- the install command accepts either a Hugging Face repo id or a built-in alias like `fast-chat`
- `model check` is heuristic: it estimates likely backend support and likely fit, not a guarantee
- `model check --backend auto` resolves the backend from repo metadata and filenames when it can
- `model check` and `model install` accept `--variant <name>` for GGUF quant variants and other explicit repo variants
- initial GGUF support is wired through llama.cpp and is currently text-only
- GGUF install/runtime support is intentionally narrow in this pass: it prefers a single clear GGUF candidate and reports ambiguity instead of guessing
- inspect/remove/chat/cache commands accept the installed model id and also the original repo id where practical
- installed ids are normalized like `mlx-community--qwen2.5-0.5b-instruct-4bit` (see the example after these notes)
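The normalization pattern, inferred from the examples in this guide, lowercases the repo id and replaces `/` with `--`; both forms work where the notes above say repo ids are accepted:

```bash
# Repo id and its normalized installed id (pattern inferred from examples):
#   mlx-community/Qwen2.5-0.5B-Instruct-4bit -> mlx-community--qwen2.5-0.5b-instruct-4bit
./esh model inspect mlx-community/Qwen2.5-0.5B-Instruct-4bit   # repo id, accepted where practical
./esh model inspect mlx-community--qwen2.5-0.5b-instruct-4bit  # normalized installed id
```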
List MLX text-to-speech models exposed through TTSMLX:

```bash
./esh audio models
```

The interactive launcher (`./esh`) also has an *Audio* entry for generating WAV files through the same MLX TTS path.
Generate a WAV file with an MLX TTS model:

```bash
./esh audio speak "Hello from esh" --model pocket-tts --voice alba --out hello.wav
./esh audio speak "Hello from esh" --model Marvis-AI/marvis-tts-250m-v0.2-MLX-8bit --play
```

The first run downloads the selected TTS model into `.esh/tts-models`. Speech-to-text is still a planned backend slice; `audio transcribe` currently reports that STT is not wired yet.
Launch chat:

```bash
./esh chat
```

Or just run:

```bash
./esh
```

and choose `1. Chat` from the default menu.
Launch or reopen a named session:

```bash
./esh chat default
./esh chat work
./esh chat experiments
./esh chat work --model mlx-community/Qwen2.5-0.5B-Instruct-4bit
```

Inside chat, you can type normal messages and slash commands.
Example:

```
> hello how are you, what can you do?
> /autosave on
> /sessions
> /new scratch
> /switch default
> /save
> /exit
```

The chat UI includes:
- transcript pane
- fixed input bar
- fixed footer stats
- command overlay
- saved session switching from inside chat
Useful slash commands:

```
/menu
/help
/save
/autosave on
/autosave off
/autosave toggle
/new
/new my-session
/switch my-session
/switch <uuid>
/models
/use-model <id-or-repo>
/model current
/sessions
/caches
/search <text>
/doctor
/model inspect <id>
/session show <uuid-or-name>
/cache inspect <uuid>
/close
/exit
```

List sessions from the CLI:

```bash
./esh session list
```

Show a specific saved session:

```bash
./esh session show <session-uuid>
./esh session show default
./esh session grep hello
```

The chat UI shows sessions in a more human-friendly way:
- session name
- short id
- message count
Example:

```
default [8C56AF77] | 2 messages
lifecycle [D59E570E] | 2 messages
demo-session [2AB2CAF3] | 2 messages
```

Esh supports:
- raw cache artifacts
- TurboQuant-compressed cache artifacts
- cache inspect
- cache load and resume
List saved cache artifacts:

```bash
./esh cache inspect
```

Inspect one artifact:

```bash
./esh cache inspect C46B9A7C-0636-4111-B300-C5A9AE1341C1
```

Build a cache from a saved session:

```bash
./esh session list
./esh cache build --session <session-uuid> --mode raw
./esh cache build --session <session-uuid> --mode turbo
```

Resume from a saved cache:

```bash
./esh cache load --artifact <artifact-uuid> --message "Continue this chat"
```

Important:
- cache artifacts are backend-specific
- cache artifacts are model-specific
- new cache artifacts include a normalized prompt cache key that is backend-, model-, tokenizer-, runtime-, and tool-signature-aware (see the sketch after this list)
- Esh reuses one cache pipeline, but artifacts are not portable across runtimes/models
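Conceptually, such a key behaves like a hash over every input that could invalidate a cache. An illustrative sketch of the idea only, not Esh's actual implementation:

```bash
# Illustrative only: hashing backend, model, tokenizer, runtime, and tool
# signature together means a change in any one of them yields a new key,
# so a stale cache never matches a changed stack.
printf '%s|%s|%s|%s|%s' "$BACKEND" "$MODEL" "$TOKENIZER" "$RUNTIME" "$TOOL_SIG" \
  | shasum -a 256
```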
A minimal end-to-end flow:

```bash
./scripts/bootstrap.sh
./esh model install mlx-community/Qwen2.5-0.5B-Instruct-4bit
./esh chat
```

Run several named sessions:

```bash
./esh chat work
./esh chat ideas
./esh chat debugging
```

Or from inside chat:

```
/new work
/switch ideas
/sessions
```

Then build and inspect caches for those sessions:

```bash
./esh session list
./esh cache build --session <session-uuid-or-name> --mode raw
./esh cache build --session <session-uuid-or-name> --mode turbo
./esh cache inspect
```

Compare raw and TurboQuant cache behavior for the same session:

```bash
./esh benchmark --session model-flag-smoke --model mlx-community/Qwen2.5-0.5B-Instruct-4bit --message "Continue with one short sentence about local AI."
./esh benchmark history
```

Or benchmark the default session:

```bash
./esh benchmark --session default --model mlx-community/Qwen2.5-0.5B-Instruct-4bit
./esh benchmark history
```

Verify the environment:

```bash
./esh doctor
./scripts/verify-env.sh
```

By default, Esh stores data under:

```
~/.esh
```

This includes separate locations for:
- models
- sessions
- caches
Override the root if needed:

```bash
ESH_HOME=/path/to/custom-root ./esh chat
```

Esh also accepts legacy `LLMCACHE_*` env vars for compatibility during the rename transition.
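An illustrative layout of that data root; the subdirectory names other than `tts-models` (documented above) are assumptions:

```
~/.esh/
├── config.toml   # optional, created by `esh config init`
├── models/       # installed model weights (name illustrative)
├── sessions/     # saved chat sessions (name illustrative)
├── caches/       # cache artifacts (name illustrative)
└── tts-models/   # downloaded MLX TTS models
```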
Key paths in the repository:

- `Package.swift`
- `Sources/EshCore`
- `Sources/esh`
- `Tools/mlx_vlm_bridge.py`
- `scripts/bootstrap.sh`
- `scripts/package-release.sh`
- `docs/USAGE.md`
These are the most important current caveats:
- model search covers installed models and Hugging Face, but install still happens by explicit repo id
- cache artifacts remain runtime/model specific and are not cross-backend portable
- some build runs still show Swift concurrency warnings from `ProcessRunner.swift`, but the tool functions correctly
See the full guide at `docs/USAGE.md`.