
Esh

Esh is a local-first LLM tool for Apple Silicon.

It gives you:

  • local model install and management
  • passive engine detection for MLX and llama.cpp
  • interactive terminal chat
  • stable JSON commands for external callers
  • saved sessions
  • backend-native execution cache export/import
  • TurboQuant cache compression for MLX
  • self-contained release packaging

Today, Esh is a macOS-focused local model orchestrator with MLX and GGUF/llama.cpp backends. It manages, validates, selects, and routes existing runtimes rather than implementing model kernels itself.

Planning Notes

Durable engineering notes live in:

What Esh Is For

Esh is designed for people who want a local chat tool that is:

  • fast to run from terminal
  • practical for repeated conversations
  • honest about model and cache compatibility
  • ready to grow into more backends later

This is a text-chat tool in v1.

It does not yet do:

  • document ingestion
  • codebase indexing
  • embeddings or RAG
  • multimodal chat
  • installing models from arbitrary search results (installs are limited to local models and Hugging Face repos)

Quick Start

Install and run

For end users on macOS, the one-line install and run command is:

brew tap fil-technology/tap && brew install --cask esh && esh

If brew is not installed yet, install Homebrew first:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install and run Esh:

brew tap fil-technology/tap && brew install --cask esh && esh

If you prefer the steps split out:

brew tap fil-technology/tap
brew install --cask esh
esh

Upgrade later with:

brew upgrade --cask esh

If you previously tried the older formula-based install, remove it first:

brew uninstall esh
brew install --cask esh

Developer mode

Bootstrap once:

./scripts/bootstrap.sh

Then use the stable launcher:

./esh
./esh doctor
./esh engines list
./esh model list
./esh chat

Running ./esh with no command opens a default interactive launcher menu with the most common actions.

Runtime Orchestration

Esh reads ~/.esh/config.toml when present. Create or inspect it with:

./esh config init
./esh config show
./esh config path
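
As a quick round trip, and assuming config path prints the file location on stdout, you can create a default config and dump its contents in one go:

./esh config init
cat "$(./esh config path)"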

Inspect required and optional engines:

./esh doctor
./esh engines list
./esh engines doctor llama.cpp
./esh engines doctor mlx

llama-cli is detected passively from ESH_LLAMA_CPP_CLI, LLAMA_CPP_CLI, Homebrew paths, or PATH; Esh does not install llama.cpp automatically.

Validate local model files before routing them:

./esh validate /path/to/model.gguf --engine llama.cpp
./esh validate /path/to/mlx-model --engine mlx --json
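
If you script around validation, a minimal sketch that gates routing on the result (this assumes validate exits non-zero when a model fails its checks):

if ./esh validate /path/to/model.gguf --engine llama.cpp; then
  echo "model validated; safe to route"
else
  echo "validation failed; do not route this file" >&2
fi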

External callers

Use esh capabilities to get a JSON map of supported backends, installed models, and whether each path supports direct inference, cache build, and cache load. Internally, backends also expose capability reports for runtime readiness and feature support; MLX currently reports direct inference, token streaming, and prompt cache build/load, while llama.cpp reports direct inference and token streaming with GGUF cache features marked unavailable.

Use esh infer for machine-friendly inference. It returns JSON for both MLX and GGUF models; loading an MLX cache is optional rather than the only supported integration path.

./esh capabilities
cat <<'JSON' | ./esh infer --input -
{
  "schemaVersion": "esh.infer.request.v1",
  "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
  "messages": [
    { "role": "user", "text": "Say hello in one sentence." }
  ],
  "generation": {
    "maxTokens": 64,
    "temperature": 0.2
  }
}
JSON
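
For scripted callers, the same request can live in a file. A minimal sketch that pretty-prints the response, assuming jq is installed, esh infer writes a single JSON document to stdout, and request.json is a hypothetical file holding the request above:

./esh infer --input - < request.json | jq .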

Use esh serve to expose a local OpenAI-compatible HTTP surface for editors, scripts, and desktop apps.

./esh serve --host 127.0.0.1 --port 11435
curl http://127.0.0.1:11435/v1/models
curl http://127.0.0.1:11435/v1/audio/models
curl http://127.0.0.1:11435/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "pocket-tts",
    "input": "Hello from esh",
    "voice": "alba"
  }' \
  --output hello.wav
curl http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
    "messages": [
      { "role": "user", "content": "Say hello in one sentence." }
    ]
  }'

Supported routes in v1:

  • GET /health
  • GET /v1/models
  • GET /v1/tools
  • GET /v1/audio/models
  • GET /api/tags
  • POST /v1/audio/speech
  • POST /v1/chat/completions
  • POST /v1/responses

Notes:

  • unsupported request fields are ignored when safe
  • stream: true is supported for OpenAI-compatible chat/responses and Anthropic-compatible messages; backend token streaming remains runtime-dependent (see the streaming sketch after this list)
  • text inputs are supported for chat/responses in v1
  • /v1/models includes installed text models only for strict OpenAI-compatible clients such as Xcode
  • /v1/audio/models returns the reusable MLX TTS model catalog with voices, languages, output formats, and capabilities so external agents can present and reuse voice choices
  • /v1/audio/speech generates WAV audio and returns the bytes directly so terminal-driven agents can save or forward the file without shared filesystem access
  • /v1/tools advertises request-side tool support and /api/tags provides an Ollama-compatible model list for local-provider probes
  • pass --api-key <token> to require Authorization: Bearer <token>
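
Since stream: true is supported on the chat route, a streaming sketch follows. OpenAI-compatible servers conventionally deliver chunks as Server-Sent Events "data:" lines; the exact framing here is an assumption:

curl -N http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Say hello in one sentence." }
    ]
  }'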

In the interactive TUI (./esh), select OpenAI server to toggle the same local API while the TUI process stays open. In chat, use /serve toggle, /serve start, /serve stop, or /serve status; the header shows whether the local API is on.

Launch external coding agents

Esh can now launch external coding CLIs against local models, similar to Ollama’s tool integrations.

Inspect what is available:

./esh integrations list
./esh integrations show codex
./esh integrations show claude
./esh integrations configure codex --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh integrations configure claude --model mlx-community--qwen2.5-0.5b-instruct-4bit

Launch Codex CLI against Esh’s OpenAI-compatible local server:

./esh serve --host 127.0.0.1 --port 11435
codex --profile esh-launch
./esh launch codex --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh launch codex --model mlx-community--qwen2.5-0.5b-instruct-4bit -- exec --ephemeral "Summarize this repository"

Launch Claude Code against Esh’s Anthropic-compatible local server:

./esh launch claude --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh launch claude --model mlx-community--qwen2.5-0.5b-instruct-4bit -- -p "Explain the cache pipeline" --output-format text

Notes:

  • codex is wired through Esh’s local Responses API surface
  • claude is wired through Esh’s local Anthropic Messages API surface
  • Codex profiles omit env_key by default so codex --profile esh-launch works without an OPENAI_API_KEY; pass --api-key <token> only when you also run Codex with OPENAI_API_KEY=<token>
  • esh launch claude starts a local Anthropic-compatible server and injects the matching Claude Code auth env automatically; persistent configure writes Codex/Claude settings for manual launches

Release mode

Build a self-contained release bundle:

./scripts/package-release.sh
./dist/esh-macos-<version>/share/esh/scripts/smoke-test-package.sh ./dist/esh-macos-<version>

Run the packaged tool:

./dist/esh-macos-<version>/esh doctor
./dist/esh-macos-<version>/esh chat

The package smoke test verifies the launcher, packaged runtime paths, recommended model catalog, and empty install store. If the current macOS session cannot expose a Metal GPU to MLX, the smoke test reports that condition and still passes, provided the remaining package checks succeed.

GitHub CI/CD

Esh includes GitHub Actions workflows for continuous integration and release packaging.

CI workflow:

Release workflow:

What they do:

  • CI runs on pushes to main and on pull requests
  • release packaging runs for tags like v0.1.0
  • release packaging can also be started manually from GitHub Actions
  • macOS release builds upload the package as an artifact, publish both a notarized .zip and a .tar.gz plus SHA-256 checksums on the GitHub release, and push the same bundle to GitHub Packages through GHCR
  • tagged releases can also update the Homebrew tap cask automatically when HOMEBREW_TAP_TOKEN is configured in repo secrets

Versioning and Releases

Esh uses semantic versions stored in the VERSION file.

Helpful commands:

./scripts/release-version.sh show
./scripts/release-version.sh tag
./scripts/release-version.sh verify-tag v0.1.0

Suggested release flow:

./scripts/release-version.sh show
git tag "$(./scripts/release-version.sh tag)"
git push origin "$(./scripts/release-version.sh tag)"

The GitHub release workflow verifies that the pushed tag matches VERSION.

GitHub surfaces:

  • Releases shows downloadable end-user artifacts like esh-macos-0.1.14.zip
  • Packages shows the same packaged bundle published to GHCR as ghcr.io/fil-technology/esh/esh-macos:<version>
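
To verify a download from Releases against the published SHA-256 checksums, a small sketch:

shasum -a 256 esh-macos-0.1.14.zip
# compare the printed digest with the checksum published on the GitHub release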

Install a Model

Esh now has a built-in shortlist of recommended stable models for fast first-time setup.

Start there:

./esh model recommended
./esh model recommended --profile chat
./esh model install fast-chat

You can also install directly from a Hugging Face repo id.

You can search first:

./esh model search qwen
./esh model search qwen --source local
./esh model search qwen --source hf --limit 5

Example:

./esh model install mlx-community/Qwen2.5-0.5B-Instruct-4bit

Check a model before downloading it:

./esh model check mlx-community/Qwen2.5-7B-Instruct-4bit --backend mlx
./esh model check bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF --backend gguf --context 8192
./esh model check bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF --backend gguf --variant Q4_K_M
./esh model check mlx-community/gemma-4-27b-it-4bit --json

Then inspect what is installed:

./esh model list
./esh model list --task audio
./esh model list --capability tts
./esh model inspect mlx-community--qwen2.5-0.5b-instruct-4bit

Notes:

  • the install command accepts either a Hugging Face repo id or a built-in alias like fast-chat
  • model check is heuristic: it estimates likely backend support and likely fit, not a guarantee (see the gated install sketch after this list)
  • model check --backend auto resolves the backend from repo metadata and filenames when it can
  • model check and model install accept --variant <name> for GGUF quant variants and other explicit repo variants
  • initial GGUF support is wired through llama.cpp and is currently text-only
  • GGUF install/runtime support is intentionally narrow in this pass: it prefers a single clear GGUF candidate and reports ambiguity instead of guessing
  • inspect/remove/chat/cache commands accept the installed model id and also the original repo id where practical
  • installed ids are normalized like mlx-community--qwen2.5-0.5b-instruct-4bit
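
One way to act on the heuristic is to gate an install on model check's exit status. A sketch, assuming check exits non-zero when it predicts the model is unsupported or will not fit:

./esh model check mlx-community/Qwen2.5-7B-Instruct-4bit --backend mlx &&
  ./esh model install mlx-community/Qwen2.5-7B-Instruct-4bit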

Audio

List MLX text-to-speech models exposed through TTSMLX:

./esh audio models

The interactive launcher (./esh) also has an Audio entry for generating WAV files through the same MLX TTS path.

Generate a WAV file with an MLX TTS model:

./esh audio speak "Hello from esh" --model pocket-tts --voice alba --out hello.wav
./esh audio speak "Hello from esh" --model Marvis-AI/marvis-tts-250m-v0.2-MLX-8bit --play

The first run downloads the selected TTS model into ~/.esh/tts-models. Speech-to-text is still a planned backend slice; audio transcribe currently reports that STT is not wired yet.

Chat

Launch chat:

./esh chat

Or just run:

./esh

and choose 1. Chat from the default menu.

Launch or reopen a named session:

./esh chat default
./esh chat work
./esh chat experiments
./esh chat work --model mlx-community/Qwen2.5-0.5B-Instruct-4bit

Inside chat, you can type normal messages and slash commands.

Example:

> hello how are you, what can you do?
> /autosave on
> /sessions
> /new scratch
> /switch default
> /save
> /exit

TUI Features

The chat UI includes:

  • transcript pane
  • fixed input bar
  • fixed footer stats
  • command overlay
  • saved session switching from inside chat

Useful slash commands:

/menu
/help
/save
/autosave on
/autosave off
/autosave toggle
/new
/new my-session
/switch my-session
/switch <uuid>
/models
/use-model <id-or-repo>
/model current
/sessions
/caches
/search <text>
/doctor
/model inspect <id>
/session show <uuid-or-name>
/cache inspect <uuid>
/close
/exit

Sessions

List sessions from the CLI:

./esh session list

Show a specific saved session:

./esh session show <session-uuid>
./esh session show default
./esh session grep hello

The chat UI shows sessions in a more human-friendly way:

  • session name
  • short id
  • message count

Example:

default [8C56AF77] | 2 messages
lifecycle [D59E570E] | 2 messages
demo-session [2AB2CAF3] | 2 messages

Cache Workflows

Esh supports:

  • raw cache artifacts
  • TurboQuant-compressed cache artifacts
  • cache inspect
  • cache load and resume

List saved cache artifacts:

./esh cache inspect

Inspect one artifact:

./esh cache inspect C46B9A7C-0636-4111-B300-C5A9AE1341C1

Build a cache from a saved session:

./esh session list
./esh cache build --session <session-uuid> --mode raw
./esh cache build --session <session-uuid> --mode turbo

Resume from a saved cache:

./esh cache load --artifact <artifact-uuid> --message "Continue this chat"

Important:

  • cache artifacts are backend-specific
  • cache artifacts are model-specific
  • new cache artifacts include a normalized prompt cache key that is backend-, model-, tokenizer-, runtime-, and tool-signature-aware
  • Esh reuses one cache pipeline, but artifacts are not portable across runtimes/models

Typical Use Cases

1. Quick local chat

./scripts/bootstrap.sh
./esh model install mlx-community/Qwen2.5-0.5B-Instruct-4bit
./esh chat

2. Keep multiple named chats

./esh chat work
./esh chat ideas
./esh chat debugging

Or from inside chat:

/new work
/switch ideas
/sessions

3. Save a conversation state and benchmark cache modes

./esh session list
./esh cache build --session <session-uuid-or-name> --mode raw
./esh cache build --session <session-uuid-or-name> --mode turbo
./esh cache inspect

Benchmarking

Compare raw and TurboQuant cache behavior for the same session:

./esh benchmark --session model-flag-smoke --model mlx-community/Qwen2.5-0.5B-Instruct-4bit --message "Continue with one short sentence about local AI."
./esh benchmark history

4. Compare raw vs turbo on a real saved session

./esh benchmark --session default --model mlx-community/Qwen2.5-0.5B-Instruct-4bit
./esh benchmark history

5. Verify environment health before debugging

./esh doctor
./scripts/verify-env.sh

Data Layout

By default, Esh stores data under:

~/.esh

This includes separate locations for:

  • models
  • sessions
  • caches

Override the root if needed:

ESH_HOME=/path/to/custom-root ./esh chat
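
The same override works for throwaway experiments; a small sketch that points Esh at a temporary root so nothing touches ~/.esh:

ESH_HOME="$(mktemp -d)" ./esh model list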

Esh also accepts legacy LLMCACHE_* env vars for compatibility during the rename transition.

Project Layout

Current Limitations

These are the most important current caveats:

  • model search covers installed models and Hugging Face, but install still happens by explicit repo id
  • cache artifacts remain runtime/model specific and are not cross-backend portable
  • some build runs still show Swift concurrency warnings from ProcessRunner.swift, but the tool functions correctly

More Detailed Guide

See the full guide at docs/USAGE.md.
