
Esh

Esh is a local-first LLM tool for Apple Silicon.

It gives you:

  • local model install and management
  • passive engine detection for MLX and llama.cpp
  • interactive terminal chat
  • stable JSON commands for external callers
  • saved sessions
  • backend-native execution cache export/import
  • TurboQuant cache compression for MLX
  • self-contained release packaging

Today, Esh is a macOS-focused local model orchestrator with MLX and GGUF/llama.cpp backends. It manages, validates, selects, and routes existing runtimes rather than implementing model kernels itself.

Planning Notes

Durable engineering notes live in:

What Esh Is For

Esh is designed for people who want a local chat tool that is:

  • fast to run from terminal
  • practical for repeated conversations
  • honest about model and cache compatibility
  • ready to grow into more backends later

This is a text-chat tool in v1.

It does not yet do:

  • document ingestion
  • codebase indexing
  • embeddings or RAG
  • multimodal chat
  • installing models from arbitrary search results (installs are limited to local models and Hugging Face repos)

Quick Start

Install and run

For end users on macOS, the one-line install and run command is:

brew tap fil-technology/tap && brew install --cask esh && esh

If brew is not installed yet, install Homebrew first:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install and run Esh:

brew tap fil-technology/tap && brew install --cask esh && esh

If you prefer the steps split out:

brew tap fil-technology/tap
brew install --cask esh
esh

Upgrade later with:

brew upgrade --cask esh

If you previously tried the older formula-based install, remove it first:

brew uninstall esh
brew install --cask esh

Developer mode

Bootstrap once:

./scripts/bootstrap.sh

Then use the stable launcher:

./esh
./esh doctor
./esh engines list
./esh model list
./esh chat

Running ./esh with no command opens a default interactive launcher menu with the most common actions.

Runtime Orchestration

Esh reads ~/.esh/config.toml when present. Create or inspect it with:

./esh config init
./esh config show
./esh config path
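
As a quick round trip, and assuming config path prints the file location on stdout, you can create a default config and dump its contents in one go:

./esh config init
cat "$(./esh config path)"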

Inspect required and optional engines:

./esh doctor
./esh engines list
./esh engines doctor llama.cpp
./esh engines doctor mlx

llama-cli is detected passively from ESH_LLAMA_CPP_CLI, LLAMA_CPP_CLI, Homebrew paths, or PATH; Esh does not install llama.cpp automatically.

Validate local model files before routing them:

./esh validate /path/to/model.gguf --engine llama.cpp
./esh validate /path/to/mlx-model --engine mlx --json
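
If you script around validation, a minimal sketch that gates routing on the result (this assumes validate exits non-zero when a model fails its checks):

if ./esh validate /path/to/model.gguf --engine llama.cpp; then
  echo "model validated; safe to route"
else
  echo "validation failed; do not route this file" >&2
fi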

External callers

Use esh capabilities to get a JSON map of supported backends, installed models, and whether each path supports direct inference, cache build, and cache load. Internally, backends also expose capability reports for runtime readiness and feature support; MLX currently reports direct inference, token streaming, and prompt cache build/load, while llama.cpp reports direct inference and token streaming with GGUF cache features marked unavailable.

Use esh infer for machine-friendly inference. It returns JSON for both MLX and GGUF models; loading an MLX cache is optional rather than the only supported integration path.

./esh capabilities
cat <<'JSON' | ./esh infer --input -
{
  "schemaVersion": "esh.infer.request.v1",
  "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
  "messages": [
    { "role": "user", "text": "Say hello in one sentence." }
  ],
  "generation": {
    "maxTokens": 64,
    "temperature": 0.2
  }
}
JSON
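
For scripted callers, the same request can live in a file. A minimal sketch that pretty-prints the response, assuming jq is installed, esh infer writes a single JSON document to stdout, and request.json is a hypothetical file holding the request above:

./esh infer --input - < request.json | jq .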

Use esh serve to expose a local OpenAI-compatible HTTP surface for editors, scripts, and desktop apps.

./esh serve --host 127.0.0.1 --port 11435
curl http://127.0.0.1:11435/v1/models
curl http://127.0.0.1:11435/v1/audio/models
curl http://127.0.0.1:11435/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "pocket-tts",
    "input": "Hello from esh",
    "voice": "alba"
  }' \
  --output hello.wav
curl http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
    "messages": [
      { "role": "user", "content": "Say hello in one sentence." }
    ]
  }'

Supported routes in v1:

  • GET /health
  • GET /v1/models
  • GET /v1/tools
  • GET /v1/audio/models
  • GET /api/tags
  • POST /v1/audio/speech
  • POST /v1/chat/completions
  • POST /v1/responses

Notes:

  • unsupported request fields are ignored when safe
  • stream: true is supported for OpenAI-compatible chat/responses and Anthropic-compatible messages; backend token streaming remains runtime-dependent (see the streaming sketch after this list)
  • text inputs are supported for chat/responses in v1
  • /v1/models includes installed text models only for strict OpenAI-compatible clients such as Xcode
  • /v1/audio/models returns the reusable MLX TTS model catalog with voices, languages, output formats, and capabilities so external agents can present and reuse voice choices
  • /v1/audio/speech generates WAV audio and returns the bytes directly so terminal-driven agents can save or forward the file without shared filesystem access
  • /v1/tools advertises request-side tool support and /api/tags provides an Ollama-compatible model list for local-provider probes
  • pass --api-key <token> to require Authorization: Bearer <token>
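
Since stream: true is supported on the chat route, a streaming sketch follows. OpenAI-compatible servers conventionally deliver chunks as Server-Sent Events "data:" lines; the exact framing here is an assumption:

curl -N http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mlx-community--qwen2.5-0.5b-instruct-4bit",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Say hello in one sentence." }
    ]
  }'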

In the interactive TUI (./esh), select OpenAI server to toggle the same local API while the TUI process stays open. In chat, use /serve toggle, /serve start, /serve stop, or /serve status; the header shows whether the local API is on.

Launch external coding agents

Esh can now launch external coding CLIs against local models, similar to Ollama’s tool integrations.

Inspect what is available:

./esh integrations list
./esh integrations show codex
./esh integrations show claude
./esh integrations configure codex --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh integrations configure claude --model mlx-community--qwen2.5-0.5b-instruct-4bit

Launch Codex CLI against Esh’s OpenAI-compatible local server:

./esh serve --host 127.0.0.1 --port 11435
codex --profile esh-launch
./esh launch codex --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh launch codex --model mlx-community--qwen2.5-0.5b-instruct-4bit -- exec --ephemeral "Summarize this repository"

Launch Claude Code against Esh’s Anthropic-compatible local server:

./esh launch claude --model mlx-community--qwen2.5-0.5b-instruct-4bit
./esh launch claude --model mlx-community--qwen2.5-0.5b-instruct-4bit -- -p "Explain the cache pipeline" --output-format text

Notes:

  • codex is wired through Esh’s local Responses API surface
  • claude is wired through Esh’s local Anthropic Messages API surface
  • Codex profiles omit env_key by default so codex --profile esh-launch works without an OPENAI_API_KEY; pass --api-key <token> only when you also run Codex with OPENAI_API_KEY=<token>
  • esh launch claude starts a local Anthropic-compatible server and injects the matching Claude Code auth env automatically; persistent configure writes Codex/Claude settings for manual launches

Release mode

Build a self-contained release bundle:

./scripts/package-release.sh
./dist/esh-macos-<version>/share/esh/scripts/smoke-test-package.sh ./dist/esh-macos-<version>

Run the packaged tool:

./dist/esh-macos-<version>/esh doctor
./dist/esh-macos-<version>/esh chat

The package smoke test verifies the launcher, packaged runtime paths, recommended model catalog, and empty install store. If the current macOS session cannot expose a Metal GPU to MLX, the smoke test reports that condition and still passes, provided the remaining package checks succeed.

GitHub CI/CD

Esh includes GitHub Actions workflows for continuous integration and release packaging.

CI workflow:

Release workflow:

What they do:

  • CI runs on pushes to main and on pull requests
  • release packaging runs for tags like v0.1.0
  • release packaging can also be started manually from GitHub Actions
  • macOS release builds upload the package as an artifact, publish both a notarized .zip and a .tar.gz plus SHA-256 checksums on the GitHub release, and push the same bundle to GitHub Packages through GHCR
  • tagged releases can also update the Homebrew tap cask automatically when HOMEBREW_TAP_TOKEN is configured in repo secrets

Versioning and Releases

Esh uses semantic versions stored in the VERSION file.

Helpful commands:

./scripts/release-version.sh show
./scripts/release-version.sh tag
./scripts/release-version.sh verify-tag v0.1.0

Suggested release flow:

./scripts/release-version.sh show
git tag "$(./scripts/release-version.sh tag)"
git push origin "$(./scripts/release-version.sh tag)"

The GitHub release workflow verifies that the pushed tag matches VERSION.

GitHub surfaces:

  • Releases shows downloadable end-user artifacts like esh-macos-0.1.14.zip
  • Packages shows the same packaged bundle published to GHCR as ghcr.io/fil-technology/esh/esh-macos:<version>
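
To verify a download from Releases against the published SHA-256 checksums, a small sketch:

shasum -a 256 esh-macos-0.1.14.zip
# compare the printed digest with the checksum published on the GitHub release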

Install a Model

Esh now has a built-in shortlist of recommended stable models for fast first-time setup.

Start there:

./esh model recommended
./esh model recommended --profile chat
./esh model install fast-chat

You can also install directly from a Hugging Face repo id.

You can search first:

./esh model search qwen
./esh model search qwen --source local
./esh model search qwen --source hf --limit 5

Example:

./esh model install mlx-community/Qwen2.5-0.5B-Instruct-4bit

Check a model before downloading it:

./esh model check mlx-community/Qwen2.5-7B-Instruct-4bit --backend mlx
./esh model check bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF --backend gguf --context 8192
./esh model check bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF --backend gguf --variant Q4_K_M
./esh model check mlx-community/gemma-4-27b-it-4bit --json

Then inspect what is installed:

./esh model list
./esh model list --task audio
./esh model list --capability tts
./esh model inspect mlx-community--qwen2.5-0.5b-instruct-4bit

Notes:

  • the install command accepts either a Hugging Face repo id or a built-in alias like fast-chat
  • model check is heuristic: it estimates likely backend support and likely fit, not a guarantee (see the gated install sketch after this list)
  • model check --backend auto resolves the backend from repo metadata and filenames when it can
  • model check and model install accept --variant <name> for GGUF quant variants and other explicit repo variants
  • initial GGUF support is wired through llama.cpp and is currently text-only
  • GGUF install/runtime support is intentionally narrow in this pass: it prefers a single clear GGUF candidate and reports ambiguity instead of guessing
  • inspect/remove/chat/cache commands accept the installed model id and also the original repo id where practical
  • installed ids are normalized like mlx-community--qwen2.5-0.5b-instruct-4bit
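
One way to act on the heuristic is to gate an install on model check's exit status. A sketch, assuming check exits non-zero when it predicts the model is unsupported or will not fit:

./esh model check mlx-community/Qwen2.5-7B-Instruct-4bit --backend mlx &&
  ./esh model install mlx-community/Qwen2.5-7B-Instruct-4bit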

Audio

List MLX text-to-speech models exposed through TTSMLX:

./esh audio models

The interactive launcher (./esh) also has an Audio entry for generating WAV files through the same MLX TTS path.

Generate a WAV file with an MLX TTS model:

./esh audio speak "Hello from esh" --model pocket-tts --voice alba --out hello.wav
./esh audio speak "Hello from esh" --model Marvis-AI/marvis-tts-250m-v0.2-MLX-8bit --play

The first run downloads the selected TTS model into ~/.esh/tts-models. Speech-to-text is still a planned backend slice; audio transcribe currently reports that STT is not wired yet.

Chat

Launch chat:

./esh chat

Or just run:

./esh

and choose 1. Chat from the default menu.

Launch or reopen a named session:

./esh chat default
./esh chat work
./esh chat experiments
./esh chat work --model mlx-community/Qwen2.5-0.5B-Instruct-4bit

Inside chat, you can type normal messages and slash commands.

Example:

> hello how are you, what can you do?
> /autosave on
> /sessions
> /new scratch
> /switch default
> /save
> /exit

TUI Features

The chat UI includes:

  • transcript pane
  • fixed input bar
  • fixed footer stats
  • command overlay
  • saved session switching from inside chat

Useful slash commands:

/menu
/help
/save
/autosave on
/autosave off
/autosave toggle
/new
/new my-session
/switch my-session
/switch <uuid>
/models
/use-model <id-or-repo>
/model current
/sessions
/caches
/search <text>
/doctor
/model inspect <id>
/session show <uuid-or-name>
/cache inspect <uuid>
/close
/exit

Sessions

List sessions from the CLI:

./esh session list

Show a specific saved session:

./esh session show <session-uuid>
./esh session show default
./esh session grep hello

The chat UI shows sessions in a more human-friendly way:

  • session name
  • short id
  • message count

Example:

default [8C56AF77] | 2 messages
lifecycle [D59E570E] | 2 messages
demo-session [2AB2CAF3] | 2 messages

Cache Workflows

Esh supports:

  • raw cache artifacts
  • TurboQuant-compressed cache artifacts
  • cache inspect
  • cache load and resume

List saved cache artifacts:

./esh cache inspect

Inspect one artifact:

./esh cache inspect C46B9A7C-0636-4111-B300-C5A9AE1341C1

Build a cache from a saved session:

./esh session list
./esh cache build --session <session-uuid> --mode raw
./esh cache build --session <session-uuid> --mode turbo

Resume from a saved cache:

./esh cache load --artifact <artifact-uuid> --message "Continue this chat"

Important:

  • cache artifacts are backend-specific
  • cache artifacts are model-specific
  • new cache artifacts include a normalized prompt cache key that is backend-, model-, tokenizer-, runtime-, and tool-signature-aware
  • Esh reuses one cache pipeline, but artifacts are not portable across runtimes/models

Typical Use Cases

1. Quick local chat

./scripts/bootstrap.sh
./esh model install mlx-community/Qwen2.5-0.5B-Instruct-4bit
./esh chat

2. Keep multiple named chats

./esh chat work
./esh chat ideas
./esh chat debugging

Or from inside chat:

/new work
/switch ideas
/sessions

3. Save a conversation state and benchmark cache modes

./esh session list
./esh cache build --session <session-uuid-or-name> --mode raw
./esh cache build --session <session-uuid-or-name> --mode turbo
./esh cache inspect

Benchmarking

Compare raw and TurboQuant cache behavior for the same session:

./esh benchmark --session model-flag-smoke --model mlx-community/Qwen2.5-0.5B-Instruct-4bit --message "Continue with one short sentence about local AI."
./esh benchmark history

4. Compare raw vs turbo on a real saved session

./esh benchmark --session default --model mlx-community/Qwen2.5-0.5B-Instruct-4bit
./esh benchmark history

5. Verify environment health before debugging

./esh doctor
./scripts/verify-env.sh

Data Layout

By default, Esh stores data under:

~/.esh

This includes separate locations for:

  • models
  • sessions
  • caches

Override the root if needed:

ESH_HOME=/path/to/custom-root ./esh chat
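
The same override works for throwaway experiments; a small sketch that points Esh at a temporary root so nothing touches ~/.esh:

ESH_HOME="$(mktemp -d)" ./esh model list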

Esh also accepts legacy LLMCACHE_* env vars for compatibility during the rename transition.

Project Layout

Current Limitations

These are the most important current caveats:

  • model search covers installed models and Hugging Face, but install still happens by explicit repo id
  • cache artifacts remain runtime/model specific and are not cross-backend portable
  • some build runs still show Swift concurrency warnings from ProcessRunner.swift, but the tool functions correctly

More Detailed Guide

See the full guide at docs/USAGE.md.
