YAMS — Yet Another Memory System

Persistent memory for LLMs and apps. Content-addressed storage with dedupe, compression, full-text and vector search.


Warning

Experimental — not production ready. Expect bugs and breaking changes until 1.0.

Features

  • SHA-256 content-addressed storage with block-level dedupe (Rabin chunking)
  • Full-text search (SQLite FTS5) + semantic vector search (embeddings)
  • Tree-sitter symbol extraction for 18 languages (list)
  • Snapshot management with Merkle tree diffs and rename detection
  • WAL-backed durability, high-throughput I/O, thread-safe
  • CLI, MCP server, and C-ABI plugins (ONNX/GLiNER/ColBERT, S3 storage, PDF via ZYP)
  • Interactive relevance tuning through CLI tuning and doctor workflows
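The first feature, content-addressed storage with block-level dedupe, can be illustrated with a minimal sketch. This is not YAMS code: the `BlockStore` class and `BLOCK_SIZE` are hypothetical, and the sketch uses fixed-size blocks rather than Rabin (content-defined) chunking to stay short. It only shows the core idea that a block's SHA-256 digest is its key, so identical blocks are stored once.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny on purpose so the example is easy to follow


class BlockStore:
    """Toy content-addressed store: each block is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # hex digest -> block bytes

    def put(self, data: bytes) -> list:
        """Split data into fixed-size blocks, store each once, return the digests."""
        digests = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # an identical block is not stored twice
            digests.append(digest)
        return digests

    def get(self, digests: list) -> bytes:
        """Reassemble an object from its ordered list of block digests."""
        return b"".join(self.blocks[d] for d in digests)


store = BlockStore()
manifest_a = store.put(b"AAAABBBBCCCC")
manifest_b = store.put(b"AAAABBBBDDDD")  # shares its first two blocks with manifest_a
```

Fixed-size blocks lose dedupe as soon as bytes are inserted mid-file, because every later block boundary shifts. Content-defined chunking such as Rabin chunking picks boundaries from the data itself, which avoids that problem.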

Documentation

| Topic | Link |
|---|---|
| Install | docs/user_guide/installation.md |
| CLI reference | docs/user_guide/cli.md |
| MCP server | docs/user_guide/mcp.md |
| Embeddings | docs/user_guide/embeddings.md |
| Plugins | docs/PLUGINS.md |
| Build from source | docs/BUILD.md |
| Architecture | docs/architecture/ |
| Benchmarks | docs/benchmarks/README.md |
| Changelog | docs/changelogs/ |
| Roadmap | docs/roadmap.md |

Install

Supported: Linux x86_64/ARM64, macOS x86_64/ARM64, Windows x86_64.

# macOS
brew install trvon/yams/yams

# Docker
docker pull ghcr.io/trvon/yams:latest

# Debian/Ubuntu, Fedora/RHEL, Windows: see installation guide

Full install matrix and package repos: docs/user_guide/installation.md.

Build from source

# Linux/macOS (auto-detects toolchain, runs Conan + Meson)
./setup.sh Release
meson compile -C build/release

# Windows
./setup.ps1 Release
meson compile -C build/release

Requires a C++20 toolchain (GCC 13+, Clang 16+, or MSVC 2022+ recommended), Meson, Ninja, CMake, pkg-config, and Conan. See docs/BUILD.md.

Quick Start

yams init                                # interactive; use --auto for headless
yams add ./README.md --tags docs
yams add src/ --recursive --include="*.cpp,*.h" --tags code

yams search "config file" --limit 5
yams grep "TODO" --include="*.cpp"

yams list --limit 20
yams get <hash> -o ./output.bin

Shell completions: yams completion bash|zsh|fish|powershell. Install instructions: docs/user_guide/cli.md#cmd-completion.
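For scripting, the search command above can be wrapped from Python. The sketch below is an illustration, not an official binding: `search_command` and `run_search` are hypothetical helpers, and they use only the `yams search <query> --limit <n>` form shown in the Quick Start. The output format is not documented here, so the text is returned unparsed.

```python
import shutil
import subprocess
from typing import List, Optional


def search_command(query: str, limit: int = 5) -> List[str]:
    """Build the argv for `yams search <query> --limit <n>`."""
    return ["yams", "search", query, "--limit", str(limit)]


def run_search(query: str, limit: int = 5) -> Optional[str]:
    """Run the search if `yams` is on PATH and return its raw stdout text."""
    if shutil.which("yams") is None:
        return None  # yams is not installed in this environment
    result = subprocess.run(
        search_command(query, limit),
        capture_output=True,
        text=True,
        check=False,  # leave exit-code handling to the caller
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with queries that contain spaces.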

MCP Server

YAMS ships an MCP server over stdio (JSON-RPC) for AI assistants.

yams serve

Example MCP client configuration:

{
  "mcpServers": {
    "yams": { "command": "yams", "args": ["serve"] }
  }
}

Tool reference and MCP client setup: docs/user_guide/mcp.md.
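To make "stdio (JSON-RPC)" concrete, here is a sketch of the first message an MCP client typically sends. It assumes the newline-delimited framing and `initialize` request described in the public MCP specification. The protocol version string and client name are illustrative assumptions, not values confirmed for YAMS, and the helper names are hypothetical.

```python
import json


def initialize_request(request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `initialize` request in the shape MCP describes."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumption: the server may negotiate another version
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical client
        },
    }


def frame(message: dict) -> bytes:
    """Encode one message as a single newline-terminated line of UTF-8 JSON."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


payload = frame(initialize_request())
```

A client would write `payload` to the stdin of the `yams serve` process and read one JSON line back from its stdout. In practice an existing MCP client library handles this handshake.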

Plugins

yams plugin list                                  # loaded plugins
yams plugin trust add ~/.local/lib/yams/plugins   # trust a directory
yams plugin health                                # status
yams doctor plugin onnx                           # diagnose

Plugin architecture, trust model, and bundled plugins (ONNX, S3, ZYP, GLiNER, symbol extractor): docs/PLUGINS.md.

GPU acceleration (ONNX)

| Platform | Provider | Hardware |
|---|---|---|
| macOS | CoreML | Apple Silicon Neural Engine + GPU |
| Linux | CUDA | NVIDIA GPUs |
| Linux | MIGraphX | AMD GPUs (ROCm) |
| Windows | DirectML | Any DirectX 12 GPU (NVIDIA, AMD, Intel) |

Auto-detected at build. Override with YAMS_ONNX_GPU=auto|cuda|coreml|directml|migraphx|none. Details: plugins/onnx/README.md.

Default retrieval backend: simeon (training-free, model-free, 1024-d on new installs).

Simeon embedding backend

YAMS now uses the training-free simeon backend by default for retrieval embeddings. No embedding model download is required for the normal semantic-search path.

Enable it via config:

[embeddings]
backend = "simeon"
preferred_model = "simeon-default"
embedding_dim = 1024  # existing vector DBs keep their stored dim

Simeon's output dimension is tunable. Because the backend is training-free, higher dimensions carry no training cost, only storage and compute cost:

| embedding_dim | Use when |
|---|---|
| 384 | Tight-budget fallback; smallest index, fastest. |
| 768 | More separability, fewer hash collisions; 2× storage. |
| 1024 | Default for new installs; balanced quality / size. |
| 1536+ | Experimental headroom; diminishing returns past ~2k. |

Set via config (embedding_dim). Internal ngram/sketch/projection knobs: third_party/simeon/README.md.
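
The storage side of this trade-off is simple arithmetic. The sketch below assumes vectors are stored as uncompressed 32-bit floats with no index overhead. The README does not state this, so treat the numbers as a rough lower bound, not a measurement of YAMS. The corpus size of 100,000 vectors is arbitrary.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per dimension, no quantization


def raw_vector_bytes(num_vectors: int, embedding_dim: int) -> int:
    """Raw payload size of num_vectors embeddings, excluding any index structures."""
    return num_vectors * embedding_dim * BYTES_PER_FLOAT32


for dim in (384, 768, 1024, 1536):
    mib = raw_vector_bytes(100_000, dim) / (1024 * 1024)
    print(f"{dim:>5} dims: {mib:8.1f} MiB")
```

Under these assumptions the cost scales linearly with `embedding_dim`, which is where the "2× storage" note for 768 versus 384 comes from.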

Requirements: no model file, no ONNX embedding runtime, no external runtime deps. When building from source, fetch the submodule: git submodule update --init --recursive.

ONNX remains relevant for optional plugin paths such as GLiNER and ColBERT, not for the default retrieval embedding flow.

Benchmark summaries are maintained under docs/benchmarks/.

Troubleshooting

yams doctor              # full diagnostics
yams stats --verbose     # storage statistics
yams repair --all        # repair common issues

Build issues: docs/BUILD.md. Empty yams plugin list? Add a trust path: yams plugin trust add ~/.local/lib/yams/plugins.

Cite

@misc{yams,
  author = {Trevon Williams},
  title = {YAMS: Content-addressable storage with semantic search},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/trvon/yams}
}
