A semantic filesystem that turns passive file storage into an intelligent, searchable knowledge base, built for the AI era.
Documentation · Quickstart · Architecture · MCP for agents · Roadmap
StrataFS watches your directories (local or cloud), parses files into semantic chunks, generates vector embeddings, and exposes everything through a hybrid search engine that combines full-text search and semantic similarity. It speaks the Model Context Protocol, so any MCP-aware agent can use your filesystem as a structured knowledge resource. No SaaS. No lock-in. Read-only by design.
# 30 seconds to your first semantic search:
npm install -g stratafs && stratafs config init && stratafs serve &
stratafs search "where do we handle JWT refresh?"
| Method | Command |
|---|---|
| npm | `npm install -g stratafs` |
| PyPI | `pip install stratafs` |
| Homebrew | `brew tap neul-labs/stratafs && brew install stratafs` |
| macOS / Linux | `curl -fsSL https://raw.githubusercontent.com/neul-labs/stratafs/main/scripts/install.sh \| bash` |
| Docker | `docker run -d -p 8080:8080 -p 8081:8081 ghcr.io/neul-labs/stratafs:latest` |
| From source | `git clone https://github.com/neul-labs/stratafs.git && cd stratafs && make build` |
Then:
stratafs config init # writes ~/.stratafs/config.json
stratafs serve # REST on :8080, MCP on :8081
stratafs search "any natural language query"

Native installers (NSIS for Windows, signed .pkg for macOS, .deb / AppImage for Linux) are on the releases page.
$ stratafs search "rate limit middleware"
pkg/api/middleware/ratelimit.go  0.94
----------------------------------------
func RateLimit(rps int) gin.HandlerFunc {
    bucket := tokenbucket.New(rps, rps*2)
    return func(c *gin.Context) {
        if !bucket.Take(1) { c.AbortWithStatus(429) ...
docs/api/rate-limiting.md  0.88
----------------------------------------
Per-IP rate limits default to 100 requests/min...
internal/gateway/policy.yaml  0.71
----------------------------------------
policies:
  - name: api-default
    rate: 100/m
    burst: 200

The same query over the REST API:
curl "http://localhost:8080/search?q=rate+limit+middleware&limit=5" | jq

Or from an MCP-aware agent, with no glue code required:
{
"mcpServers": {
"stratafs": { "command": "stratafs", "args": ["serve", "--mcp-only"] }
}
}
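
Many MCP clients keep their server definitions in a JSON file under a top-level `mcpServers` object, as in the snippet above. The following is a minimal sketch of merging the StrataFS entry into an existing client config without overwriting other servers. The helper name `add_stratafs_server` and the `other` server are hypothetical, and where the config file lives depends on the agent you use.

```python
import json

# The entry from the snippet above; everything else here is illustrative.
STRATAFS_ENTRY = {"command": "stratafs", "args": ["serve", "--mcp-only"]}


def add_stratafs_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the stratafs server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["stratafs"] = STRATAFS_ENTRY
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_stratafs_server(existing)
print(json.dumps(updated, indent=2))
```

The function copies rather than mutates, so the original config stays untouched if you decide not to write the result back.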
A clean, layered design built around three first-class invariants: read-only sources, per-source isolation, and hybrid scoring in a single SQL query.

Every layer is an extension point. Add a parser, a backend, a chunker, or a ranking signal in a single Go file.
# 1. Install
pip install stratafs
# 2. Initialize
stratafs config init
# 3. Add a source (edit ~/.stratafs/config.json)
# {"id":"docs","type":"local","path":"/path/to/anything","enabled":true}
# 4. Start the daemon
stratafs serve &
# 5. Search: CLI, REST, or MCP
stratafs search "the thing I half-remember writing"
curl "http://localhost:8080/search?q=onboarding+flow"

The first scan runs at 50–100 files/sec. Searches return in under 100 ms once the index is warm. Everything lives under ~/.stratafs/: one directory, one filesystem, one source of truth.
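
If you script your setup, the source entry from step 3 can be generated rather than typed by hand. This is a small sketch that only builds the four fields shown in that step; the helper name `make_source` is hypothetical, and where the entry goes inside config.json is whatever `stratafs config init` generated, which this sketch does not assume.

```python
import json


def make_source(source_id: str, path: str, enabled: bool = True) -> dict:
    """Build a local source entry with the fields shown in step 3."""
    if not source_id or not path:
        raise ValueError("source id and path are required")
    return {"id": source_id, "type": "local", "path": path, "enabled": enabled}


entry = make_source("docs", "/path/to/anything")
# Compact separators reproduce the one-line form used in the quickstart comment.
print(json.dumps(entry, separators=(",", ":")))
```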
+--------------+  +--------------+  +--------------+
|   REST API   |  |  MCP Server  |  |   CLI / UI   |
|    :8080     |  |    :8081     |  |              |
+------+-------+  +------+-------+  +------+-------+
       +-----------------+-----------------+
                         |
              +----------+----------+
              |    Hybrid Search    |
              |    FTS5 + Vector    |
              |  (single SQL CTE)   |
              +----------+----------+
                         |
        +----------------+----------------+
        |                |                |
  +-----+-----+    +-----+-----+    +-----+-----+
  | SQLite +  |    | FastEmbed |    | Job Queue |
  | sqlite-vec|    |  + ONNX   |    | (SQLite)  |
  +-----------+    +-----------+    +-----+-----+
                                          |
                                +---------+---------+
                                | Monitor (local +  |
                                |  remote scanner)  |
                                +---------+---------+
                                          |
                              +-----------+-----------+
                              |    Storage Factory    |
                              +-----------+-----------+
                                          |
                    +---------------------+---------------------+
                    |                     |                     |
             +------+------+      +-------+-------+     +-------+-------+
             |  Local FS   |      |  S3 / GCS /   |     |    Future     |
             | (fsnotify)  |      |  Azure Blob   |     |   backends    |
             +-------------+      +---------------+     +---------------+
Four invariants do most of the work:
- Read-only sources: StrataFS never writes back. All state lives in .stratafs/.
- Per-source SQLite: no central registry, no shared bottleneck.
- Compression-aware schema: gzip above 512 bytes, transparent at query time. 40–60% disk savings.
- Soft delete: files disappear consistently, historical queries are free.
Long version: Architecture overview · Database internals.
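
Two of the invariants above, the 512-byte gzip threshold and soft delete, are easy to picture with a toy table. The sketch below is an illustration only: the `chunks` table, its column names, and the `encode` / `decode` helpers are hypothetical and are not taken from the StrataFS schema.

```python
import gzip
import sqlite3

COMPRESS_THRESHOLD = 512  # bytes, per the invariant above


def encode(text: str) -> tuple[bytes, int]:
    """Gzip payloads above the threshold; return (blob, is_compressed)."""
    raw = text.encode("utf-8")
    if len(raw) > COMPRESS_THRESHOLD:
        return gzip.compress(raw), 1
    return raw, 0


def decode(blob: bytes, is_compressed: int) -> str:
    """Undo encode(), so callers never see whether a row was compressed."""
    raw = gzip.decompress(blob) if is_compressed else blob
    return raw.decode("utf-8")


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunks (path TEXT, body BLOB, compressed INTEGER, deleted_at TEXT)"
)
for path, text in [("a.md", "short note"), ("b.md", "long text " * 200)]:
    blob, flag = encode(text)
    db.execute("INSERT INTO chunks VALUES (?, ?, ?, NULL)", (path, blob, flag))

# Soft delete: mark the row instead of removing it.
db.execute("UPDATE chunks SET deleted_at = datetime('now') WHERE path = 'a.md'")

# Live queries filter on deleted_at; the deleted row is still in the table.
live = db.execute(
    "SELECT path, body, compressed FROM chunks WHERE deleted_at IS NULL"
).fetchall()
for path, body, flag in live:
    print(path, len(body), "bytes stored,", len(decode(body, flag)), "chars decoded")
```

Keeping the compression flag next to the payload is what makes decompression transparent to readers, and filtering on `deleted_at` leaves the old rows available for historical queries.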
Python
import requests

r = requests.get("http://localhost:8080/search",
                 params={"q": "feature flag rollout"})
for hit in r.json()["results"]:
    print(hit["file_path"], hit["relevance_score"])

JavaScript
const res = await fetch("http://localhost:8081/mcp/tools/call", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    tool: "search",
    parameters: { query: "rate limiting", max_results: 5 }
  }),
});

CLI
stratafs search "deployment strategy" --mode hybrid --limit 5 --json

Go
import "github.com/neul-labs/stratafs/pkg/search"

eng, _ := search.NewEngine(cfg)
results, _ := eng.Hybrid(ctx, "circuit breaker pattern", search.Opts{Limit: 10})

Measured on consumer hardware (M-series Mac, NVMe SSD, BGE Base EN v1.5).
| Metric | Typical value |
|---|---|
| Indexing throughput | 50–100 files/sec |
| Search latency (10 k files) | < 100 ms |
| Disk overhead | ~1.5–2× original text (with compression) |
| Memory baseline | ~200 MB + model (~500 MB for BGE Base) |
| Cold start | < 1 s |
Performance tuning, model swaps, and benchmark methodology: Performance guide.
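
For rough capacity planning, the figures in the table can be turned into a back-of-the-envelope estimate. The sketch below assumes the disk figure means total index size relative to the original text, which the table does not spell out; the function name `estimate` and the example corpus size are illustrative.

```python
# Figures taken from the table above; treat the results as rough planning numbers.
FILES_PER_SEC = (50, 100)   # indexing throughput range
DISK_FACTOR = (1.5, 2.0)    # index size relative to original text (assumed reading)
BASELINE_MB = 200           # memory baseline
MODEL_MB = 500              # BGE Base model


def estimate(num_files: int, text_mb: float) -> dict:
    """Return (best, worst) ranges for first-scan time and disk, plus memory."""
    slow, fast = FILES_PER_SEC
    return {
        "index_seconds": (num_files / fast, num_files / slow),
        "disk_mb": (text_mb * DISK_FACTOR[0], text_mb * DISK_FACTOR[1]),
        "memory_mb": BASELINE_MB + MODEL_MB,
    }


# Hypothetical corpus: 10,000 files holding 400 MB of text.
plan = estimate(num_files=10_000, text_mb=400)
print(plan)
```

For that example corpus the first scan comes out at 100 to 200 seconds, the index at 600 to 800 MB, and resident memory at about 700 MB.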
Every moving part is a registry plus an interface. Adding things is intentionally boring.
Add a new file parser
// pkg/parsers/asciidoc.go
type AsciidocParser struct{}
func (p *AsciidocParser) Parse(r io.Reader) (string, error) { /* ... */ }
func (p *AsciidocParser) SupportedExtensions() []string {
    return []string{".adoc", ".asciidoc"}
}
func init() { DefaultRegistry.Register(NewAsciidocParserFactory()) }

Add a new storage backend
// pkg/filesystem/dropbox.go
type DropboxFS struct{ /* ... */ }
func (fs *DropboxFS) Open(path string) (io.ReadCloser, error) { /* ... */ }
func (fs *DropboxFS) Walk(root string, fn WalkFunc) error { /* ... */ }
// pkg/storage/factory.go
case config.StorageTypeDropbox:
    return f.createDropboxFS(source)

Add a new chunking strategy
// pkg/chunking/ast.go
type ASTChunker struct{}
func (c *ASTChunker) Name() string { return "ast" }
func (c *ASTChunker) ChunkStream(r io.Reader, o ChunkOptions) (<-chan Chunk, <-chan error) {
    // Yield one chunk per top-level AST node.
}

Swap the embedding model
{
  "embedding": {
    "model": "bge-small-en-v1.5",
    "dimension": 384
  }
}

Any ONNX-compatible model works. Drop the weights in ~/.stratafs/fastembed_cache/ and point embedding.model at it.
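
When swapping models, the `dimension` value presumably has to match the length of the vectors the model emits (384 for bge-small-en-v1.5 in the config above). The sketch below shows why a mismatch matters, using cosine similarity as a common choice of metric; the source does not say which metric StrataFS uses, and the function name and toy vectors are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float], dimension: int) -> float:
    """Cosine similarity between two embeddings of the configured dimension."""
    if len(a) != dimension or len(b) != dimension:
        raise ValueError(f"expected {dimension}-dim vectors, got {len(a)} and {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 4-dimensional vectors; a real bge-small-en-v1.5 embedding has 384 entries.
query = [1.0, 0.0, 1.0, 0.0]
doc_close = [1.0, 0.0, 1.0, 0.0]
doc_far = [0.0, 1.0, 0.0, 1.0]
print(cosine_similarity(query, doc_close, 4))  # close to 1: same direction
print(cosine_similarity(query, doc_far, 4))    # 0.0: orthogonal vectors
```

Vectors produced by a model with a different dimension cannot be compared against an existing index, so changing models generally implies re-embedding the indexed content.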
Add a new ranking signal
Hybrid scoring is a single SQL query with weighted CTEs. Add a CTE, expose a weight, ship a PR. Full walkthrough in the development guide.
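
To make the "weighted CTEs" idea concrete, here is a toy version that runs against an in-memory SQLite database. It is a sketch under stated assumptions, not the StrataFS query: the `text_hits` and `vector_hits` tables stand in for FTS5 and vector results with precomputed scores, and the table names, weights, and scores are all made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE text_hits   (chunk_id INTEGER PRIMARY KEY, score REAL);  -- stand-in for FTS5 results
CREATE TABLE vector_hits (chunk_id INTEGER PRIMARY KEY, score REAL);  -- stand-in for vector results
INSERT INTO text_hits   VALUES (1, 0.9), (2, 0.4);
INSERT INTO vector_hits VALUES (2, 0.8), (3, 0.7);
""")

# One CTE per signal, each scaled by its weight, then summed per chunk.
HYBRID_SQL = """
WITH text_part AS (
    SELECT chunk_id, score * :w_text AS s FROM text_hits
),
vector_part AS (
    SELECT chunk_id, score * :w_vec AS s FROM vector_hits
),
combined AS (
    SELECT chunk_id, s FROM text_part
    UNION ALL
    SELECT chunk_id, s FROM vector_part
)
SELECT chunk_id, ROUND(SUM(s), 3) AS hybrid_score
FROM combined
GROUP BY chunk_id
ORDER BY hybrid_score DESC, chunk_id
"""

rows = db.execute(HYBRID_SQL, {"w_text": 0.4, "w_vec": 0.6}).fetchall()
for chunk_id, score in rows:
    print(chunk_id, score)
```

Chunk 2 ranks first because it appears in both result sets. In this shape, a new ranking signal is one more CTE, one more weight parameter, and one more `UNION ALL` branch.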
- Enterprise security: RBAC for authentication and source-level permissions
- Streaming search results: chunked HTTP for very large result sets
- Custom embeddings: first-class support for any ONNX-compatible model on disk
- Cross-source ranking signals: per-source weight, recency boost, trusted-source pinning
- Encrypted source databases: SQLCipher-backed at-rest encryption
Already shipped: virtual FS export, FUSE/WinFsp mount, GNOME / Spotlight / Windows Search integration, Wails desktop UI, native installers for every desktop OS, enterprise connectors for SharePoint / Google Drive / Jira.
Full list: Roadmap.
The full docs live in documentation/ and are built with MkDocs Material.
| Topic | Where |
|---|---|
| Getting started | documentation/docs/getting-started/ |
| User guide (config, search, CLI, file types) | documentation/docs/user-guide/ |
| REST + MCP integration | documentation/docs/ai-integration/ |
| Storage backends | documentation/docs/user-guide/storage-backends.md |
| Deployment (Docker / systemd / launchd / K8s) | documentation/docs/deployment/ |
| Architecture | documentation/docs/architecture/ |
| Contributing & dev setup | documentation/docs/contributing/ |
Preview the docs locally:
cd documentation
pip install -r requirements.txt
mkdocs serve

- Issues: github.com/neul-labs/stratafs/issues
- Discussions: github.com/neul-labs/stratafs/discussions
- Contributing guide: documentation/docs/contributing/development.md
Pull requests welcome. For larger changes, open an issue first to align on the approach. Every PR runs the full test suite plus a Docker build in CI.
MIT. Do whatever you want with it. If StrataFS ends up powering something interesting, we'd love to hear about it.