Complete guide to configuring Almanac for local development with Docker containers.
All configuration is done through environment variables in packages/server/.env.
cd packages/server
cp .env.example .env
# Edit .env with your settings

Almanac uses Docker Compose to run all required services locally. No production/cloud services needed!
# Start only databases (MongoDB, Redis, Qdrant, Memgraph)
docker compose up -d
# Start everything including server and client
docker compose --profile app up -d

The following services run in Docker containers:
- MongoDB (port 27017): Document database
- Redis (port 6379): Cache and session storage
- Qdrant (port 6333): Vector database for embeddings
- Memgraph (port 7687): Graph database for entities and relationships
- Server (port 3000): API server (when using --profile app)
- Client (port 5173): Web UI (when using --profile app)
All data is persisted in the .data/ directory in the project root.
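After `docker compose up -d`, you can sanity-check that all four databases are listening. A minimal sketch, assuming the default ports above and a bash shell (it uses bash's `/dev/tcp` feature):

```shell
#!/usr/bin/env bash
# Check each default database port for the local Almanac stack.
# Assumption: services are bound to localhost on their default ports.
status=""
for entry in "MongoDB:27017" "Redis:6379" "Qdrant:6333" "Memgraph:7687"; do
  name="${entry%%:*}"
  port="${entry##*:}"
  # Try to open a TCP connection in a subshell; failure means the port is closed.
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    status="up"
  else
    status="down"
  fi
  echo "${name} (port ${port}): ${status}"
done
```

If any service reports down, check `docker compose ps` and the container logs for that service.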
Almanac requires LLM models for three purposes:
- Chat Model: Query understanding, entity extraction, relationship extraction
- Embedding Model: Vector embeddings for semantic search
- Reranking Model (Optional): Improves search result relevance
OpenRouter provides access to many open-source models with a simple API. Get your API key from: https://openrouter.ai/
# Get your API key from: https://openrouter.ai/
LLM_API_KEY=sk-or-v1-...
LLM_BASE_URL=
# Recommended Models
LLM_CHAT_MODEL=fireworks/models/gpt-oss-120b
LLM_EXTRACTION_MODEL=xai/grok-4-1-fast-reasoning
LLM_EMBEDDING_MODEL=openai/Qwen/Qwen3-Embedding-4B
LLM_INDEX_CONFIG_MODEL=fireworks/models/gpt-oss-120b
# Optional: Reranking
RERANKER_ENABLED=true
RERANKER_BASE_URL=https://openrouter.ai/v1/rerank
RERANKER_MODEL=deepinfra/Qwen/Qwen3-Reranker-4B

Why These Models?

- Chat: fireworks/models/gpt-oss-120b - High-quality reasoning for complex queries
- Extraction: xai/grok-4-1-fast-reasoning - Fast and accurate entity/relationship extraction
- Embedding: openai/Qwen/Qwen3-Embedding-4B - Balanced quality and performance for semantic search
- Reranker: deepinfra/Qwen/Qwen3-Reranker-4B - Improved search result relevance
All databases run in local Docker containers. Default configuration works out of the box!
MONGO_HOST=localhost
MONGO_PORT=27017
MONGO_USERNAME=admin
MONGO_PASSWORD=admin123
MONGO_DB_NAME=almanac

Docker Service: mongodb (mongo:8.2.2)
Data Location: .data/mongodb/
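For reference, the MongoDB variables combine into a standard connection URI. A sketch; the `authSource=admin` option is an assumption about how the server authenticates, not something this config documents:

```shell
# Default local-dev values from .env (assumption: unchanged defaults)
MONGO_HOST=localhost
MONGO_PORT=27017
MONGO_USERNAME=admin
MONGO_PASSWORD=admin123
MONGO_DB_NAME=almanac

# Assemble a standard MongoDB connection string from the pieces above.
URI="mongodb://${MONGO_USERNAME}:${MONGO_PASSWORD}@${MONGO_HOST}:${MONGO_PORT}/${MONGO_DB_NAME}?authSource=admin"
echo "$URI"
```

The same URI works with `mongosh "$URI"` for a quick connectivity test.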
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
REDIS_DB=0

Docker Service: redis (redis:alpine3.22)
Data Location: .data/redis/
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_API_KEY=

Docker Service: qdrant (qdrant/qdrant:v1.16.0)
Data Location: .data/qdrant/
Admin UI: http://localhost:6333/dashboard
Vector Sizes by Model:
- text-embedding-3-large: 3072
- text-embedding-3-small: 1536
- qwen/qwen3-embedding-8b: 1024
- qwen/qwen3-embedding-4b: 1024
- nomic-embed-text: 768
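If you ever create a Qdrant collection by hand, its vector size must match the embedding model's dimension. A sketch using Qdrant's standard collections API; the collection name is illustrative, and the size of 1024 matches qwen3-embedding-4b from the table above:

```shell
# Request body: vector size must equal the embedding model's dimension.
PAYLOAD='{"vectors": {"size": 1024, "distance": "Cosine"}}'

# Create (or recreate) the collection; tolerate Qdrant being offline.
curl -s -X PUT "http://localhost:6333/collections/my_collection" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "Qdrant not reachable"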
MEMGRAPH_HOST=localhost
MEMGRAPH_PORT=7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=

Docker Service: memgraph (memgraph/memgraph:3.7.0)
Data Location: .data/memgraph/
# Log level: debug, info, warn, error
LOG_LEVEL=info
# MCP Debug Logs (full responses vs. length only)
MCP_DEBUG_LOGS=false

Control parallel operations for better performance:
# Database indexing concurrency
DB_INDEXING_CONCURRENCY=32
# Vector indexing concurrency
VECTOR_INDEXING_CONCURRENCY=32
# Graph extraction concurrency
GRAPH_EXTRACTION_CONCURRENCY=32

Adjust based on your system resources:
- High-end systems: 32-64
- Mid-range systems: 16-32
- Low-end systems: 8-16
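A quick way to pick a starting value is to scale from your CPU core count. This is a heuristic assumption (roughly twice the core count, capped at 64), not a project-documented formula:

```shell
# Detect CPU cores, falling back to 8 if no detection tool is available.
CORES=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 8)

# Heuristic: twice the core count, capped at the documented high-end value.
CONC=$((CORES * 2))
if [ "$CONC" -gt 64 ]; then
  CONC=64
fi
echo "Suggested concurrency: $CONC"
```

Use the result for all three concurrency variables, then lower it if you see memory pressure.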
Required for storing sensitive data (OAuth tokens, API keys):
# Generate a key:
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
# Add to .env:
ENCRYPTION_KEY=your-generated-64-character-hex-string

# Maximum entities per document (optional, no limit if not set)
MAX_ENTITIES_PER_DOCUMENT=1000
# Dynamic limit: 1 entity per X characters (optional)
# Lower value = more entities (e.g., 40 = generous)
# Higher value = fewer entities (e.g., 150 = conservative)
# ENTITY_CHARS_PER_ENTITY=40

Filters out documents that look like lists, bibliographies, or indexes:
# Disabled by default (may be too aggressive)
ENABLE_TOXIC_DOCUMENT_FILTER=false

Detection heuristics: >50 entities, <5 relationships, <20 char avg entity names
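The heuristics above can be sketched as a simple check on per-document stats. The example numbers are made up, and it is an assumption that all three conditions must hold together for a document to be flagged:

```shell
# Example stats for one extracted document (illustrative values).
entities=60
relationships=3
avg_name_len=12

# Documented thresholds: >50 entities, <5 relationships, <20 char avg names.
if [ "$entities" -gt 50 ] && [ "$relationships" -lt 5 ] && [ "$avg_name_len" -lt 20 ]; then
  verdict="toxic"
else
  verdict="ok"
fi
echo "verdict: $verdict"
```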
Almanac uses hybrid scheduling for automatic syncs:
- On startup: Syncs data sources that missed their last scheduled sync
- While running: Syncs all enabled data sources according to cron schedule
# Only sync data modified after this date (default: 60 days ago)
# SYNC_CUTOFF_DATE=2025-12-01T00:00:00.000Z
# Maximum records to sync per data source (optional, no limit by default)
# SYNC_MAX_RECORDS=1000
# Cron schedule for automatic syncs (default: daily at midnight)
# SYNC_CRON_SCHEDULE=0 0 * * *

Cron Schedule Examples:

- 0 * * * * - Every hour
- 0 0 * * * - Every day at midnight (DEFAULT)
- 0 */6 * * * - Every 6 hours
- 0 0 * * 0 - Every Sunday at midnight
Note: Times are interpreted in your local server timezone.
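If you want to set an explicit SYNC_CUTOFF_DATE matching the default 60-day window, the ISO-8601 value can be generated on the command line. A sketch using GNU `date` syntax (on macOS/BSD the equivalent is `date -u -v-60d`):

```shell
# Midnight UTC, 60 days ago, in the format the example value above uses.
CUTOFF=$(date -u -d '60 days ago' +%Y-%m-%dT00:00:00.000Z)
echo "SYNC_CUTOFF_DATE=$CUTOFF"
```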
For data sources requiring OAuth (Slack, GitHub, Notion), you'll need to:
- Create OAuth apps in each service
- Add credentials to .env
- Configure redirect URIs to point to your local server
These are only needed if you want to connect these specific data sources.
Here's a complete .env file for local development using OpenRouter:
# === Logging ===
LOG_LEVEL=info
MCP_DEBUG_LOGS=false
# === Local Docker Databases ===
# MongoDB
MONGO_HOST=localhost
MONGO_PORT=27017
MONGO_USERNAME=admin
MONGO_PASSWORD=admin123
MONGO_DB_NAME=almanac
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
REDIS_DB=0
# Memgraph
MEMGRAPH_HOST=localhost
MEMGRAPH_PORT=7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=
# Qdrant
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_API_KEY=
# === LLM Configuration (OpenRouter) ===
# Get your API key from: https://openrouter.ai/
LLM_API_KEY=sk-or-v1-your-key-here
LLM_BASE_URL=
# Recommended Models
LLM_CHAT_MODEL=fireworks/models/gpt-oss-120b
LLM_EXTRACTION_MODEL=xai/grok-4-1-fast-reasoning
LLM_EMBEDDING_MODEL=openai/Qwen/Qwen3-Embedding-4B
LLM_INDEX_CONFIG_MODEL=fireworks/models/gpt-oss-120b
# === Reranker (Optional) ===
RERANKER_ENABLED=true
RERANKER_BASE_URL=https://openrouter.ai/v1/rerank
RERANKER_MODEL=deepinfra/Qwen/Qwen3-Reranker-4B
# === Security ===
# Generate with: node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
ENCRYPTION_KEY=your-64-character-hex-string-here
# === Performance ===
DB_INDEXING_CONCURRENCY=32
VECTOR_INDEXING_CONCURRENCY=32
GRAPH_EXTRACTION_CONCURRENCY=32
# === Indexing ===
MAX_ENTITIES_PER_DOCUMENT=1000
ENABLE_TOXIC_DOCUMENT_FILTER=false
# === Sync Configuration ===
# SYNC_CUTOFF_DATE=2025-12-01T00:00:00.000Z
# SYNC_MAX_RECORDS=1000
# SYNC_CRON_SCHEDULE=0 0 * * *

# All services
docker compose logs -f
# Specific service
docker compose logs -f mongodb
docker compose logs -f server

# Stop all services
docker compose down
# Stop and remove volumes (delete all data!)
docker compose down -v

# MongoDB shell
docker exec -it almanac_mongodb mongosh -u admin -p admin123
# Redis CLI
docker exec -it almanac_redis redis-cli
# Memgraph shell
docker exec -it almanac_memgraph mgconsole

# Check if ports are already in use
lsof -i :27017 # MongoDB
lsof -i :6379 # Redis
lsof -i :6333 # Qdrant
lsof -i :7687 # Memgraph
# Stop conflicting services or change ports in docker-compose.yml

Error: Failed to connect to MongoDB
Fix: Ensure Docker containers are running:
docker compose ps
docker compose up -d

Error: JavaScript heap out of memory
Fix: Reduce concurrency settings:
DB_INDEXING_CONCURRENCY=16
VECTOR_INDEXING_CONCURRENCY=16
GRAPH_EXTRACTION_CONCURRENCY=16

If you need to delete and recreate collections:
# Access Qdrant UI
open http://localhost:6333/dashboard
# Or use API
curl -X DELETE http://localhost:6333/collections/almanac

- Quick Start - Start using Almanac
- Installation - Detailed setup instructions
- Custom MCP Servers - Add data sources
- Query & Search - Query your data