A Model Context Protocol (MCP) server that provides programmatic access to the Digital Research Alliance of Canada's technical documentation. This server mirrors the documentation from the MediaWiki site and exposes it through MCP resources and tools for use with MCP-compatible clients.
- Documentation Mirroring: Syncs documentation from the Alliance MediaWiki site
- MCP Resources: Exposes individual documentation pages as MCP resources
- Full-Text Search: Whoosh-backed content and title search with highlights and scoring
- Related Pages: Embeddings-backed related-page discovery with heuristic fallback
- Search & Query Tools: Provides search, categorization, and querying capabilities
- Startup Refresh: Container entrypoint triggers an incremental sync on boot; schedule additional runs as needed
- Markdown Storage: Stores documentation as markdown files with metadata
- Python 3.11+
- uv for package management
- Clone and set up the repository:
git clone <repository-url>
cd alliance-docs-mcp
- Install dependencies:
uv sync
- Configure environment (optional): Create a .env file (or export the variables directly) if you want to override defaults. For example:
MEDIAWIKI_API_URL=https://docs.alliancecan.ca/mediawiki/api.php
DOCS_DIR=./docs
USER_AGENT=AllianceDocsMCP/1.0
- Initial documentation sync:
uv run python scripts/sync_docs.py
Note: Docker images built from this repository automatically run this full sync during the image build so containers start with a warm cache.
- Start the MCP server:
uv run python -m alliance_docs_mcp.server
The server exposes documentation pages as MCP resources:
- Resource URI: alliance-docs://page/{slug}
- Content: Markdown content of the documentation page
Example:
alliance-docs://page/technical_documentation
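As a rough illustration of the URI scheme above, the following sketch converts between a page slug and its resource URI. The helper names `slug_from_uri` and `uri_for_slug` are hypothetical and are not part of the server's API; only the `alliance-docs://page/{slug}` pattern comes from this README.

```python
from urllib.parse import urlparse

SCHEME = "alliance-docs"  # scheme taken from the URI pattern documented above


def uri_for_slug(slug: str) -> str:
    """Build a resource URI for a page slug (hypothetical helper)."""
    return f"{SCHEME}://page/{slug}"


def slug_from_uri(uri: str) -> str:
    """Extract the slug from an alliance-docs://page/{slug} URI (hypothetical helper)."""
    parsed = urlparse(uri)
    # urlparse puts "page" in netloc and "/{slug}" in path for this URI shape.
    if parsed.scheme != SCHEME or parsed.netloc != "page":
        raise ValueError(f"not an alliance-docs page URI: {uri!r}")
    slug = parsed.path.lstrip("/")
    if not slug:
        raise ValueError(f"URI has no slug: {uri!r}")
    return slug


if __name__ == "__main__":
    print(slug_from_uri("alliance-docs://page/technical_documentation"))
```

A client that lists resources could use a helper like this to map a URI back to the slug expected by the page-oriented tools.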
The server provides several tools for querying documentation:
search_docs(query: str, category: Optional[str] = None, limit: int = 20, search_content: bool = True, fuzzy: bool = False)
Search documentation pages using the full-text index when it is available, falling back to title matching otherwise. Full-text results include relevance scores and highlighted snippets.
Parameters:
- query: Search query string
- category: Optional category filter
- limit: Maximum number of results
- search_content: Use full-text index when available (default: True)
- fuzzy: Enable fuzzy matching for typo tolerance (full-text only)
Returns: List of matching pages with metadata, highlights, and scores (when indexed)
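To make the parameters above concrete, here is a minimal sketch of the JSON-RPC message an MCP client would send to invoke search_docs. The `tools/call` method and the `name`/`arguments` fields follow the MCP specification; the function `build_search_request` is a hypothetical helper, and a real client would normally use an MCP library that also handles the initialization handshake and transport.

```python
import json
from typing import Any, Optional


def build_search_request(
    query: str,
    category: Optional[str] = None,
    limit: int = 20,
    search_content: bool = True,
    fuzzy: bool = False,
    request_id: int = 1,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request for search_docs (hypothetical helper)."""
    arguments: dict[str, Any] = {
        "query": query,
        "limit": limit,
        "search_content": search_content,
        "fuzzy": fuzzy,
    }
    # Only send the optional filter when the caller supplied one.
    if category is not None:
        arguments["category"] = category
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_docs", "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_request("slurm job arrays", fuzzy=True), indent=2))
```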
List all available documentation categories.
Returns: List of category names
Find a specific page by its title.
Parameters:
title: Page title to search for
Returns: Page metadata or None if not found
List recently updated pages.
Parameters:
limit: Maximum number of pages to return
Returns: List of recent pages with metadata
Get detailed information about a specific page.
Parameters:
slug: Page slug
Returns: Detailed page information including metadata
List all available documentation pages.
Returns: List of all pages with basic metadata
Embeddings-backed related-pages helper (Chroma + sentence-transformers) with automatic fallback to lightweight heuristics.
Parameters:
- slug: Source page slug
- limit: Max related pages to return
- min_score: Optional similarity threshold when embeddings are available
Returns: List of related pages with similarity scores (or heuristic scores when falling back)
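The interaction between limit and min_score can be sketched with a toy cosine-similarity ranking. This is only an illustration of the general technique: the server itself relies on Chroma and sentence-transformers, and the function names, the in-memory `vectors` mapping, and the two-dimensional vectors here are all assumptions made for the example.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_pages(
    source_slug: str,
    vectors: dict[str, list[float]],
    limit: int = 5,
    min_score: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Rank other pages by similarity to the source page (toy sketch)."""
    source = vectors[source_slug]
    scored = [
        (slug, cosine(source, vec))
        for slug, vec in vectors.items()
        if slug != source_slug
    ]
    # Apply the optional threshold before truncating to the requested size.
    if min_score is not None:
        scored = [pair for pair in scored if pair[1] >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    toy = {"page_a": [1.0, 0.0], "page_b": [1.0, 0.1], "page_c": [0.0, 1.0]}
    print(related_pages("page_a", toy, min_score=0.5))
```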
The server provides reusable prompt templates that guide LLMs on how to effectively query and use the documentation system. These prompts can be used by MCP clients to structure queries and improve consistency.
Guide for effectively searching Alliance documentation. Provides instructions on using the search_docs tool, interpreting search results, and filtering by category.
Parameters:
- query: The user's search query
- category: Optional category filter
Use Case: When an LLM needs to help a user search for documentation on a specific topic.
Template for answering technical questions using documentation. Guides the LLM through searching, reading relevant pages, finding related content, and synthesizing information.
Parameters:
- question: The technical question to answer
- context: Additional context about what the user is trying to accomplish
Use Case: When an LLM needs to answer technical questions based on the documentation.
Guide for exploring documentation by category. Helps discover pages within a specific category and understand the documentation structure.
Parameters:
- category: The category to explore
- purpose: What the user is trying to accomplish
Use Case: When an LLM needs to help users explore documentation in a specific category (e.g., "Getting Started", "Technical Reference").
Guide for finding related documentation pages. Provides instructions on using the find_related_pages tool and interpreting similarity scores.
Parameters:
- topic: The topic or page slug to find related content for
- goal: The user's goal (learning, troubleshooting, etc.)
Use Case: When an LLM needs to help users discover related documentation after finding a relevant page.
Template for helping new users get started. Guides LLMs to point users to getting started documentation and common first steps.
Parameters:
use_case: What the user wants to do (e.g., "set up account", "run first job", "install software")
Use Case: When an LLM needs to help new users with onboarding and initial setup tasks.
Run a full synchronization (with rich progress bars and visual feedback):
uv run python scripts/sync_docs.py
Run an incremental sync (only changed pages):
uv run python scripts/sync_docs.py --incremental
Index controls:
uv run python scripts/sync_docs.py --rebuild-index # Rebuild Whoosh index
uv run python scripts/sync_docs.py --no-index # Skip indexing
uv run python scripts/sync_docs.py --index-dir /tmp/idx # Custom index location
uv run python scripts/sync_docs.py --rebuild-related-index # Rebuild related-page embeddings
uv run python scripts/sync_docs.py --no-related-index # Skip related-page embeddings
uv run python scripts/sync_docs.py --related-index-dir /tmp/rel # Custom related index location
uv run python scripts/sync_docs.py --related-model-name all-MiniLM-L6-v2
The related-page index downloads the configured sentence-transformer model (default: all-MiniLM-L6-v2, ~90 MB) the first time it runs.
For FastMCP Cloud deployments, run one of the sync commands above locally and commit the updated docs/ directory before pushing so the hosted server always mirrors the latest content.
The sync script provides:
- Colored output with rich formatting
- Progress bars for download and processing phases
- Real-time statistics including pages/second
- Summary table with detailed metrics
- Error tracking with warnings for failed pages
Note: Markdown pages larger than 10 MB are stored as .md.gz files. The server automatically decompresses them at runtime, so no additional configuration is required.
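For readers who want to open mirrored files outside the server, the transparent decompression described in the note can be approximated as follows. This is a sketch under assumptions: the function `read_page` and the flat `{slug}.md` / `{slug}.md.gz` file layout are hypothetical, and the server's own storage code may locate files differently.

```python
import gzip
from pathlib import Path


def read_page(pages_dir: Path, slug: str) -> str:
    """Return page markdown, reading either {slug}.md or {slug}.md.gz (hypothetical layout)."""
    plain = pages_dir / f"{slug}.md"
    compressed = pages_dir / f"{slug}.md.gz"
    if plain.exists():
        return plain.read_text(encoding="utf-8")
    if compressed.exists():
        # "rt" opens the gzip stream in text mode so we get str, not bytes.
        with gzip.open(compressed, "rt", encoding="utf-8") as handle:
            return handle.read()
    raise FileNotFoundError(f"no markdown file found for slug {slug!r}")
```

Checking for the plain file first keeps the common case cheap; the gzip path is only taken for the few oversized pages.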
The sync process automatically generates two files for LLM consumption:
- docs/llms.txt: A simple directory listing all page names, categories, and URLs (~35 KB)
- docs/llms_full.txt.gz: Complete documentation content in a single compressed file (~2.6 MB compressed, ~393 MB uncompressed)
These files are regenerated on every sync (both full and incremental) and committed to the repository, making it easy for LLMs to access the entire documentation corpus.
Set up a cron job for weekly updates:
# Add to crontab (runs every Sunday at 2 AM)
0 2 * * 0 cd /path/to/alliance-docs-mcp && uv run python scripts/sync_docs.py --incremental
This repository also ships with .github/workflows/weekly-sync.yml, which performs the same incremental sync on Sundays using GitHub Actions and pushes any changes back to main.
Set the following environment variables (via .env, shell exports, or your hosting platform's secret manager) to customize behavior:
- MEDIAWIKI_API_URL (default https://docs.alliancecan.ca/mediawiki/api.php)
- DOCS_DIR (default ./docs, or /data/docs in the container)
- USER_AGENT (default AllianceDocsMCP/1.0)
- SEARCH_INDEX_DIR (optional; overrides default DOCS_DIR/search_index)
- DISABLE_SEARCH_INDEX (set to 1/true/yes to force title-only fallback)
- RELATED_INDEX_DIR (optional; overrides default DOCS_DIR/related_index)
- RELATED_MODEL_NAME (sentence-transformer model, default all-MiniLM-L6-v2)
- RELATED_BACKEND (default chroma)
- DISABLE_RELATED_INDEX (set to 1/true/yes to skip related-page embeddings)
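The way these variables and their defaults fit together can be sketched as a small settings loader. The `Settings` class, `load_settings`, and `env_flag` are hypothetical names used for illustration, and treating the 1/true/yes flags as case-insensitive is an assumption; the variable names and default values are the ones listed above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes"}  # values documented above; case-insensitivity is assumed


def env_flag(name: str, env: Mapping[str, str]) -> bool:
    """Interpret an environment variable as a boolean flag."""
    return env.get(name, "").strip().lower() in TRUTHY


@dataclass(frozen=True)
class Settings:
    mediawiki_api_url: str
    docs_dir: Path
    user_agent: str
    search_index_dir: Path
    related_index_dir: Path
    related_model_name: str
    related_backend: str
    disable_search_index: bool
    disable_related_index: bool


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    env = os.environ if environ is None else environ
    docs_dir = Path(env.get("DOCS_DIR", "./docs"))
    return Settings(
        mediawiki_api_url=env.get(
            "MEDIAWIKI_API_URL", "https://docs.alliancecan.ca/mediawiki/api.php"
        ),
        docs_dir=docs_dir,
        user_agent=env.get("USER_AGENT", "AllianceDocsMCP/1.0"),
        # Index directories default to subdirectories of DOCS_DIR.
        search_index_dir=Path(env.get("SEARCH_INDEX_DIR", docs_dir / "search_index")),
        related_index_dir=Path(env.get("RELATED_INDEX_DIR", docs_dir / "related_index")),
        related_model_name=env.get("RELATED_MODEL_NAME", "all-MiniLM-L6-v2"),
        related_backend=env.get("RELATED_BACKEND", "chroma"),
        disable_search_index=env_flag("DISABLE_SEARCH_INDEX", env),
        disable_related_index=env_flag("DISABLE_RELATED_INDEX", env),
    )
```

Passing an explicit mapping instead of reading `os.environ` directly makes the defaults easy to check in isolation.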
The MCP server can be configured with command-line arguments:
uv run python -m alliance_docs_mcp.server --help
Options:
- --host: Host to bind to (default: localhost)
- --port: Port to bind to (default: 8000)
- --docs-dir: Documentation directory (default: ./docs)
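The options above map naturally onto an argparse parser. The sketch below mirrors only the three documented flags and their defaults; `build_parser` is a hypothetical name, and the real server may define these options differently or add others.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented server options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="alliance_docs_mcp.server")
    parser.add_argument("--host", default="localhost", help="Host to bind to")
    parser.add_argument("--port", type=int, default=8000, help="Port to bind to")
    parser.add_argument("--docs-dir", default="./docs", help="Documentation directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --docs-dir as the attribute docs_dir.
    print(args.host, args.port, args.docs_dir)
```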
The provided Docker image ships with a pre-synced documentation cache baked into /app/docs_seed. When the container starts, the entrypoint primes the configured DOCS_DIR from this seed (if empty) and then launches the MediaWiki sync in the background so the MCP server begins accepting connections immediately. You can configure startup behavior with:
- RUN_SYNC_ON_START=0 to skip the background sync (useful when running in read-only environments)
- SYNC_MODE=full to force a full resync instead of the default incremental sync
- The container starts the server via fastmcp run server_entrypoint.py:mcp --transport http --path /mcp/ --port 8080, so any additional FastMCP CLI flags can be injected by overriding CMD in your own image if needed.
- A lightweight /health endpoint is exposed for platform probes; point load balancer checks there instead of MCP protocol paths.
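A probe against the /health endpoint can be as simple as the sketch below. It checks only the HTTP status code, because the response body format is not described here; the function name `is_healthy` and the example base URL are assumptions.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with a 2xx status."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts, and non-2xx HTTP errors.
        return False


if __name__ == "__main__":
    # Assumes a container published locally on the documented port 8080.
    print(is_healthy("http://localhost:8080"))
```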
alliance-docs-mcp/
├── src/
│   └── alliance_docs_mcp/
│       ├── __init__.py
│       ├── server.py # FastMCP server implementation
│       ├── mirror.py # MediaWiki API client
│       ├── converter.py # WikiText to Markdown converter
│       └── storage.py # File storage and retrieval
├── docs/ # Mirrored markdown files
│   ├── pages/ # Organized by category
│   └── index.json # Page metadata index
├── scripts/
│   └── sync_docs.py # Synchronization script
├── tests/ # Test files
├── pyproject.toml # Project configuration
└── README.md
uv run pytest
uv run black src/
uv run ruff check src/
FastMCP Cloud (managed)
- Sign in at fastmcp.cloud with your GitHub account and create a project that points at this repository.
- Use server_entrypoint.py:mcp as the entrypoint so the platform runs the exported FastMCP server instance.
- Configure environment variables (e.g., MEDIAWIKI_API_URL, DOCS_DIR, USER_AGENT) via the project settings; the service installs dependencies directly from pyproject.toml.
- Push to main to trigger deployments; each pull request automatically gets its own preview environment for testing changes.
Self-managed container/VM
- Build the Docker image in this repo and run it anywhere that can expose HTTP on port 8080.
- Provide the same environment variables via your scheduler or container runtime.
- Point load balancer health checks at /health and connect MCP clients to the /mcp/ path served by fastmcp run.
- New MCP Tools: Add new tool functions to server.py
- Storage Enhancements: Extend storage.py for new functionality
- API Improvements: Modify mirror.py for different API interactions
- Sync Failures: Check API access and network connectivity
- Missing Pages: Verify MediaWiki API responses
- Conversion Errors: Ensure beautifulsoup4 and wikitextparser are installed, and check whether valid HTML is being stripped (pass --no-strip-html to disable stripping)
Check the sync.log file for synchronization issues:
tail -f sync.log
Run with verbose logging:
uv run python scripts/sync_docs.py --verbose
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Digital Research Alliance of Canada for providing the documentation
- FastMCP for the MCP server framework
- uv for Python package management