Microscope Toolset is a research platform that connects a Claude Code AI agent to a real or virtual microscope through an MCP server embedded in a napari GUI. The agent can control real or simulated hardware, execute image-analysis code, query a curated knowledge database, and track complete experiment sessions — all from a natural-language prompt.
- MCP server — exposes microscope control, image acquisition, and analysis as Claude-callable tools over HTTP
- Napari plugin — control panel to start/stop the server, monitor hardware state, and review sessions
- Hardware abstraction — supports real hardware via pymmcore-plus / pymmcore-proxy and simulated hardware via virtual-microscope
- Code execution with guardrails — agents submit Python code that is checked (AST + runtime) for unsafe patterns before running; built-in guards for `CMMCorePlus` misuse and Cellpose parameter ranges
- Self-learn loop — library of ready-made analysis modules (cell detection, tracking, morphometry, workflows) the agent can invoke or learn from
- Benchmarking — simulation-based evaluation of agent performance across reproducible virtual-microscope scenarios
- Experiment tracking — captures every Claude Code turn, tool call, and result into a timestamped folder for offline review and replay
⚠️ Security & Liability Warning

This toolset gives an LLM agent direct access to your computer: it can execute arbitrary Python code, read and write files, control microscope hardware, and install packages into your Python environment.
You are responsible for reviewing every action the agent proposes before approving it. In particular:
- The `install_packages` tool will install packages directly into your active conda/uv environment. Only approve packages you recognise and trust.
- The `execute_python_code` tool runs code with the same privileges as your user account.

The author(s) of this project accept no responsibility for any damage, data loss, security incident, or unintended hardware interaction that may result from using this software. Use it at your own risk.
Clone the repository first:
git clone https://github.com/ddd42-star/microscope-toolset.git
cd microscope-toolset

Then set up a Python 3.12 environment using either conda or uv:
conda create -n microscope-toolset python=3.12
conda activate microscope-toolset
pip install -r requirements.txt
pip install -e .

Install uv if you don't have it yet:
pip install uv

Create the environment:
uv venv --python 3.12
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

Install from pyproject.toml (recommended with uv):
uv sync # installs all dependencies defined in pyproject.toml
uv sync --extra dev # also includes pytest, pre-commit, coverage, …

uv sync reads [tool.uv.sources] in pyproject.toml to resolve virtual-microscope and pymmcore-proxy from git automatically. If you have local clones of those packages in the project folder, switch the relevant lines in [tool.uv.sources] to path sources (see the comments in pyproject.toml).
Alternative — install from requirements.txt (same as the conda path):
uv pip install -r requirements.txt
uv pip install -e .

PyTorch + CUDA: uv installs the CPU-only build of torch by default. To enable CUDA support, uncomment the `torch` index entry in `[tool.uv.sources]` inside `pyproject.toml` and set your CUDA version (`cu118`, `cu121`, `cu124`, …) before running `uv sync`.
The toolset uses two models, both configurable via the .env file:
1. Claude (Anthropic) — for query reformulation inside the database agent
Set your API key and choose a model:
ANTHROPIC_API_KEY="your-anthropic-api-key-here"
ANTHROPIC_MODEL="claude-haiku-4-5-20251001"
| Model | Speed | Quality | Cost |
|---|---|---|---|
| `claude-haiku-4-5-20251001` | Fast | Good — recommended for query reformulation | Low |
| `claude-sonnet-4-6` | Medium | Higher quality | Medium |
| `claude-opus-4-7` | Slow | Best quality | High |
2. Sentence-transformers — for embedding queries into Elasticsearch KNN search
EMBED_MODEL="BAAI/bge-small-en-v1.5"
| Model | Dimensions | Size | Quality |
|---|---|---|---|
| `all-MiniLM-L6-v2` | 384 | ~80 MB | Fast / lightweight |
| `BAAI/bge-small-en-v1.5` | 384 | ~120 MB | Good — default |
| `all-mpnet-base-v2` | 768 | ~420 MB | Better |
| `BAAI/bge-base-en-v1.5` | 768 | ~420 MB | Better |
| `BAAI/bge-large-en-v1.5` | 1024 | ~1.2 GB | Best local quality |
Important: the embedding dimension must match your Elasticsearch KNN index. If you change `EMBED_MODEL` you must re-index all your Elasticsearch data.
The model is downloaded automatically on first run.
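The dimension rule above can be checked before a query is sent. This is a minimal sketch, assuming the `.env` file uses plain `KEY="value"` lines like the ones shown earlier. `parse_env` and `check_vector_dims` are hypothetical helpers, not part of the toolset, and the index dimension would come from your own Elasticsearch mapping.

```python
def parse_env(text):
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_vector_dims(vector, index_dims):
    """Raise if a query embedding cannot be searched against the KNN index."""
    if len(vector) != index_dims:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {index_dims}; "
            "re-index after changing EMBED_MODEL"
        )


env = parse_env('ANTHROPIC_MODEL="claude-haiku-4-5-20251001"\nEMBED_MODEL="all-MiniLM-L6-v2"\n')
query_vector = [0.0] * 384  # stand-in for a real embedding (assumption)
check_vector_dims(query_vector, index_dims=384)  # passes silently
```

Failing early with a clear message is usually easier to debug than a rejected KNN query from the search backend.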
Run the following command to start the napari GUI:
python -m src.plugin_napari
The control panel on the right starts and stops the MCP Microscope Toolset server.
Add the MCP server to Claude Code:
$ claude mcp add --transport http microscope http://127.0.0.1:5500/mcp
Then open the MCP menu inside Claude Code by typing /mcp and pressing Enter:
/mcp
Select the microscope MCP server and either connect to it or enable it. The terminal where you started the napari plugin shows whether the server connected correctly.
After you have added the mcp.json configuration file, you can start the MCP client, which connects to the server.
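Before registering the server with Claude Code it can help to confirm that something is listening on the port. The sketch below uses only the standard library; the host and port are taken from the URL above, and `server_is_listening` is a hypothetical helper. It checks TCP reachability only, not that the MCP endpoint itself responds correctly.

```python
import socket


def server_is_listening(host="127.0.0.1", port=5500, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if server_is_listening() else "not reachable"
    print(f"MCP server port is {state}")
```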
The execute_python_code tool runs agent-submitted Python code through a series of safety and correctness checks before execution:
General checks (all code):
- Blocks re-instantiation of `CMMCorePlus` / `UniMMCore` — the pre-configured `mmc` instance must be used
- Blocks `.loadSystemConfiguration()` / `.loadConfig()` — hardware configuration is managed by the toolset
- Blocks any reference to `viewer` or `napari.current_viewer()` — GUI access must go through the dedicated `viewer_*` tools
- Auto-detects missing packages before execution and surfaces them for user approval
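The static checks in the list above can be illustrated with Python's `ast` module. This is a sketch of the general technique, not the toolset's actual implementation: `find_violations` and the message wording are assumptions made for illustration.

```python
import ast

BLOCKED_CONSTRUCTORS = {"CMMCorePlus", "UniMMCore"}
BLOCKED_METHODS = {"loadSystemConfiguration", "loadConfig"}


def find_violations(source):
    """Return messages for each blocked pattern found in the source, by line."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both CMMCorePlus() and pkg.CMMCorePlus() style calls.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in BLOCKED_CONSTRUCTORS:
                found.append((node.lineno, f"do not instantiate {name}; use mmc"))
            elif name in BLOCKED_METHODS:
                found.append((node.lineno, f"{name}() is managed by the toolset"))
            elif name == "current_viewer":
                found.append((node.lineno, "use the viewer_* tools for GUI access"))
        elif isinstance(node, ast.Name) and node.id == "viewer":
            found.append((node.lineno, "use the viewer_* tools instead of viewer"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]


submitted = "core = CMMCorePlus()\ncore.loadSystemConfiguration('scope.cfg')\n"
for problem in find_violations(submitted):
    print(problem)
```

Because the code is only parsed and never executed, a check like this can reject a submission before it touches the hardware.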
Library-specific guardrails (cellpose):
When cellpose is imported the following checks are enforced:
| Check | What it blocks | Why |
|---|---|---|
| `diameter` required | Calls without an explicit `diameter` kwarg | Cellpose defaults to 30 px — wrong for most microscopy samples |
| `channels` required | Calls without explicit channel assignment | Default `[0,0]` fails silently on multichannel fluorescence images |
| `flow_threshold` range | Literal values outside `[0.0, 3.0]` | Values > 1.0 accept noise as cells; < 0.0 misses real cells |
| `cellprob_threshold` range | Literal values outside `[-6.0, 6.0]` | Extreme values cause silent over/under-segmentation |
| Image size (runtime) | Images larger than 512×512 px | Large images make segmentation very slow |
If an image exceeds 512×512, the agent receives an error with a ready-to-use resize + mask rescale snippet (using cv2.INTER_NEAREST to avoid blending label IDs). To opt out and use the original image size, set cellpose_allow_large_image = True in the code before the eval call.
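The required-keyword and literal-range rows of the table can be sketched as a static check on `.eval(...)` calls. This is an illustration under assumptions, not the toolset's code: `check_eval_calls` is a hypothetical name, and the sketch treats every method call named `eval` as a Cellpose call.

```python
import ast

REQUIRED_KWARGS = ("diameter", "channels")
LITERAL_RANGES = {"flow_threshold": (0.0, 3.0), "cellprob_threshold": (-6.0, 6.0)}


def check_eval_calls(source):
    """Flag .eval(...) calls with missing kwargs or out-of-range literal values."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_eval = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "eval"
        )
        if not is_eval:
            continue
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        for name in REQUIRED_KWARGS:
            if name not in kwargs:
                problems.append(f"line {node.lineno}: pass {name}= explicitly")
        for name, (low, high) in LITERAL_RANGES.items():
            if name not in kwargs:
                continue
            try:
                # literal_eval also understands negative literals such as -7.0
                value = ast.literal_eval(kwargs[name])
            except (ValueError, TypeError):
                continue  # not a literal, so it cannot be checked statically
            if isinstance(value, (int, float)) and not low <= value <= high:
                problems.append(
                    f"line {node.lineno}: {name}={value} is outside [{low}, {high}]"
                )
    return problems
```

Only literal values are checked here; a threshold passed through a variable would need a runtime guard, which matches the static/runtime split described in this section.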
Adding guardrails for other libraries:
The guardrail system is extensible. Static (AST-based) guards and runtime (monkey-patch) guards can be registered for any library from any module:
Execute.register_library_guard("mylib", my_ast_guard_fn) # runs before exec
Execute.register_runtime_guard("mylib", my_installer_fn) # runs during exec

If you need a guardrail for a library that is not yet covered, please open an issue or submit a PR.
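To make the registration pattern concrete, here is a self-contained sketch of a per-library guard registry. Everything in it is an assumption for illustration: `GuardRegistry` is a hypothetical stand-in rather than the toolset's `Execute` class, and the guard signature (an AST in, a list of messages out) is a guess at one reasonable contract.

```python
import ast


class GuardRegistry:
    """Hypothetical stand-in for a guard registry; not the toolset's real class."""

    def __init__(self):
        self._guards = {}

    def register_library_guard(self, library, guard_fn):
        self._guards.setdefault(library, []).append(guard_fn)

    def check(self, source):
        """Run only the guards whose library is imported by the source."""
        tree = ast.parse(source)
        imported = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imported.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imported.add(node.module.split(".")[0])
        problems = []
        for library in sorted(imported.intersection(self._guards)):
            for guard_fn in self._guards[library]:
                problems.extend(guard_fn(tree))
        return problems


def forbid_debug_flag(tree):
    """Example guard: reject any call that passes debug=True."""
    return [
        f"line {node.lineno}: debug=True is not allowed"
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        for kw in node.keywords
        if kw.arg == "debug"
        and isinstance(kw.value, ast.Constant)
        and kw.value.value is True
    ]


registry = GuardRegistry()
registry.register_library_guard("mylib", forbid_debug_flag)
```

Keying guards by import name means code that never imports the library pays no checking cost.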
Run the full test suite with:
python -m pytest test/ -v

Most tests run without any additional setup. The table below lists every test module and what it covers:
| Module | What it tests | Needs real HW? |
|---|---|---|
| `test_classify_cfg.py` | `classify_cfg` — detects virtual / real / mixed cfg files | No |
| `test_execute.py` | Python code safety guards, import validation, cellpose guardrails | No |
| `test_core_proxy_worker.py` | `CoreProxyWorker` signal wiring, mixed-cfg rejection, full proxy start with DemoCamera | No |
| `test_all_signals.py` | Full signal coverage of `RemoteMMCore` against a virtual microscope | Opt-in (see below) |
| `test_init.py` | Basic package import smoke tests | No |
Two sets of tests spin up a virtual microscope backend and are skipped by default to keep the standard run fast:
- `test_core_proxy_worker.py::test_full_server_start_virtual` — starts a proxy against `bacteria.cfg` (virtual)
- All tests in `test_all_signals.py` — full signal coverage using `particle.cfg` (virtual)

To enable them, set VIRTUAL_MICROSCOPE_TESTS=1:
# Windows (PowerShell)
$env:VIRTUAL_MICROSCOPE_TESTS = "1"
python -m pytest test/ -v
# macOS / Linux
VIRTUAL_MICROSCOPE_TESTS=1 pytest test/ -v

These tests also require the virtual-microscope package to be installed. If you used uv sync it is included automatically (resolved from the git source in [tool.uv.sources]). If you are on a conda environment, install it manually:
pip install git+https://github.com/hinderling/virtual-microscope

The virtual-microscope tests also expect the bacteria.cfg and particle.cfg configuration files to be present under virtual-microscope/src/virtual_microscope/backends/. These are included in the cloned repository so no extra step is needed if the git source was used.
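An environment-variable opt-in like this is commonly implemented with a skip decorator. The sketch below shows one such pattern using `unittest`, whose test classes pytest also collects. How the repository's own tests implement the gate is not shown here, so the helper name and the test class are hypothetical.

```python
import os
import unittest


def virtual_tests_enabled(environ=os.environ):
    """True only when VIRTUAL_MICROSCOPE_TESTS is set to exactly "1"."""
    return environ.get("VIRTUAL_MICROSCOPE_TESTS") == "1"


@unittest.skipUnless(virtual_tests_enabled(), "set VIRTUAL_MICROSCOPE_TESTS=1 to run")
class VirtualBackendSmokeTest(unittest.TestCase):
    def test_placeholder(self):
        # A real test would start the virtual microscope backend here.
        self.assertTrue(virtual_tests_enabled())
```

Skipped tests still show up in the report with the reason string, so the opt-in stays visible to anyone reading the test output.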
The toolset includes a simulation-based benchmarking system for evaluating agent performance using the knowledge database from the self-learn-loop. Each benchmark test is a self-contained virtual microscope scenario served to the agent over HTTP — the agent cannot see the ground truth or simulation configuration.
Tests are launched from the Benchmarking panel in the MCPServer GUI, or from the CLI:
# Start a test server on port 5602
python -m src.benchmarking.test_server test_1 --port 5602
# List all available tests
python -m src.benchmarking.test_runner

To add a new test, see Benchmark Tests.
Experiment Tracking records a complete snapshot of a Claude Code agent session — every user turn, tool call, and result — so you can replay or review what happened after the experiment ends.
When you click Start Tracking in the GUI (or call start_experiment() programmatically), the toolset notes the current line position in the active Claude Code session file. When you click Stop, it extracts everything added since that point and saves it to a timestamped folder.
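The start/stop mechanism described above amounts to remembering a line offset and later reading everything past it. A minimal sketch of that idea follows, using a temporary file in place of the real Claude Code session file; `count_lines` and `lines_since` are hypothetical helper names, not the toolset's functions.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Number of lines currently in the file (the marker taken at start)."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def lines_since(path, start_line):
    """All lines appended after the marker (the slice saved at stop)."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line for index, line in enumerate(handle) if index >= start_line]


with tempfile.TemporaryDirectory() as folder:
    session = Path(folder) / "session.jsonl"
    session.write_text('{"type": "user"}\n', encoding="utf-8")

    marker = count_lines(session)  # corresponds to "Start Tracking"
    with open(session, "a", encoding="utf-8") as handle:
        handle.write('{"type": "assistant"}\n{"type": "user"}\n')
    captured = lines_since(session, marker)  # corresponds to "Stop"
```

Since the session file is append-only during a run, a line offset is enough to isolate one experiment without copying the whole file.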
Each saved experiment folder contains:
src/benchmarking/experiments/<name>_<timestamp>/
│
├── conversation.jsonl # All Claude Code turns captured during the experiment
│ # (user messages, agent responses, tool calls + results)
│
├── workspace/ # Folder pre-created for the agent to save outputs:
│ # images, CSV files, analysis results, figures, etc.
│
└── session_data/ # Copy of the Claude Code session subdirectory
├── tool-results/<id>.txt # Large tool outputs offloaded from the JSONL
├── subagents/<id>.jsonl # Full conversation of each spawned subagent
└── subagents/<id>.meta.json # Subagent type and description metadata
From the GUI — use the Experiment Tracking panel in the control widget. Enter a name and click Start Tracking; click Stop when the experiment is done. The Open button opens the saved folder in the file explorer.
From the CLI (useful when running without a GUI or during benchmarking):
# Start (creates the experiment folder immediately)
python -m src.benchmarking.experiment_saver start "my_experiment"
# Stop and save the conversation slice
python -m src.benchmarking.experiment_saver end
# List all saved experiments
python -m src.benchmarking.experiment_saver list
# Check whether an experiment is currently active
python -m src.benchmarking.experiment_saver status

Pass the path to a saved conversation.jsonl to the napari launcher to open the interactive dashboard:
python -m src.plugin_napari --review src/benchmarking/experiments/<name>/conversation.jsonl

The dashboard shows a full timeline of the session: user messages, agent reasoning, every tool call with its inputs and outputs, hardware events from the microscope log, subagent conversations, estimated token cost, and duration.
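For a quick look at a saved session without the GUI, a conversation file can be summarised in a few lines. This sketch assumes each JSONL record carries a `type` field and an ISO 8601 `timestamp` field; both field names are assumptions about the file layout, and `summarize` is a hypothetical helper rather than part of the dashboard.

```python
import json
from collections import Counter
from datetime import datetime


def summarize(jsonl_lines):
    """Count records per type and return the elapsed time in seconds."""
    counts = Counter()
    stamps = []
    for line in jsonl_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        counts[record.get("type", "unknown")] += 1
        if "timestamp" in record:
            # Normalise a trailing "Z" so older Python versions can parse it.
            stamps.append(datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00")))
    duration = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return counts, duration


sample = [
    '{"type": "user", "timestamp": "2025-01-01T10:00:00Z"}',
    '{"type": "assistant", "timestamp": "2025-01-01T10:00:30Z"}',
    '{"type": "user", "timestamp": "2025-01-01T10:02:00Z"}',
]
counts, seconds = summarize(sample)
```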
You can optionally merge in the pymmcore-plus hardware log for a combined view of software and hardware events:
python -m src.plugin_napari --review <path_to_conversation.jsonl> --log <path_to_pymmcore-plus.log>

- Fix use of Elasticsearch and PostgreSQL database
- Add summary of Claude Code Agent session
- Calculate analysis insights such as number of tokens, duration, final code, and the whole conversation between user and agent. The plan was to do this in a Jupyter notebook, but if a better way exists, let's implement it
- Add the option to use a remote core. This would replace hardware control through the MCP code-execution tool, but the tool will need to stay for other image analysis.
- Plan the experiments to run on the real microscope to compare the trained and untrained agent.
- Plan to create additional metadata from the microscope session
- Build a chatbox for visualising the user-agent conversation, including time, tool calls, etc.
- Switch local virtual simulation to virtual simulation from the package virtual_microscope
- Add a `console_scripts` entry point so the toolset can be launched with `microscope-toolset` instead of `python -m src.plugin_napari` (add `[project.scripts]` to `pyproject.toml` and wrap startup in a `main()` function)
- Switch off the MCP tool that runs Python code and use Claude's local environment instead
- Introduce Claude Sandbox or another sandbox
