Microscope Toolset

Microscope Toolset is a research platform that connects a Claude Code AI agent to a real or virtual microscope through an MCP server embedded in a napari GUI. The agent can control real or simulated hardware, execute image-analysis code, query a curated knowledge database, and track complete experiment sessions — all from a natural-language prompt.

Key features

MCP server — exposes microscope control, image acquisition, and analysis as Claude-callable tools over HTTP
Napari plugin — control panel to start/stop the server, monitor hardware state, and review sessions
Hardware abstraction — supports real hardware via pymmcore-plus / pymmcore-proxy and simulated hardware via virtual-microscope
Code execution with guardrails — agents submit Python code that is checked (AST + runtime) for unsafe patterns before running; built-in guards for CMMCorePlus misuse and Cellpose parameter ranges
Self-learn loop — library of ready-made analysis modules (cell detection, tracking, morphometry, workflows) the agent can invoke or learn from
Benchmarking — simulation-based evaluation of agent performance across reproducible virtual-microscope scenarios
Experiment tracking — captures every Claude Code turn, tool call, and result into a timestamped folder for offline review and replay

⚠️ Security & Liability Warning

This toolset gives an LLM agent direct access to your computer: it can execute arbitrary Python code, read and write files, control microscope hardware, and install packages into your Python environment.

You are responsible for reviewing every action the agent proposes before approving it. In particular:

The install_packages tool will install packages directly into your active conda/uv environment. Only approve packages you recognise and trust.

The execute_python_code tool runs code with the same privileges as your user account.

The author(s) of this project accept no responsibility for any damage, data loss, security incident, or unintended hardware interaction that may result from using this software. Use it at your own risk.

How to get started

Clone the repository first:

git clone https://github.com/ddd42-star/microscope-toolset.git
cd microscope-toolset

Then set up a Python 3.12 environment using either conda or uv:

Option A — Conda (Anaconda / Miniconda / Mamba)

conda create -n microscope-toolset python=3.12
conda activate microscope-toolset
pip install -r requirements.txt
pip install -e .

Option B — uv

Install uv if you don't have it yet:

pip install uv

Create the environment:

uv venv --python 3.12
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

Install from pyproject.toml (recommended with uv):

uv sync           # installs all dependencies defined in pyproject.toml
uv sync --extra dev  # also includes pytest, pre-commit, coverage, …

uv sync reads [tool.uv.sources] in pyproject.toml to resolve virtual-microscope and pymmcore-proxy from git automatically. If you have local clones of those packages in the project folder, switch the relevant lines in [tool.uv.sources] to path sources — see the comments in pyproject.toml.

Alternative — install from requirements.txt (same as the conda path):

uv pip install -r requirements.txt
uv pip install -e .

PyTorch + CUDA: uv installs the CPU-only build of torch by default. To enable CUDA support, uncomment the torch index entry in [tool.uv.sources] inside pyproject.toml and set your CUDA version (cu118, cu121, cu124, …) before running uv sync.

Model configuration

The toolset uses two models, both configurable via the .env file:

1. Claude (Anthropic) — for query reformulation inside the database agent

Set your API key and choose a model:

ANTHROPIC_API_KEY="your-anthropic-api-key-here"
ANTHROPIC_MODEL="claude-haiku-4-5-20251001"

Model	Speed	Quality	Cost
`claude-haiku-4-5-20251001`	Fast	Good — recommended for query reformulation	Low
`claude-sonnet-4-6`	Medium	Higher quality	Medium
`claude-opus-4-7`	Slow	Best quality	High

2. Sentence-transformers — for embedding queries into Elasticsearch KNN search

EMBED_MODEL="BAAI/bge-small-en-v1.5"

Model	Dimensions	Size	Quality
`all-MiniLM-L6-v2`	384	~80 MB	Fast / lightweight
`BAAI/bge-small-en-v1.5`	384	~120 MB	Good (default)
`all-mpnet-base-v2`	768	~420 MB	Better
`BAAI/bge-base-en-v1.5`	768	~420 MB	Better
`BAAI/bge-large-en-v1.5`	1024	~1.2 GB	Best local quality

Important: the embedding dimension must match your Elasticsearch KNN index. If you change EMBED_MODEL you must re-index all your Elasticsearch data.

The embedding model is downloaded automatically on first run.
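The dimension constraint above can be checked before re-indexing. The following is a minimal sketch: the helper name `check_embedding_dims` and the hard-coded lookup table are illustrative assumptions and not part of the toolset. The sizes are the output dimensions published for each model (`BAAI/bge-small-en-v1.5` produces 384-dimensional vectors), and `index_dims` stands for the `dims` value of your Elasticsearch vector field.

```python
# Hypothetical helper, not part of the toolset.
# Output dimensions of the embedding models listed above.
KNOWN_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "BAAI/bge-small-en-v1.5": 384,
    "all-mpnet-base-v2": 768,
    "BAAI/bge-base-en-v1.5": 768,
    "BAAI/bge-large-en-v1.5": 1024,
}


def check_embedding_dims(embed_model: str, index_dims: int) -> bool:
    """Return True if the model's vector size matches the index's vector size."""
    try:
        model_dims = KNOWN_DIMS[embed_model]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {embed_model!r}") from None
    return model_dims == index_dims
```

If the function returns False for your chosen EMBED_MODEL, the existing index has to be rebuilt before KNN search will work.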

How to start the toolset

Run the following command to start the napari GUI:

python -m src.plugin_napari

The control panel on the right starts and stops the MCP Microscope Toolset server.

Add the MCP server to Claude Code:

$ claude mcp add --transport http microscope http://127.0.0.1:5500/mcp

Then type /mcp inside a Claude Code session and press Enter:

/mcp

Select the microscope MCP server and either connect to it or enable it. The terminal where you started the napari plugin shows whether the server connected correctly.

Once the mcp.json configuration file is in place, you can start the MCP client, which connects to the server.
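If the client fails to connect, it can help to confirm that something is actually listening on the server port first. The sketch below is an assumption-laden convenience, not part of the toolset: the function name `server_is_listening` is hypothetical, and the default host and port are simply taken from the URL used above. It only tests that the TCP port accepts connections, not that the MCP endpoint responds correctly.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 5500,
                        timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result usually means the server has not been started from the napari control panel yet, or that it is bound to a different port.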

Project structure

Execution guardrails

The execute_python_code tool runs agent-submitted Python code through a series of safety and correctness checks before execution:

General checks (all code):

Blocks re-instantiation of CMMCorePlus / UniMMCore — the pre-configured mmc instance must be used
Blocks .loadSystemConfiguration() / .loadConfig() — hardware configuration is managed by the toolset
Blocks any reference to viewer or napari.current_viewer() — GUI access must go through the dedicated viewer_* tools
Auto-detects missing packages before execution and surfaces them for user approval
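To illustrate how static checks of this kind can work, here is a minimal sketch built on Python's standard `ast` module. It is not the toolset's implementation: the function name `find_violations`, the constant names, and the message wording are all assumptions made for the example. It covers only the first three rules in the list and matches purely on names, so it would also flag an unrelated variable that happens to be called `viewer`.

```python
import ast

BLOCKED_CONSTRUCTORS = {"CMMCorePlus", "UniMMCore"}
BLOCKED_METHODS = {"loadSystemConfiguration", "loadConfig"}


def find_violations(source: str) -> list[str]:
    """Return one message per blocked pattern found in the submitted source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Covers both bare calls (CMMCorePlus()) and attribute calls (mod.CMMCorePlus()).
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in BLOCKED_CONSTRUCTORS:
                violations.append(f"line {node.lineno}: do not instantiate {name}, use mmc")
            elif name in BLOCKED_METHODS:
                violations.append(f"line {node.lineno}: {name}() is managed by the toolset")
        elif isinstance(node, ast.Name) and node.id == "viewer":
            violations.append(f"line {node.lineno}: use the viewer_* tools, not viewer")
        elif isinstance(node, ast.Attribute) and node.attr == "current_viewer":
            violations.append(f"line {node.lineno}: use the viewer_* tools, not current_viewer")
    return sorted(violations)
```

Because the check runs on the parsed tree and never executes the code, it can reject a submission before any hardware call is made.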

Library-specific guardrails (cellpose):

When cellpose is imported the following checks are enforced:

Check	What it blocks	Why
`diameter` required	Calls without an explicit diameter kwarg	Cellpose defaults to 30 px — wrong for most microscopy samples
`channels` required	Calls without explicit channel assignment	Default `[0,0]` fails silently on multichannel fluorescence images
`flow_threshold` range	Literal values outside `[0.0, 3.0]`	Values > 1.0 accept noise as cells; < 0.0 misses real cells
`cellprob_threshold` range	Literal values outside `[-6.0, 6.0]`	Extreme values cause silent over/under-segmentation
Image size (runtime)	Images larger than 512×512 px	Large images make segmentation very slow

If an image exceeds 512×512, the agent receives an error with a ready-to-use resize + mask rescale snippet (using cv2.INTER_NEAREST to avoid blending label IDs). To opt out and use the original image size, set cellpose_allow_large_image = True in the code before the eval call.
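The static rows of the table can be approximated with a short AST pass. The sketch below is illustrative only and makes several assumptions: the function name `check_eval_calls` is hypothetical, segmentation is assumed to be invoked through a method named `eval` (as in `model.eval(...)`), and only literal numbers are range-checked, so a threshold passed through a variable is not inspected. The runtime image-size check is not covered.

```python
import ast

REQUIRED_KWARGS = ("diameter", "channels")
LITERAL_RANGES = {
    "flow_threshold": (0.0, 3.0),
    "cellprob_threshold": (-6.0, 6.0),
}


def _literal_number(node):
    """Return the value of a numeric literal (including negatives), else None."""
    try:
        value = ast.literal_eval(node)
    except (ValueError, TypeError):
        return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    return None


def check_eval_calls(source: str) -> list[str]:
    """Flag .eval(...) calls with missing required kwargs or out-of-range literals."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_eval = (isinstance(node, ast.Call)
                   and isinstance(node.func, ast.Attribute)
                   and node.func.attr == "eval")
        if not is_eval:
            continue
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg is not None}
        for name in REQUIRED_KWARGS:
            if name not in kwargs:
                problems.append(f"line {node.lineno}: pass {name}= explicitly")
        for name, (low, high) in LITERAL_RANGES.items():
            value = _literal_number(kwargs[name]) if name in kwargs else None
            if value is not None and not low <= value <= high:
                problems.append(f"line {node.lineno}: {name}={value} outside [{low}, {high}]")
    return problems
```

For example, `model.eval(img, flow_threshold=5.0)` would be reported three times: once for each missing required keyword and once for the out-of-range threshold.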

Adding guardrails for other libraries:

The guardrail system is extensible. Static (AST-based) guards and runtime (monkey-patch) guards can be registered for any library from any module:

Execute.register_library_guard("mylib", my_ast_guard_fn)   # runs before exec
Execute.register_runtime_guard("mylib", my_installer_fn)   # runs during exec

If you need a guardrail for a library that is not yet covered, please open an issue or submit a PR.
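The signature expected of a guard function is not documented in this README, so the following sketch only illustrates the registration pattern in general terms. `GuardRegistry` is a hypothetical stand-in, not the toolset's `Execute` class, and it assumes that a static guard receives the parsed AST and returns a list of error messages. Check the source of `Execute` for the real contract before writing a guard.

```python
import ast
from collections import defaultdict
from typing import Callable

Guard = Callable[[ast.AST], list[str]]


class GuardRegistry:
    """Hypothetical stand-in showing the pattern; not the toolset's Execute class."""

    _guards: dict[str, list[Guard]] = defaultdict(list)

    @classmethod
    def register_library_guard(cls, library: str, guard: Guard) -> None:
        cls._guards[library].append(guard)

    @classmethod
    def check(cls, source: str) -> list[str]:
        """Run the guards of every library that the source imports."""
        tree = ast.parse(source)
        imported = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imported.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imported.add(node.module.split(".")[0])
        errors = []
        for library in sorted(imported):
            for guard in cls._guards.get(library, []):
                errors.extend(guard(tree))
        return errors


def forbid_sleep(tree: ast.AST) -> list[str]:
    """Example guard: flag any attribute call named sleep."""
    return [
        f"line {n.lineno}: avoid sleep()"
        for n in ast.walk(tree)
        if isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr == "sleep"
    ]


GuardRegistry.register_library_guard("time", forbid_sleep)
```

Keying guards by the imported top-level package means a guard only runs when its library is actually used, which matches the "when cellpose is imported" behaviour described above.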

Testing

Run the full test suite with:

python -m pytest test/ -v

Most tests run without any additional setup. The table below lists every test module and what it covers:

Module	What it tests	Needs real HW?
`test_classify_cfg.py`	`classify_cfg` — detects virtual / real / mixed cfg files	No
`test_execute.py`	Python code safety guards, import validation, cellpose guardrails	No
`test_core_proxy_worker.py`	`CoreProxyWorker` signal wiring, mixed-cfg rejection, full proxy start with DemoCamera	No
`test_all_signals.py`	Full signal coverage of `RemoteMMCore` against a virtual microscope	Opt-in (see below)
`test_init.py`	Basic package import smoke tests	No

Opt-in: virtual microscope tests

Two sets of tests spin up an actual virtual-microscope backend and are skipped by default to keep the standard run fast:

test_core_proxy_worker.py::test_full_server_start_virtual — starts a proxy against bacteria.cfg (virtual)
All tests in test_all_signals.py — full signal coverage using particle.cfg (virtual)

To enable them, set VIRTUAL_MICROSCOPE_TESTS=1:

# Windows (PowerShell)
$env:VIRTUAL_MICROSCOPE_TESTS = "1"
python -m pytest test/ -v

# macOS / Linux
VIRTUAL_MICROSCOPE_TESTS=1 python -m pytest test/ -v

These tests also require the virtual-microscope package to be installed. If you used uv sync it is included automatically (resolved from the git source in [tool.uv.sources]). If you are on a conda environment, install it manually:

pip install git+https://github.com/hinderling/virtual-microscope

The virtual-microscope tests also expect the bacteria.cfg and particle.cfg configuration files to be present under virtual-microscope/src/virtual_microscope/backends/. These are included in the cloned repository so no extra step is needed if the git source was used.
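An environment-variable gate like the one described above is commonly implemented with a small helper. The sketch below is an assumption about how such a gate might look, not a copy of the project's test code: the function name `virtual_tests_enabled` is hypothetical, and it treats only the exact value "1" as enabled, which may be stricter than what the real test suite accepts.

```python
import os


def virtual_tests_enabled(environ=None) -> bool:
    """Return True when VIRTUAL_MICROSCOPE_TESTS is set to "1"."""
    # Accepting a mapping makes the helper testable without touching os.environ.
    env = os.environ if environ is None else environ
    return env.get("VIRTUAL_MICROSCOPE_TESTS") == "1"
```

In a test module, the negated result can be passed as the condition to `pytest.mark.skipif` together with a `reason` string, so that skipped tests explain how to enable them.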

Benchmarking

The toolset includes a simulation-based benchmarking system for evaluating agent performance using the knowledge database from the self-learn-loop. Each benchmark test is a self-contained virtual microscope scenario served to the agent over HTTP — the agent cannot see the ground truth or simulation configuration.

Tests are launched from the Benchmarking panel in the MCPServer GUI, or from the CLI:

# Start a test server on port 5602
python -m src.benchmarking.test_server test_1 --port 5602

# List all available tests
python -m src.benchmarking.test_runner

To add a new test, see Benchmark Tests.

Experiment Tracking

Experiment Tracking records a complete snapshot of a Claude Code agent session — every user turn, tool call, and result — so you can replay or review what happened after the experiment ends.

What gets recorded

When you click Start Tracking in the GUI (or call start_experiment() programmatically), the toolset notes the current line position in the active Claude Code session file. When you click Stop, it extracts everything added since that point and saves it to a timestamped folder.
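The mark-then-slice idea described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the toolset's code: the function names `mark_position` and `extract_since` are hypothetical, and the session file is assumed to be an append-only text file with one record per line.

```python
from pathlib import Path


def mark_position(session_file: Path) -> int:
    """Return the number of lines currently in the session file."""
    with session_file.open("r", encoding="utf-8") as fh:
        return sum(1 for _ in fh)


def extract_since(session_file: Path, start_line: int) -> list[str]:
    """Return every line appended after the recorded position."""
    with session_file.open("r", encoding="utf-8") as fh:
        return [line for index, line in enumerate(fh) if index >= start_line]
```

Calling `mark_position` at start and `extract_since` at stop yields exactly the lines written during the experiment, provided earlier lines in the file are never rewritten in between.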

Each saved experiment folder contains:

src/benchmarking/experiments/<name>_<timestamp>/
│
├── conversation.jsonl          # All Claude Code turns captured during the experiment
│                               # (user messages, agent responses, tool calls + results)
│
├── workspace/                  # Folder pre-created for the agent to save outputs:
│                               # images, CSV files, analysis results, figures, etc.
│
└── session_data/               # Copy of the Claude Code session subdirectory
    ├── tool-results/<id>.txt   # Large tool outputs offloaded from the JSONL
    ├── subagents/<id>.jsonl    # Full conversation of each spawned subagent
    └── subagents/<id>.meta.json # Subagent type and description metadata
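For a quick look at a saved conversation.jsonl without opening the dashboard, a short script can tally its records. The sketch below assumes that each line is a JSON object and that records carry a top-level `type` field; that field name is an assumption about the session format, not something this README specifies, and the function name `count_record_types` is hypothetical. Records without the field are counted under "unknown".

```python
import json
from collections import Counter
from pathlib import Path


def count_record_types(conversation_file: Path) -> Counter:
    """Count JSONL records by their (assumed) top-level 'type' field."""
    counts = Counter()
    with conversation_file.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                counts[record.get("type", "unknown")] += 1
            else:
                counts["unknown"] += 1
    return counts
```

Printing `count_record_types(path).most_common()` gives a rough picture of how many records of each kind the experiment produced.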

How to start and stop tracking

From the GUI — use the Experiment Tracking panel in the control widget. Enter a name and click Start Tracking; click Stop when the experiment is done. The Open button opens the saved folder in the file explorer.

From the CLI (useful when running without a GUI or during benchmarking):

# Start (creates the experiment folder immediately)
python -m src.benchmarking.experiment_saver start "my_experiment"

# Stop and save the conversation slice
python -m src.benchmarking.experiment_saver end

# List all saved experiments
python -m src.benchmarking.experiment_saver list

# Check whether an experiment is currently active
python -m src.benchmarking.experiment_saver status

Reviewing a saved experiment

Pass the path to a saved conversation.jsonl to the napari launcher to open the interactive dashboard:

python -m src.plugin_napari --review src/benchmarking/experiments/<name>/conversation.jsonl

The dashboard shows a full timeline of the session: user messages, agent reasoning, every tool call with its inputs and outputs, hardware events from the microscope log, subagent conversations, estimated token cost, and duration.

You can optionally merge in the pymmcore-plus hardware log for a combined view of software and hardware events:

python -m src.plugin_napari --review <path_to_conversation.jsonl> --log <path_to_pymmcore-plus.log>
