diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
new file mode 100644
index 0000000..a082a45
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,57 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  lint-and-test:
+    name: lint + test (py${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+      - name: Install package + dev tools
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .
+          pip install ruff mypy pyright pytest
+      - name: Ruff lint
+        run: ruff check .
+      - name: Ruff format check
+        run: ruff format --check .
+      - name: mypy
+        run: mypy src/scperteval
+      - name: pyright
+        run: pyright src/scperteval
+      - name: pytest
+        run: pytest -q
+
+  docs:
+    name: docs build
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+      - name: Install package + docs deps
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .
+          pip install --group docs
+      - name: Build HTML docs
+        run: sphinx-build -b html -n docs docs/_build/html
diff --git a/.gitignore b/.gitignore
index 2fd5a41..b84bb62 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,6 +7,11 @@ dist/
.venv/
venv/
+# Environment
+uv.lock
+.envrc
+requirements-local.txt
+
# Run outputs
results/
@@ -14,3 +19,7 @@ results/
.idea/
.vscode/
.DS_Store
+
+# docs
+docs/generated/
+docs/_build/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..2e7c9e3
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,20 @@
+# Run `pre-commit install` once; hooks then run on every commit.
+# Update pinned revs with `pre-commit autoupdate`.
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.15.13
+    hooks:
+      - id: ruff
+        args: [--fix]
+      - id: ruff-format
+
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+      - id: check-yaml
+      - id: check-toml
+      - id: check-merge-conflict
+      - id: check-added-large-files
+        args: [--maxkb=1024]
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
new file mode 100644
index 0000000..fe7f5fe
--- /dev/null
+++ b/.readthedocs.yaml
@@ -0,0 +1,16 @@
+# https://docs.readthedocs.io/en/stable/config-file/v2.html
+version: 2
+build:
+  os: ubuntu-24.04
+  tools:
+    python: "3.12"
+  jobs:
+    create_environment:
+      - asdf plugin add uv
+      - asdf install uv latest
+      - asdf global uv latest
+    build:
+      html:
+        - uv sync --group docs
+        - uv run sphinx-build -M html docs docs/_build -W
+        - mv docs/_build $READTHEDOCS_OUTPUT
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..e2e763c
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,5 @@
+# Changelog
+
+## 0.1.0 (unreleased)
+
+Initial implementation of scPertEval.
diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index 32fbed5..37ab212 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -7,9 +7,9 @@ welcome. There are two paths, depending on what you're changing.
If you're adding a protocol (a new metric, or a new combination of an existing metric with
a space / centering / controls), **open a PR directly.** This is the common case and the
-whole point of the project. See [Create a protocol](README.md#create-a-protocol) for the
-two-step pattern (a pure function in `scperteval/protocols/algorithms.py` plus a row in
-`scperteval/protocols/table.py`). Adding a new building block (feature space, DE method, control
+whole point of the project. See [Create a protocol](https://github.com/Virtual-Cell-Research-Community/scPertEval/blob/main/docs/protocols.md#create-a-protocol) for the
+two-step pattern (a pure function in `src/scperteval/protocols/metrics.py` plus a row in
+`src/scperteval/protocols/table.py`). Adding a new building block (feature space, DE method, control
source, calibrator) the same way is also welcome as a PR.
Please include:
diff --git a/README.md b/README.md
index 625359f..46eda33 100644
--- a/README.md
+++ b/README.md
@@ -1,504 +1,42 @@
# scPertEval — Evaluation Protocols for Perturbation Sequencing
scPertEval is a command-line tool for **experimenting with and sharing reference implementations of
-evaluation protocols** in single-cell perturbation studies.
+evaluation protocols** in single-cell perturbation studies. The same catalog of protocols backs
+three commands: **`score`** (score a model's predictions against ground truth), **`calibrate`**
+(calibrate a protocol against empirical positive/negative controls per perturbation, reporting the
+**Dynamic Range Fraction (DRF)** and **Bound Discrimination Score (BDS)**), and **`de`** (export
+per-gene differential expression).
-Evaluating predictions across a dataset's
-perturbations reduces to a single question: how different is one group of cells from another? To answer this, an **evaluation protocol** is defined: a specific formulation of a metric, along with some representation of the perturbation data fed to the metric. However, there are a multitude of possibilities -- many already reflected in the literature -- and it can be challenging to compare and contrast protocols across the field and ultimately choose the right approach for a given dataset and problem space.
+Our accompanying publication: TODO_LINK_HERE
-scPertEval renders each protocol as a short, readable building block to run, read, reuse, and contribute back -- a place for
-collaboration and alignment in the field.
-
-The same catalog of protocols backs three commands, each a different use case:
-
-- **`score`** — score a model's predictions against ground truth. Each protocol's metric is
- applied to your **predicted** cells vs the **real** cells, one score per perturbation — the
- conventional "how good is my prediction" evaluation (see
- [Scoring predictions](#scoring-predictions-against-ground-truth)).
-- **`calibrate`** — calibrate a protocol against empirical positive/negative controls built from
- the dataset itself, reporting the **Dynamic Range Fraction (DRF)** and the **Bound Discrimination
- Score (BDS)** — quantifying how well the protocol separates real perturbation signal from an
- uninformative baseline (see [How calibration works](#how-calibration-works)). Use this to decide
- whether a metric is trustworthy in the first place.
-- **`de`** — export per-gene differential expression (statistic + adjusted p) to HDF5, since DE
- is tightly coupled with several protocols.
-
-Our accompanying publiciation: TODO_LINK_HERE
+**→ Full documentation at **
## Install
```bash
-pip install -e . # provides the `scperteval` command
+pip install scperteval
```
-## Input data
-
-scPertEval reads one preprocessed AnnData (`.h5ad`) per dataset. Only three things are required:
-
-- **`adata.X`** — normalized expression, cells × genes (e.g. `sc.pp.normalize_total` + `sc.pp.log1p`); sparse or dense float.
-- **`adata.obs["perturbation"]`** — the perturbation label for each cell; control cells use the label `"control"`. Both names are configurable (`--perturbation-key` / `--control-label`).
-- **`adata.var_names`** — gene identifiers, used as the DEG labels.
-
-Perturbations with at least `--min-cells` cells (default 30) are evaluated. Nothing else is
-needed — references, DE, and PCA are all recomputed in memory, so no `uns`/`obsm`/`layers` are read.
-
-**Sample datasets.** Seven preprocessed perturbation datasets live in a public, read-only GCS
-bucket and serve as a template for the format above:
+Or from this repo:
```bash
-gsutil ls gs://scperteval/processed/ # wessels23, replogle22{k562,rpe1}, nadig25{hepg2,jurkat}, arch1, kaden25rpe1
-gsutil cp gs://scperteval/processed/wessels23_processed_complete.h5ad .
+pip install "scperteval @ git+https://github.com/Virtual-Cell-Research-Community/scPertEval.git"
```
-No gcloud account is needed — each file is also reachable over plain HTTPS at
-`https://storage.googleapis.com/scperteval/processed/_processed_complete.h5ad`.
-
-## Run it
+## Quick start
```bash
-# protocols by name — including parameterised ones (set k / padj per protocol)
-scperteval calibrate data/wessels23.h5ad -p pearson_ctrl,unbiased_mmd_median_pca_k=20,de_overlap_k=10 --de-method t-test
-
-# a parameterised protocol with no value uses its default (k=50, padj=0.05)
-scperteval calibrate data/wessels23.h5ad -p unbiased_mmd_median_top_k --de-method MWU
-
-# a whole group, or everything (parameterised protocols use their defaults)
-scperteval calibrate data/wessels23.h5ad -p distributional --de-method MWU
+# calibrate protocols against built-in controls (DRF/BDS)
scperteval calibrate data/wessels23.h5ad -p all --de-method t-test
-# DRF calibration only (compute DRF only; exclude BDS)
-scperteval calibrate data/wessels23.h5ad -p pearson_ctrl --de-method t-test --output drf
-
-# SCORE predictions against ground truth — predicted cells vs real cells, per protocol.
-# predictions.h5ad must have the same genes and perturbation labels as the dataset.
-scperteval score data/wessels23.h5ad predictions.h5ad -p pearson,mse,de_auprc --de-method t-test
-
-# DE only — writes per-gene statistic + adjusted p to HDF5 (no protocol calibration)
-# Provided as a convenience, since DE methods are tightly coupled with some evaluation protocols
-scperteval de data/wessels23.h5ad --methods MWU
+# score a model's predictions against ground truth
+scperteval score data/wessels23.h5ad predictions.h5ad -p all
-# discover what's available
-scperteval list protocols # also: de-methods | spaces | sources | calibrators
+scperteval list protocols # also: de-methods | spaces | sources | calibrators
```
-Each command prints a summary table and writes a per-perturbation CSV named
-`____