Helmsdeep

A centralized metadata database and web application for mass spectrometry proteomics experiments.

Helmsdeep captures, validates, and standardizes experimental metadata at every step of the proteomics workflow — from cell culture through mass spectrometry acquisition.

Public Snapshot Disclaimer

This repository is published as a static public snapshot. It is not actively maintained, and no pull requests, issues, feature requests, roadmap items, or release updates should be expected. Use it as a reference implementation, starting point, or foundation for your own adapted version.

The included Docker Compose setup is intended for local evaluation and development. It uses local demo credentials and has no built-in authentication; do not expose it to real users without adding your own authentication and deployment controls.

The platform is built around three ideas:

  1. Workflow-centric data model — database tables mirror actual laboratory procedures (cell culture → fractionation → peptide digest → MS run)
  2. Validation at entry — Pydantic schemas and relational constraints prevent errors from propagating downstream
  3. Adherence to FAIR principles — sample metadata in the database is intended to make proteomic mass spectrometry data Findable, Accessible, Interoperable, and Reusable.
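To make the second idea concrete, here is a minimal sketch of validation at entry. The repository itself uses Pydantic schemas; this stand-in uses a plain standard-library dataclass so it runs without extra packages. The class name, the field names, and the allowed fraction labels are hypothetical and are not taken from helmsdeep-api-schemas.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary, for illustration only.
ALLOWED_FRACTION_TYPES = frozenset({"nuclear", "cytoplasmic", "whole_cell"})


@dataclass(frozen=True)
class CellFractionEntry:
    """Illustrative record for one fractionation step (field names are assumptions)."""

    cell_culture_id: int
    fraction_type: str
    operator: str

    def __post_init__(self) -> None:
        # Collect every problem so the submitter sees all of them at once.
        errors = []
        if self.cell_culture_id <= 0:
            errors.append("cell_culture_id must be a positive integer")
        if self.fraction_type not in ALLOWED_FRACTION_TYPES:
            errors.append(f"unknown fraction_type: {self.fraction_type!r}")
        if not self.operator.strip():
            errors.append("operator must not be empty")
        if errors:
            # Reject the record before it can reach the database.
            raise ValueError("; ".join(errors))
```

Constructing `CellFractionEntry(cell_culture_id=1, fraction_type="nuclear", operator="alice")` succeeds, while a record with a non-positive ID or an unknown fraction type raises `ValueError` before anything is written.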

Architecture

┌──────────────┐    HTTP/JSON    ┌──────────────────┐    SQLAlchemy   ┌─────────┐
│  Streamlit   │────────────────▶│   FastAPI (API)  │────────────────▶│  MySQL  │
│     UI       │◀────────────────│  helmsdeep-api   │                 │   DB    │
└──────────────┘                 └──────────────────┘                 └─────────┘
helmsdeep-ui                     helmsdeep-api-schemas
helmsdeep-client                 helmsdeep-db-models (SQLAlchemy ORM)

Packages in this repo:

Package                 Description
helmsdeep-db-models     SQLAlchemy ORM models + Alembic migrations
helmsdeep-api-schemas   Pydantic validation schemas (request/response)
helmsdeep-api           FastAPI backend
helmsdeep-client        Python client library used by the UI
helmsdeep-ui            Streamlit web application

Quick Start with Docker

1. Clone and configure

git clone https://github.com/TalusBio/helmsdeep-open.git
cd helmsdeep-open
cp .env.example .env
# Edit .env if you want to change the default passwords

The Docker path does not require a local Python environment. For host-based development, uv sync uses the committed uv.lock dependency snapshot.

2. Start the stack and open the app

If you have Docker installed, run the command below; otherwise, skip to the "Manual Start (without Docker)" section.

sh scripts/docker-start.sh

This builds and starts three containers in the background:

  • mysql — MySQL 8.0 database on port 3306
  • api — FastAPI server on port 8000
  • ui — Streamlit app on port 8501

The startup flow runs automatically in this order:

  1. MySQL starts and passes its health check
  2. API container waits for MySQL, then:
    • Applies Alembic database migrations (alembic upgrade head)
    • Seeds prerequisite example data (scripts/seed_example_data.py) — idempotent, safe to re-run
    • Starts the FastAPI server and passes its health check
  3. UI container waits for the API, then starts Streamlit
  4. The script waits for Streamlit and opens http://localhost:8501

# Follow logs
docker compose logs -f

# Stop the stack
docker compose down

To start without opening a browser:

HELMSDEEP_OPEN_BROWSER=0 sh scripts/docker-start.sh
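The "wait, then open" behaviour in the startup flow can be approximated from Python, which may be handy if you script against the stack. This is only a sketch of the idea, not the logic in scripts/docker-start.sh; the function names, the retry counts, and the choice to probe the root URL are assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 60,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `attempts` times, pausing `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:8501"))` returns True once the UI answers, or False after roughly a minute of failed attempts.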

3. Try the example upload

An example Excel file is included at:

tests/data/input/Public helmsdeep metadata example.xlsx

All prerequisite database records (operators, cell lines, instruments, compounds, etc.) needed to validate this file are inserted automatically during startup. To try it:

  1. Open the Metadata page in the UI
  2. Upload tests/data/input/Public helmsdeep metadata example.xlsx
  3. The file should pass validation and register successfully

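4. App URLs

Service              URL
UI (Streamlit)       http://localhost:8501
API (FastAPI)        http://localhost:8000
API docs (OpenAPI)   http://localhost:8000/docs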

Manual Start (without Docker)

Prerequisites

  • Python 3.12+
  • uv package manager
  • MySQL 8.0 (or use SQLite for quick testing)

Manual Setup

make install

On first run this creates .env from .env.example and exits — edit DATABASE_URL if needed, then re-run. Subsequently it installs packages, applies migrations, seeds example data, and starts both the API (:8000) and the UI (:8501). Press Ctrl+C to stop both.

Warning — make install resets the database schema on every run. It runs alembic downgrade base before alembic upgrade head, which drops and recreates all tables. Any existing data will be permanently deleted. To apply new migrations without destroying data, use uv run alembic upgrade head directly from packages/helmsdeep-db-models/.

SQLite (for quick prototyping)

export DATABASE_URL="sqlite:///$PWD/helmsdeep.db"
cd packages/helmsdeep-db-models
uv run alembic upgrade head
cd ../helmsdeep-api
uv run uvicorn main:app --port 8000
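After the migration step, one way to confirm that the schema was created is to list the tables in the SQLite file. The sketch below uses only the standard library; the helper names are made up for illustration, and it assumes DATABASE_URL has the sqlite:/// form shown above.

```python
import sqlite3

SQLITE_PREFIX = "sqlite:///"


def sqlite_path_from_url(url: str) -> str:
    """Extract the file path from a SQLAlchemy-style SQLite URL."""
    if not url.startswith(SQLITE_PREFIX):
        raise ValueError(f"not a sqlite:/// URL: {url!r}")
    # With an absolute path the URL has four slashes; the fourth belongs to the path.
    return url[len(SQLITE_PREFIX):]


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the SQLite file, sorted alphabetically."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Calling `list_tables(sqlite_path_from_url(os.environ["DATABASE_URL"]))` after `import os` should then show the migrated tables. Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty list usually means the path is wrong or the migrations have not run.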

Data Model

The schema reflects the proteomics experimental workflow:

Operator ──────────────────────────────────────────────────────┐
Project / Grant / Program                                      │
                                                               │
Cell Type                                                      │
    └─ Cell Culture Registry ────────────────────────────────┐ │
           └─ Cell Fraction                                  │ │
                  └─ Peptide Digest                          │ │
                         └─ MS Run ◀── Instrument, Protocol ─┘ ┘
                                └─ Experiment (groups MS runs)

Key tables:

Table                   Description
operators               Lab personnel
cell_culture_registry   Frozen and active cell cultures
cell_fraction           Sub-cellular fractions
peptide_digest          Digestion step metadata
ms_run                  Mass spectrometry acquisition
wellplate               Microplate tracking across workflow steps
experiment              Groups of related MS runs
protocols               Standard operating procedures
instruments             Mass spectrometers and LC systems
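The workflow chain above (cell culture, then fraction, then digest, then MS run) lends itself to foreign-key constraints, so that a downstream record cannot exist without its upstream parent. The following is a toy sketch in an in-memory SQLite database. The table names come from the list above, but every column name is an assumption and the real schema in helmsdeep-db-models is certainly richer.

```python
import sqlite3

# Toy schema: table names match the README, column names are hypothetical.
SCHEMA = """
CREATE TABLE cell_culture_registry (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE cell_fraction (
    id INTEGER PRIMARY KEY,
    cell_culture_id INTEGER NOT NULL REFERENCES cell_culture_registry(id)
);
CREATE TABLE peptide_digest (
    id INTEGER PRIMARY KEY,
    cell_fraction_id INTEGER NOT NULL REFERENCES cell_fraction(id)
);
CREATE TABLE ms_run (
    id INTEGER PRIMARY KEY,
    peptide_digest_id INTEGER NOT NULL REFERENCES peptide_digest(id)
);
"""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def orphan_ms_run_rejected(conn: sqlite3.Connection) -> bool:
    """Try to insert an MS run whose digest does not exist; True if it is refused."""
    try:
        conn.execute("INSERT INTO ms_run (peptide_digest_id) VALUES (999)")
    except sqlite3.IntegrityError:
        return True
    return False
```

With this layout, `orphan_ms_run_rejected(make_demo_db())` returns True: the database refuses an MS run that does not trace back to a registered digest.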

Running Migrations

Migrations are managed with Alembic.

# From packages/helmsdeep-db-models/
# Apply all migrations
DATABASE_URL=... uv run alembic upgrade head

# Create a new migration
DATABASE_URL=... uv run alembic revision --autogenerate -m "description"

# Check current version
DATABASE_URL=... uv run alembic current
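If you prefer to drive these commands from a script, a thin wrapper that only exposes the non-destructive ones can reduce the risk of wiping data by accident. This is a hypothetical helper, not part of the repository; it simply shells out to the same `uv run alembic` commands shown above and assumes it is run from the repository root.

```python
import os
import subprocess

MODELS_DIR = "packages/helmsdeep-db-models"

# Only commands that leave existing data in place are exposed here.
SAFE_ACTIONS = {
    "upgrade": ["upgrade", "head"],
    "current": ["current"],
}


def alembic_argv(action: str) -> list[str]:
    """Build the argument list for one of the allowed Alembic actions."""
    if action not in SAFE_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["uv", "run", "alembic", *SAFE_ACTIONS[action]]


def run_alembic(action: str, database_url: str, repo_root: str = ".") -> int:
    """Run the action inside the db-models package and return the exit code."""
    env = dict(os.environ, DATABASE_URL=database_url)
    result = subprocess.run(
        alembic_argv(action),
        cwd=os.path.join(repo_root, MODELS_DIR),
        env=env,
        check=False,
    )
    return result.returncode
```

For instance, `run_alembic("upgrade", "sqlite:////tmp/helmsdeep.db")` applies pending migrations, while `alembic_argv("downgrade")` raises `ValueError` instead of building a destructive command.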

API Reference

The full OpenAPI spec is available at http://localhost:8000/docs when the server is running.

Key endpoint groups:

Prefix                Description
/operators/           Lab personnel CRUD
/cell_cultures/       Cell culture registry
/cell_fractions/      Fractionation metadata
/peptide_digests/     Digest metadata
/mass_spectrometry/   MS run metadata
/experiments/         Experiment groups
/instruments/         Instruments
/protocols/           Protocols
/wellplates/          Microplate registry
/sdrf/                SDRF export
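To see which concrete routes sit under each prefix, you can read the machine-readable schema instead of browsing /docs. The sketch below assumes the API keeps FastAPI's default schema location, /openapi.json; if the project overrides that setting, adjust the URL. The function names are illustrative.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"


def fetch_openapi(base: str = API_BASE, timeout: float = 10.0) -> dict:
    """Download the OpenAPI document (assumes the default /openapi.json location)."""
    url = base.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def paths_by_prefix(spec: dict) -> dict[str, list[str]]:
    """Group the route paths of an OpenAPI document by their first URL segment."""
    grouped: dict[str, list[str]] = {}
    for path in sorted(spec.get("paths", {})):
        first = path.strip("/").split("/")[0]
        prefix = f"/{first}/" if first else "/"
        grouped.setdefault(prefix, []).append(path)
    return grouped
```

With the server running, `paths_by_prefix(fetch_openapi())` returns a dictionary keyed by prefixes such as `/operators/`, each mapping to the routes registered under it.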

Environment Variables

Variable             Required   Description
DATABASE_URL         Yes        SQLAlchemy connection string
ENV                  No         development / production / testing (default: development)
HELMSDEEP_DEV_USER   No         Username displayed in development mode (bypasses auth; local only)
DATABASE_RO_URL      No         Optional read-only database replica URL
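As a rough illustration of how these variables fit together, the sketch below loads them into a small settings object and fails fast when the required one is missing. It is an assumption-laden example, not the configuration code used by helmsdeep-api; the `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_ENVS = ("development", "production", "testing")


@dataclass(frozen=True)
class Settings:
    database_url: str
    env: str = "development"
    dev_user: Optional[str] = None
    database_ro_url: Optional[str] = None


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented default for ENV."""
    database_url = environ.get("DATABASE_URL", "").strip()
    if not database_url:
        raise RuntimeError("DATABASE_URL is required")
    env = environ.get("ENV", "development")
    if env not in VALID_ENVS:
        raise RuntimeError(f"ENV must be one of {VALID_ENVS}, got {env!r}")
    return Settings(
        database_url=database_url,
        env=env,
        # Empty strings are treated the same as unset variables.
        dev_user=environ.get("HELMSDEEP_DEV_USER") or None,
        database_ro_url=environ.get("DATABASE_RO_URL") or None,
    )
```

Passing a plain dictionary, for example `load_settings({"DATABASE_URL": "sqlite:///demo.db"})`, makes the function easy to exercise without touching the real environment.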

Production Authentication

The Docker Compose setup has no authentication. For local development, HELMSDEEP_DEV_USER sets the active user without any login flow. In production you must place an authenticating reverse proxy in front of the UI that injects an X-Auth-Request-Email header with the logged-in user's email address. The original deployment used AWS Cognito via an Application Load Balancer, but any identity provider that can inject that header (oauth2-proxy, Nginx with auth_request, Cloudflare Access, etc.) will work. The injected email must match an operator name registered in the database.
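The two identity sources described above can be summarised in a few lines. This is a sketch of the decision only, under the assumption that the development shortcut applies when ENV is development; it is not the repository's implementation, and it leaves out the step of matching the result against registered operators.

```python
from typing import Mapping, Optional

AUTH_HEADER = "X-Auth-Request-Email"


def resolve_user(headers: Mapping[str, str], environ: Mapping[str, str]) -> Optional[str]:
    """Return the active user, or None if no identity can be established."""
    mode = environ.get("ENV", "development")
    dev_user = environ.get("HELMSDEEP_DEV_USER", "").strip()
    if mode == "development" and dev_user:
        # Local shortcut: no login flow, the configured name is trusted.
        return dev_user
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    email = lowered.get(AUTH_HEADER.lower(), "").strip()
    return email or None
```

Because the header is trusted as-is, the application must only be reachable through the authenticating proxy; a client that can reach the UI directly could otherwise set the header itself.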


Public Snapshot Policy

See CONTRIBUTING.md for the public snapshot policy and local development notes.


License

See LICENSE.
