Semi-production machine learning framework for genomic antimicrobial resistance (AMR) decision support.
The project provides a full pipeline from phenotype-linked isolate metadata to packaged models and API inference.
- End-to-end AMR pipeline: manifest build, download, feature extraction, train, evaluate, package.
- Two implemented clinical tasks:
  - ecoli_ciprofloxacin for Escherichia coli
  - staph_oxacillin for Staphylococcus aureus
- FastAPI inference service with OpenAPI/Swagger docs at /docs.
- Containerized runtime via Docker for reproducibility and deployment parity.
- Structured quality controls and test suite (pytest, mypy, ruff).
The system is organized in modular layers:
- amr_pipeline/core: data ingestion, manifest normalization, FASTA QC, AMRFinder integration.
- amr_pipeline/modeling: model training and selection, inference utilities, artifact packaging.
- apps/api: production-style HTTP API for inference and model readiness checks.
- configs: per-task reproducible configs.
- docs: architecture, data provenance, model cards, threat model, reproducibility notes.
- tests: API and pipeline tests.
POST /predict returns:
- organism and antibiotic context
- binary prediction (Resistant or Susceptible)
- calibrated probability_resistant
- detected AMR gene/mutation signals
- model/data version metadata
- inference mode (full or fallback)
- assembly QC block and clinical caution message
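
To make the response fields above concrete, here is a minimal Python sketch of how a client might read a /predict response body. Only probability_resistant and the values Resistant/Susceptible and full/fallback come from the list above. Every other key name (organism, antibiotic, prediction, signals, inference_mode, caution) and all sample values are hypothetical placeholders, so check the OpenAPI schema at /docs for the real field names.

```python
import json

# Hypothetical example payload: key names other than "probability_resistant"
# are assumptions for illustration, and all values are made up.
SAMPLE_RESPONSE = json.dumps({
    "organism": "Escherichia coli",
    "antibiotic": "ciprofloxacin",
    "prediction": "Resistant",
    "probability_resistant": 0.91,
    "signals": ["gyrA_S83L"],
    "inference_mode": "fallback",
    "caution": "Confirm with phenotypic AST.",
})


def summarize(raw: str) -> str:
    """Turn a JSON response body into a one-line, human-readable summary."""
    payload = json.loads(raw)
    line = (
        f"{payload['organism']} / {payload['antibiotic']}: "
        f"{payload['prediction']} "
        f"(p_resistant={payload['probability_resistant']:.2f})"
    )
    # Surface fallback mode explicitly so it is not mistaken for a full run.
    if payload.get("inference_mode") == "fallback":
        line += " [fallback mode]"
    return line


print(summarize(SAMPLE_RESPONSE))
```

Flagging the inference mode in the summary is one possible design choice: it keeps a fallback result from being read as equivalent to a full-mode result.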
make setup
make demo
make api
- Open http://localhost:8000/docs
- Or use:
curl -X POST "http://localhost:8000/predict?task=ecoli_ciprofloxacin" \
  -F "fasta_file=@data/demo/sample_ecoli.fna"

With Docker:
docker compose up --build

Full pipeline (E. coli / ciprofloxacin example):
amr_pipeline build-manifest --organism ecoli --antibiotic ciprofloxacin --out data/processed/ecoli_manifest.csv --source-csv <ncbi_ast_export.csv>
amr_pipeline download --manifest data/processed/ecoli_manifest.csv --out data/raw
amr_pipeline featurize --manifest data/processed/ecoli_manifest.csv --fasta-dir data/raw --out data/processed/ecoli_features.parquet --config configs/ecoli_ciprofloxacin.yaml
amr_pipeline train --task ecoli_ciprofloxacin --features data/processed/ecoli_features.parquet --out models/ecoli_ciprofloxacin --config configs/ecoli_ciprofloxacin.yaml
amr_pipeline evaluate --task ecoli_ciprofloxacin --features data/processed/ecoli_features.parquet --model models/ecoli_ciprofloxacin/model.joblib --out models/ecoli_ciprofloxacin/evaluation.json
amr_pipeline package-model --task ecoli_ciprofloxacin --config configs/ecoli_ciprofloxacin.yaml --model-dir models/ecoli_ciprofloxacin --manifest data/processed/ecoli_manifest.csv

- Type and style checks: make lint
- Unit tests with coverage: make test
- CLI/API smoke checks: make smoke
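
The amr_pipeline commands shown earlier for ecoli_ciprofloxacin follow a fixed pattern, so they can be generated per task. The sketch below builds the argument lists in Python using only the subcommands and flags that appear in this README. The function name pipeline_commands, its parameters, and the derived file naming (for example data/processed/<organism>_manifest.csv) are assumptions extrapolated from the E. coli example and have not been checked against other tasks.

```python
import shlex


def pipeline_commands(
    task: str, organism: str, antibiotic: str, source_csv: str
) -> list[list[str]]:
    """Build the six pipeline steps as argument lists (hypothetical helper)."""
    # Path layout is assumed to mirror the ecoli_ciprofloxacin example.
    manifest = f"data/processed/{organism}_manifest.csv"
    features = f"data/processed/{organism}_features.parquet"
    config = f"configs/{task}.yaml"
    model_dir = f"models/{task}"
    cli = "amr_pipeline"
    return [
        [cli, "build-manifest", "--organism", organism,
         "--antibiotic", antibiotic, "--out", manifest,
         "--source-csv", source_csv],
        [cli, "download", "--manifest", manifest, "--out", "data/raw"],
        [cli, "featurize", "--manifest", manifest, "--fasta-dir", "data/raw",
         "--out", features, "--config", config],
        [cli, "train", "--task", task, "--features", features,
         "--out", model_dir, "--config", config],
        [cli, "evaluate", "--task", task, "--features", features,
         "--model", f"{model_dir}/model.joblib",
         "--out", f"{model_dir}/evaluation.json"],
        [cli, "package-model", "--task", task, "--config", config,
         "--model-dir", model_dir, "--manifest", manifest],
    ]


# Print the commands in order; the CSV file name here is a placeholder.
for cmd in pipeline_commands(
    "ecoli_ciprofloxacin", "ecoli", "ciprofloxacin", "ncbi_ast_export.csv"
):
    print(shlex.join(cmd))
```

To execute the steps instead of printing them, each list could be passed to subprocess.run(cmd, check=True), which stops the sequence at the first failing step.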
- This repository is for clinical decision support research/prototyping.
- It is not a regulated diagnostic device.
- Predictions must be confirmed with laboratory phenotypic AST and local policy.
- Architecture: docs/architecture.md
- Data provenance: docs/data_provenance.md
- API usage: docs/usage_api.md
- Threat model: docs/threat_model.md
- Model cards: docs/model_card_ecoli_cipro.md, docs/model_card_staph_oxacillin.md