
Dinesh431786/Crispr


🧬 CRISPR Precision Studio

Design, rank, explain, and validate CRISPR guides in one lightweight platform.

Transparent guide prioritization: interpretable on-target scoring, both-strand off-target analysis, and prime-editing support, all without GPUs, cloud dependencies, or black-box predictions.


CI runs the test suite and renders a live UI screenshot (downloadable as a build artifact) on every push.


📸 The interface

CRISPR Precision Studio UI

Rendered automatically by CI on every push: one Score per guide, with a per-feature Details breakdown.


✨ Highlights

| Feature | What it means |
|---|---|
| 🎯 One Score | A single 0–100 prioritization number ranks each guide; no column soup. |
| 🔎 Explainable | POST /api/explain shows the per-feature breakdown behind every score. |
| 🧬 Both-strand off-targets | Vectorised NumPy scan + per-site CFD & MIT/Hsu + aggregate specificity. |
| 🌟 Prime Editing Studio | PRIDICT2.0-informed pegRNA design (Spacer + RTT + PBS). |
| 📚 Peer-reviewed scoring | CRISPRscan weights reproduced verbatim and unit-validated; zero downloads. |
| 🔌 Pluggable models | onnx → trained-linear → heuristic, auto-selected and reported. |
| ⚡ Lightweight | No GPU, no LLM keys, no DB; everything is computed per request. |

🚀 Quickstart

cd crispr_app
pip install -r requirements.txt
uvicorn main:app --reload

➡️ Open http://127.0.0.1:8000


🔄 Workflow

   DNA sequence (paste or FASTA)
            │
            ▼
   Guide discovery  ── both strands, multi-PAM (NGG/NAG/NG/TTTV)
            │
            ▼
   On-target scoring ── built-in model + CRISPRscan
            │
            ▼
   Off-target analysis ── CFD + MIT/Hsu + aggregate specificity
            │
            ▼
   Ranking ── one 0–100 Score per guide
            │
            ▼
   Explanation ── per-feature breakdown (/api/explain)
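The guide-discovery step can be sketched as a regex scan of both strands. This is a minimal illustration, assuming 20-nt spacers and a 3′ PAM (NGG, NAG, NG) written in IUPAC codes; `find_guides` is a hypothetical name, not the app's API. TTTV (Cas12a) places its PAM 5′ of the protospacer and is not handled by this sketch.

```python
import re

# IUPAC codes needed for the PAMs listed above (NGG, NAG, NG, TTTV).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def find_guides(seq: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (spacer, pam, strand, start) for every 3'-PAM site on both strands.

    `start` is 0-based in that strand's own 5'->3' coordinates, so for "-"
    hits it indexes into the reverse complement of `seq`.
    """
    seq = seq.upper()
    pam_re = "".join(IUPAC[b] for b in pam.upper())
    # Zero-width lookahead so overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_re}))")
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            hits.append((m.group(1), m.group(2), strand, m.start()))
    return hits
```

Passing `pam="NAG"` or `pam="NG"` switches the PAM without changing the scan.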

⚡ Example

curl -s -X POST http://127.0.0.1:8000/api/design \
  -H 'Content-Type: application/json' \
  -d '{"dna_sequence": "ATGGCCGAGTACAAGCCCACGGTGCGCCTCGCC...", "pam": "NGG"}'

Real output (288 bp input → 21 guides found; model: linear, the shipped default):

| # | Guide (5′→3′) | PAM | Strand | GC% | Score |
|---|---|---|---|---|---|
| 1 | GATGTGGCGGTCCGGATCGA | CGG | − | 65 | 74 |
| 2 | AAGGTGTGGGTCGCGGACGA | CGG | + | 65 | 74 |
| 3 | ATCGACGGTGTGGCGCGTGG | CGG | − | 70 | 69 |
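The GC% column is the share of G and C bases in the 20-nt spacer. A short sketch (the `gc_percent` name is illustrative) that reproduces the three rows above:

```python
def gc_percent(guide: str) -> int:
    """Percentage of G or C bases in a guide, rounded to the nearest integer."""
    g = guide.upper()
    return round(100 * (g.count("G") + g.count("C")) / len(g))


for guide in ("GATGTGGCGGTCCGGATCGA", "AAGGTGTGGGTCGCGGACGA", "ATCGACGGTGTGGCGCGTGG"):
    print(guide, gc_percent(guide))  # 65, 65, 70 as in the table
```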

POST /api/explain then returns the per-feature breakdown (GC, Tm, position-specific contributions) behind any guide's Score.
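The curl call above can also be issued from Python with only the standard library. This sketch assumes the local address from the Quickstart and the same two request fields (`dna_sequence`, `pam`); the script name `design_client.py` and the helper `design_request` are hypothetical, and the response is printed as-is because its JSON shape isn't documented here.

```python
import json
import sys
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local uvicorn address from the Quickstart


def design_request(dna_sequence: str, pam: str = "NGG") -> urllib.request.Request:
    """Build the same POST /api/design call as the curl example."""
    body = json.dumps({"dna_sequence": dna_sequence, "pam": pam}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/design",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python design_client.py <DNA sequence> [PAM]
    req = design_request(sys.argv[1], *sys.argv[2:3])
    with urllib.request.urlopen(req) as resp:
        print(json.dumps(json.load(resp), indent=2))
```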


🎯 The score (production)

Each guide gets one Score from 0 to 100 (higher is better). It is a relative prioritization score that combines the on-target predictors, not a literal % editing rate. Color bands:

| 🟢 High | 🟡 Moderate | 🔴 Low |
|---|---|---|
| ≥ 60 | 40–59 | < 40 |

Click Details on any guide to see why it scored that way (GC, Tm, position-specific features…). Component sub-scores stay in the API/CSV for power users and are never shown on screen.
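The color bands amount to two thresholds. A sketch, assuming a fractional Score such as 59.5 falls in Moderate (the table lists integer bounds only); `score_band` is a hypothetical helper:

```python
def score_band(score: float) -> str:
    """Map a 0-100 Score to its color band."""
    if score >= 60:
        return "High"      # 🟢
    if score >= 40:
        return "Moderate"  # 🟡 40-59 (assumed: anything below 60 lands here)
    return "Low"           # 🔴
```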


📊 Accuracy: measured, not asserted

Held-out Spearman ρ on real public datasets (full table and method in BENCHMARKS.md). The platform offers two tiers: a transparent built-in ranker (the default) and an optional external deep-learning backend for maximum raw accuracy.

🪶 Built-in: lightweight, interpretable, zero setup

| Model | ρ | Notes |
|---|---|---|
| Shipped trained (default) | 0.22–0.41 | pooled human SpCas9, leave-one-dataset-out |
| Trained on your own data | 0.40–0.52 | one command (train.py) |
| Heuristic (always available) | ~0.25 | fully interpretable fallback |
| CRISPRscan (peer-reviewed, validated) | 0.58 | on its home dataset |

🧠 Optional β€” external deep-learning backend

| Model | ρ | Notes |
|---|---|---|
| ONNX (DeepSpCas9 / CRISPRon) | ~0.85 | bring your own export; auto-detected |

The built-in tier optimises for transparency and speed: its job is to rank candidates well enough to prioritise them, with every score explainable. For maximum raw correlation, drop in a deep model via ONNX.

⚠️ Honesty note. No predictor can exceed the ~0.71–0.77 reproducibility ceiling of the wet-lab data itself; published state-of-the-art tops out around ~0.85–0.88. Our scores are deterministic surrogates for ranking; wet-lab validation remains essential.
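Spearman ρ is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. A NumPy-only sketch of the metric used in the tables above (function names are illustrative; `scipy.stats.spearmanr` computes the same quantity when SciPy is installed):

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """1-based ranks; tied values share the mean of the ranks they span."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(predicted, measured) -> float:
    """Spearman rho = Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(predicted), average_ranks(measured))[0, 1])
```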

Train a stronger model (NumPy-only, no heavy ML stack)

cd crispr_app
python train.py dataset.csv          # columns: guide,measured[,ngg_context]
# → writes models/linear.json; the API auto-loads it and reports model="linear"

python benchmark.py data.context.tab # measure Spearman on a CRISPOR-format set

Position-specific dinucleotide features roughly double Spearman ρ on datasets with signal (chari2015 0.20 → 0.40, morenoMateos 0.17 → 0.43); gradient boosting matched ridge regression to within ±0.02, so we stay dependency-free.
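The idea of position-specific one-hot features fitted by a linear ridge model can be sketched in NumPy alone. This assumes 20-nt guides, mono- plus dinucleotide indicators at every position, and a closed-form ridge fit; the exact feature set and regularisation in `train.py` may differ, and `featurize`, `fit_ridge`, and `predict` are hypothetical names.

```python
from itertools import product

import numpy as np

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT


def featurize(guide: str) -> np.ndarray:
    """One-hot mono- and dinucleotide indicators, one block per position."""
    g = guide.upper()
    mono = np.zeros((len(g), 4))
    for i, base in enumerate(g):
        mono[i, BASES.index(base)] = 1.0
    di = np.zeros((len(g) - 1, 16))
    for i in range(len(g) - 1):
        di[i, DINUCS.index(g[i:i + 2])] = 1.0
    return np.concatenate([mono.ravel(), di.ravel()])  # 80 + 304 = 384 for 20 nt


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression; centring keeps the intercept unpenalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return X @ w + b
```

The ridge penalty also resolves the exact collinearity of one-hot blocks (each position's four indicators sum to 1), so the solve stays well-posed without dropping columns.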


🆚 How it compares

Honest positioning, including where we're weaker. CRISPOR and CHOPCHOP are mature, genome-aware tools; our edge is transparent prioritization in a lightweight, API-first package.

| Capability | CRISPR Precision Studio | CRISPOR | CHOPCHOP | Benchling |
|---|---|---|---|---|
| Single explainable prioritization score | ✓ | partial¹ | partial¹ | ✗ |
| Per-feature score breakdown (API) | ✓ | ✗ | ✗ | ✗ |
| Both-strand off-target (CFD + MIT) | ✓ | ✓ (reference) | ✓ | ✓ |
| Genome-wide off-target search | ✗ (background seq only) | ✓ | ✓ | ✓ |
| Prime-editing pegRNA design | ✓ | ✗² | partial | ✗ |
| JSON API-first | ✓ | partial | ✗ | ✓ |
| Runs locally, no GPU / no keys | ✓ | ✓³ | ✓³ | ✗ (SaaS) |

¹ These tools report several separate scores rather than one explained number.
² CRISPOR targets Cas9/Cas12a guide design; pegRNA design is usually done with a separate tool (PrimeDesign / pegFinder).
³ Open-source, but heavier to self-host.
Marks reflect typical usage and may change as those tools evolve.

Honest gap: genome-wide off-target scanning is the main capability CRISPOR and CHOPCHOP have that we don't; it's on the roadmap.


🏗️ Architecture

Browser (templates/index.html + static/app.js)
        │  JSON over fetch()
        ▼
FastAPI (main.py)  ──►  Pydantic validation + utils.validate_sequence
        │
        ▼
Science layer
   ├── scoring.py     on-target efficiency   (Doench RS2 / Azimuth-informed)
   ├── crisprscan.py  CRISPRscan             (Moreno-Mateos 2015, verbatim)
   ├── offtarget.py   CFD + MIT/Hsu + aggregate specificity
   ├── prime.py       pegRNA design          (PRIDICT2.0-informed)
   ├── features.py / models.py / train.py    pluggable + trainable models
   └── analysis.py    pipeline + vectorised both-strand off-target search
        │  pandas DataFrame → JSON
        ▼
Browser renders one ranked table
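A vectorised both-strand off-target search can be illustrated with NumPy's `sliding_window_view` (NumPy ≥ 1.20): every spacer-plus-PAM window of the background is compared with the guide in one array operation. This sketch assumes an NGG PAM and a plain mismatch-count cutoff; it is not the code in `analysis.py`, and `scan_offtargets` is a hypothetical name. Per-site CFD or MIT/Hsu scores would then be computed for the hits it returns.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

_COMP = str.maketrans("ACGTN", "TGCAN")


def _as_bytes(seq: str) -> np.ndarray:
    return np.frombuffer(seq.encode("ascii"), dtype=np.uint8)


def scan_offtargets(guide: str, background: str, max_mismatches: int = 3):
    """Return (strand, start, mismatches) for NGG sites within max_mismatches.

    `start` is 0-based in that strand's own 5'->3' coordinates.
    """
    g = _as_bytes(guide.upper())
    n = len(g)
    fwd = background.upper()
    hits = []
    for strand, s in (("+", fwd), ("-", fwd.translate(_COMP)[::-1])):
        arr = _as_bytes(s)
        if len(arr) < n + 3:
            continue
        win = sliding_window_view(arr, n + 3)          # shape (n_windows, n + 3)
        mismatches = (win[:, :n] != g).sum(axis=1)      # spacer positions only
        pam_ok = (win[:, n + 1] == ord("G")) & (win[:, n + 2] == ord("G"))  # N G G
        for i in np.flatnonzero(pam_ok & (mismatches <= max_mismatches)):
            hits.append((strand, int(i), int(mismatches[i])))
    return hits
```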

🔌 API reference

| Method & route | Purpose |
|---|---|
| GET /health | liveness check |
| POST /api/design | ranked gRNAs with the ConsensusScore (the 0–100 Score) |
| POST /api/offtargets | per-site CFD/MIT hits + per-guide specificity summary |
| POST /api/simulate | protein / indel outcome of an edit |
| POST /api/prime-design | ranked pegRNAs (Spacer + RTT + PBS) |
| POST /api/explain | interpretable per-feature score breakdown |
| POST /api/upload-fasta | parse pasted FASTA / plain DNA |
| GET /api/models | active & available on-target backends |
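POST /api/upload-fasta accepts either FASTA or plain DNA. A sketch of that kind of parsing, assuming only the first FASTA record is kept and only A/C/G/T/N are accepted; `parse_dna` is a hypothetical helper, not the app's `utils.validate_sequence`.

```python
VALID = set("ACGTN")


def parse_dna(text: str) -> str:
    """Return one uppercase DNA string from FASTA or plain pasted sequence."""
    parts = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break                           # keep only the first record
            seen_header = True
            continue
        parts.append("".join(line.split()))     # drop internal whitespace
    seq = "".join(parts).upper()
    bad = sorted(set(seq) - VALID)
    if not seq:
        raise ValueError("no sequence found")
    if bad:
        raise ValueError(f"invalid characters: {''.join(bad)}")
    return seq
```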

🌟 Prime editing β€” how pegRNAs are chosen

For a target base substitution, prime.py enumerates and ranks candidate pegRNAs using determinants from PRIDICT2.0 (Mathis 2024) and Anzalone 2019:

  1. Spacer / nick. Scan NGG PAMs within ~30 nt of the target; place the Cas9 nick 3 bp 5′ of each PAM. Require the edit to fall 0–15 nt downstream of the nick.
  2. PBS (primer-binding site). Enumerate lengths 8–17 nt; the PBS is the reverse complement of the sequence immediately 5′ of the nick. Its nearest-neighbour Tm is optimised toward ~37 °C (Gaussian reward), with mild length penalties favouring ~13 nt.
  3. RTT (reverse-transcriptase template). Enumerate lengths 10–20 nt; the RTT encodes the edit and must retain ≥3 nt of 3′ homology past the edit for flap resolution. Penalties: an RTT that begins with C (Anzalone 2019 reports lower efficiency for these, possibly because the C pairs with the sgRNA scaffold) and RTT GC content far from ~55%. The length term favours ~12 nt.
  4. Ranking. A calibrated logistic score blends the PBS Tm, PBS/RTT length terms, 3′-homology constraint, RTT-starts-with-C penalty, and GC term into one 0–1 Score.
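Steps 1–3 reduce to string geometry on the target. A sketch for a single-base substitution, assuming a 20-nt spacer, top-strand NGG PAMs only, and 0-based coordinates; `pegrna_candidates` is hypothetical and applies the length windows and the ≥3-nt homology rule but no scoring. The 3′ extension is written RTT then PBS (5′→3′), as it appears on the pegRNA.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def pegrna_candidates(seq, edit_pos, alt, spacer_len=20, max_dist=15,
                      pbs_lens=range(8, 18), rtt_lens=range(10, 21), min_homology=3):
    """Enumerate (spacer, PBS, RTT) combinations for one substitution (top strand only)."""
    seq = seq.upper()
    if seq[edit_pos] == alt:
        raise ValueError("alt allele equals the reference base")
    edited = seq[:edit_pos] + alt + seq[edit_pos + 1:]
    out = []
    for p in range(spacer_len, len(seq) - 2):    # p = index of the PAM's N
        if seq[p + 1:p + 3] != "GG":
            continue
        nick = p - 3                             # Cas9 cuts between nick-1 and nick
        dist = edit_pos - nick                   # edit offset downstream of the nick
        if not 0 <= dist <= max_dist:
            continue
        spacer = seq[p - spacer_len:p]
        for pbs_len in pbs_lens:
            # PBS pairs with the nicked strand's new 3' end (the sequence 5' of the nick).
            pbs = revcomp(seq[nick - pbs_len:nick])
            for rtt_len in rtt_lens:
                homology = rtt_len - 1 - dist    # flap bases 3' of the edit
                if homology < min_homology or nick + rtt_len > len(seq):
                    continue
                # RTT templates the edited flap; written 5'->3' as on the pegRNA.
                rtt = revcomp(edited[nick:nick + rtt_len])
                out.append({"spacer": spacer, "pbs": pbs, "rtt": rtt,
                            "extension": rtt + pbs, "homology": homology})
    return out
```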

The pegRNA score is PRIDICT2.0-informed, not the trained PRIDICT2.0 network. It reproduces the published determinants for ranking; it has not yet been numerically benchmarked against a PRIDICT test set (on the roadmap). No secondary-structure (e.g. RNAfold) penalty is applied yet.
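Step 4's blend can be sketched as a logistic function of the listed terms. Every number here is an assumption for illustration: the Wallace rule (2 °C per A/T, 4 °C per G/C) stands in for the nearest-neighbour Tm described in step 2, and the Gaussian width and the `WEIGHTS` values are placeholders, not the calibrated ones. `wallace_tm` and `pegrna_score` are hypothetical names.

```python
import math

# Placeholder weights: NOT the calibrated values used by prime.py.
WEIGHTS = {"bias": -1.0, "tm": 3.0, "pbs_len": -1.0, "rtt_len": -1.0,
           "gc": -2.0, "starts_with_c": -1.0}


def wallace_tm(seq: str) -> float:
    """Crude Tm in °C: 2 per A/T + 4 per G/C (stand-in for nearest-neighbour Tm)."""
    s = seq.upper()
    return 2.0 * (s.count("A") + s.count("T")) + 4.0 * (s.count("G") + s.count("C"))


def pegrna_score(pbs: str, rtt: str, homology: int, w=WEIGHTS) -> float:
    """Blend PBS Tm, length, GC, and RTT-start terms into one 0-1 score."""
    if homology < 3:                              # 3'-homology rule as a hard filter
        return 0.0
    tm_reward = math.exp(-0.5 * ((wallace_tm(pbs) - 37.0) / 5.0) ** 2)  # peak at 37 °C
    pbs_len_pen = abs(len(pbs) - 13) / 13         # favour ~13-nt PBS
    rtt_len_pen = abs(len(rtt) - 12) / 12         # favour ~12-nt RTT
    r = rtt.upper()
    gc_pen = abs((r.count("G") + r.count("C")) / len(r) - 0.55)  # distance from ~55% GC
    c_pen = 1.0 if r.startswith("C") else 0.0     # RTT (5' end) begins with C
    z = (w["bias"] + w["tm"] * tm_reward + w["pbs_len"] * pbs_len_pen
         + w["rtt_len"] * rtt_len_pen + w["gc"] * gc_pen + w["starts_with_c"] * c_pen)
    return 1.0 / (1.0 + math.exp(-z))
```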


🔬 Scientific basis

| Component | Model / source |
|---|---|
| On-target | Doench 2014/2016 Rule Set 2/Azimuth (Nat. Biotechnol. 34:184); CRISPRscan (Moreno-Mateos, Nat. Methods 2015) |
| Off-target (site) | CFD (Doench 2016) · MIT/Hsu (Hsu 2013, Nat. Biotechnol. 31:827) |
| Off-target (guide) | aggregate specificity 10000 / (100 + Σ off-target site scores) (CRISPOR convention) |
| Prime editing | PRIDICT2.0 (Mathis 2024, doi:10.1038/s41587-024-02268-2); Anzalone 2019 (Nature 576:149) |
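The MIT/Hsu site score and the aggregate specificity in the table are short formulas. A sketch, assuming 20-nt aligned spacer strings (PAM excluded) and the per-position mismatch weights published with the MIT scoring scheme as reproduced in CRISPOR; CFD is omitted because it needs the full Doench 2016 mismatch/PAM lookup table. The sum in `aggregate_specificity` runs over off-target sites only (the on-target site excluded). Function names are hypothetical.

```python
# Per-position mismatch penalties, index 0 = PAM-distal ... index 19 = PAM-proximal.
HSU_M = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
         0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583]


def mit_site_score(guide: str, site: str) -> float:
    """MIT/Hsu score (0-100) for one 20-nt off-target site versus the guide."""
    assert len(guide) == len(site) == 20
    mm = [i for i, (a, b) in enumerate(zip(guide.upper(), site.upper())) if a != b]
    s1 = 1.0
    for i in mm:
        s1 *= 1.0 - HSU_M[i]                    # position-specific penalty
    if len(mm) < 2:
        s2 = 1.0
    else:
        mean_dist = (mm[-1] - mm[0]) / (len(mm) - 1)  # mean gap between consecutive mismatches
        s2 = 1.0 / (((19 - mean_dist) / 19) * 4 + 1)
    s3 = 1.0 / len(mm) ** 2 if mm else 1.0      # penalise mismatch count
    return 100.0 * s1 * s2 * s3


def aggregate_specificity(offtarget_scores) -> float:
    """CRISPOR convention: 10000 / (100 + sum of off-target site scores)."""
    return 10000.0 / (100.0 + sum(offtarget_scores))
```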

✅ Tests

pip install pytest
python -m pytest tests/ -q     # 35 passing

Covers on-target scoring, CFD/MIT scoring, aggregate specificity, both-strand off-target detection, pegRNA design, the model registry & trainer, CRISPRscan reference-vector validation, performance, and dependency hygiene.


MIT licensed · No API keys required · Wet-lab validation always essential
