radigest is a fast in-silico restriction digest and enzyme-pair screening toolkit for genomics.
It does two main things:

- Digest known enzymes quickly. Use `radigest` when you already know the enzyme or enzyme pair.
- Screen enzyme pairs for a design target. Use `radigest-design` when you know the genome fraction, sample count, read budget, and depth target, but still need to choose enzymes.

The model is deterministic and sequence-level. It finds recognition sites in a reference FASTA, applies digest rules, applies optional size-selection weights, and writes reproducible outputs.
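
The steps above can be illustrated with a toy double digest. This is a minimal sketch of the general technique, not radigest's implementation: the `ENZYMES` table layout and the function names `cut_sites` and `double_digest` are hypothetical. It scans only the forward strand, which is sufficient here because both motifs are palindromic, and it drops terminal contig-end fragments and same-enzyme fragments unless asked otherwise.

```python
import re

# Recognition motif and cut offset (bases from motif start to the top-strand cut).
# EcoRI cuts G^AATTC and MseI cuts T^TAA; the table layout itself is illustrative.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def cut_sites(seq, enzymes):
    """Return sorted (cut_position, enzyme_name) pairs, 0-based."""
    cuts = []
    upper = seq.upper()
    for name in enzymes:
        motif, offset = ENZYMES[name]
        # A lookahead pattern finds overlapping motif occurrences as well.
        for match in re.finditer(f"(?={motif})", upper):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)


def double_digest(seq, enzyme_a, enzyme_b, min_len=0, max_len=None, allow_same=False):
    """Fragments between adjacent cuts as (start, end, left, right), 0-based half-open."""
    cuts = cut_sites(seq, [enzyme_a, enzyme_b])
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left == right and not allow_same:
            continue  # AA/BB fragment, dropped unless allow_same is set
        length = end - start
        if length < min_len or (max_len is not None and length > max_len):
            continue  # outside the hard size window
        fragments.append((start, end, left, right))
    return fragments


seq = "AAGAATTCCCCCTTAAGGGGGAATTCAA"
print(double_digest(seq, "EcoRI", "MseI"))
```

Because the result depends only on the sequence and the rules, rerunning it on the same input always yields the same fragments.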
Build and install:

```bash
make build
make install
```

This installs:

- `radigest`
- `radigest-design`
- `radigest-fit-size-model`

For development/helper commands:

```bash
make build-dev
make install-dev
```

| Situation | Command |
|---|---|
| I already know my enzyme or enzyme pair | radigest |
| I want BED/GFF/TSV/FASTA fragment outputs | radigest |
| I want to screen enzyme pairs against a design target | radigest-design |
| I want to fit a size-selection model from observed inserts | radigest-fit-size-model |
Use radigest when the enzyme choice is already known.

```bash
radigest -fasta ref.fa -enzymes EcoRI,MseI
```

With no output flags, `radigest` writes a JSON run summary to stdout.

Save the summary:

```bash
radigest -fasta ref.fa -enzymes EcoRI,MseI \
  -json run.json
```

Restrict fragments to a size window:

```bash
radigest \
  -fasta ref.fa \
  -enzymes EcoRI,MseI \
  -min 300 \
  -max 600 \
  -json run.json
```

Write fragment files alongside the summary:

```bash
radigest \
  -fasta ref.fa \
  -enzymes EcoRI,MseI \
  -min 300 \
  -max 600 \
  -bed fragments.bed \
  -gff fragments.gff3 \
  -fragments-tsv fragments.tsv \
  -json run.json
```

Common outputs:

| Output | Flag |
|---|---|
| JSON run summary | -json run.json |
| BED6 fragments | -bed fragments.bed |
| GFF3 fragments | -gff fragments.gff3 |
| Per-fragment TSV | -fragments-tsv fragments.tsv |
| Fragment sequences | -fragments-fasta fragments.fa |
Coordinates:
| Format | Coordinates |
|---|---|
| GFF3 | 1-based closed |
| BED | 0-based half-open |
| TSV | 0-based half-open |
| FASTA metadata | 0-based half-open |
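
When comparing the same fragment across output formats, the two coordinate conventions convert as follows. The helper names here are hypothetical and only illustrate the standard arithmetic between 0-based half-open and 1-based closed intervals.

```python
def bed_to_gff3(start, end):
    """Convert 0-based half-open [start, end) to 1-based closed [start, end]."""
    return start + 1, end


def gff3_to_bed(start, end):
    """Convert 1-based closed [start, end] to 0-based half-open [start, end)."""
    return start - 1, end


def bed_length(start, end):
    """Fragment length from 0-based half-open coordinates."""
    return end - start


def gff3_length(start, end):
    """Fragment length from 1-based closed coordinates."""
    return end - start + 1


bed = (99, 400)
gff = bed_to_gff3(*bed)
print(bed, gff, bed_length(*bed), gff3_length(*gff))
```

Only the start coordinate shifts between the two conventions; the end coordinate and the fragment length are unchanged.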
By default, double-digest mode keeps adjacent AB/BA fragments:

```bash
radigest -fasta ref.fa -enzymes EcoRI,MseI
```

To also keep AA/BB adjacent fragments:

```bash
radigest -fasta ref.fa -enzymes EcoRI,MseI -allow-same
```

Terminal contig-end fragments are omitted by default. Include them with:

```bash
radigest -fasta ref.fa -enzymes EcoRI,MseI -include-ends
```

The hard size window controls which fragments are retained:

```text
-min 300 -max 600
```

For weighted recovery modeling, score a broader range:

```bash
radigest \
  -fasta ref.fa \
  -enzymes PstI,MspI \
  -min 300 \
  -max 600 \
  -score-min 1 \
  -score-max 2000 \
  -size-model normal \
  -size-mean 275 \
  -size-sd 85 \
  -fragments-tsv fragments.tsv \
  -json run.json
```

Supported models:

- `hard`
- `normal`
- `triangular`
- `soft-window`

Use hard for a strict size window. Use the other models when size recovery is expected to be gradual rather than perfectly sharp.
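
To make the difference between a sharp and a gradual model concrete, the sketch below compares a hard window with a Gaussian-shaped weight. The exact formula and normalization radigest uses for `normal` are not stated here, so the kernel (scaled to 1.0 at the mean) is an assumption, and `hard_weight` and `normal_weight` are hypothetical names. The mean of 275 and SD of 85 mirror the `-size-mean` and `-size-sd` values in the command above.

```python
import math


def hard_weight(length, lo, hi):
    """Strict window: a fragment is either fully recovered or not at all."""
    return 1.0 if lo <= length <= hi else 0.0


def normal_weight(length, mean, sd):
    """Gaussian-shaped weight, scaled to 1.0 at the mean (assumed normalization)."""
    return math.exp(-0.5 * ((length - mean) / sd) ** 2)


lengths = [150, 275, 360, 600, 900]  # illustrative fragment lengths in bp
for length in lengths:
    print(length, hard_weight(length, 300, 600), round(normal_weight(length, 275, 85), 3))

# A weighted fragment count is the sum of per-fragment weights.
weighted_fragments = sum(normal_weight(length, 275, 85) for length in lengths)
print(round(weighted_fragments, 3))
```

Under the hard model a 275 bp fragment contributes nothing to a 300 to 600 bp window, while the gradual model gives it full weight and lets recovery taper off on both sides.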
Use radigest-design when the experimental target is known and the enzyme pair is the unknown.
Typical question:
Which enzyme pair best matches my target genome fraction and sequencing budget?

```bash
radigest-design \
  --ref ref.fa \
  --enzymes EcoRI,MseI,PstI,ApeKI,NlaIII,MspI \
  --pct 2.5 \
  --depth 10 \
  --samples 96 \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85
```

Aliases:

| Alias | Full flag |
|---|---|
| `--ref` | `--fasta` |
| `--pct` | `--target-genome-pct` |
| `--depth` | `--desired-depth` |

Candidate enzymes can also be listed in a file, one per line:

```bash
cat > candidate_enzymes.txt <<'EOF'
EcoRI
MseI
PstI
MspI
ApeKI
NlaIII
MluCI
BfaI
EOF
```

```bash
radigest-design \
  --ref ref.fa \
  --enzymes candidate_enzymes.txt \
  --pct 2.5 \
  --depth 10 \
  --samples 96 \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85 \
  --out-dir radigest_design
```

For broad exploration:

```bash
radigest-design ... --enzymes all
```

Review final enzyme choices manually before wet-lab use.

A fuller invocation with an explicit coverage tolerance, read layout, and size model:

```bash
radigest-design \
  --ref ref.fa \
  --enzymes candidate_enzymes.txt \
  --pct 2.5 \
  --coverage-tolerance-pct 0.25 \
  --depth 10 \
  --samples 96 \
  --read-layout pe \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85 \
  --min 300 \
  --max 600 \
  --score-min 1 \
  --score-max 2000 \
  --size-model normal \
  --size-mean 275 \
  --size-sd 85 \
  --out-dir radigest_design
```

`radigest-design` writes:

| File | Use |
|---|---|
| `design.summary.tsv` | Compact ranked table for human review |
| `design.tsv` | Full machine-readable ranked table |
| `design.json` | Full provenance and reproducibility record |
| `design.report.txt` | Simple key-value report |

It also prints a recommendation-first terminal summary:

```text
Recommendation:
Recommended pair: PstI,MspI
Status: feasible
Why: predicted 2.43% genome vs target 2.50%; predicted 12.1x mean locus depth vs target 10x
Budget: 96 samples, 300M read pairs, 0.85 usable fraction
Main caution: mean insert 247 bp is below 2x150 bp; paired-end overlap likely
Files: radigest_design/design.summary.tsv, radigest_design/design.report.txt
```

Start with:

```bash
column -ts $'\t' radigest_design/design.summary.tsv | less -S
```

| Term | Meaning |
|---|---|
| `--pct` | Target weighted recovered genome percentage |
| `--depth` | Mean read-pair depth per recovered locus |
| `weighted_fragments` | Modeled recovered loci competing for reads |
| `predicted_depth` | Read pairs per sample divided by weighted fragments |
| `usable_read_fraction` | Fraction of read pairs expected to remain useful after demultiplexing/QC/deduplication |

--depth is not basewise WGS depth. It is a mean read-pair depth per recovered locus.
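
The depth arithmetic can be worked through by hand. The sketch below uses the definition of `predicted_depth` from the table above. It assumes that read pairs per sample are the flowcell read pairs, scaled by the usable fraction and split evenly across samples; that split is an assumption rather than something stated here, and the weighted fragment count of 220,000 is a made-up input. The function names are hypothetical.

```python
def read_pairs_per_sample(flowcell_read_pairs, usable_read_fraction, samples):
    """Usable read pairs per sample, assuming an even split across samples."""
    return flowcell_read_pairs * usable_read_fraction / samples


def predicted_depth(pairs_per_sample, weighted_fragments):
    """Mean read-pair depth per recovered locus (not basewise WGS depth)."""
    return pairs_per_sample / weighted_fragments


# Budget taken from the example commands: 300M read pairs, 0.85 usable, 96 samples.
pairs = read_pairs_per_sample(300_000_000, 0.85, 96)

# Hypothetical weighted fragment count for one enzyme pair.
depth = predicted_depth(pairs, 220_000)

print(f"{pairs:,.0f} usable read pairs per sample")
print(f"{depth:.1f}x mean locus depth")
```

Halving the number of weighted fragments doubles the predicted depth for the same budget, which is why the target genome percentage and the depth target pull against each other.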
Default:

```text
--objective balanced
```

Other options:

- `closest-coverage`
- `depth-first`
- `feasible-lowest-coverage`
- `max-depth`

Use balanced first. Rerun with another objective for sensitivity checks.
When the enzyme pair is already known, digest it directly:

```bash
radigest \
  -fasta ref.fa \
  -enzymes PstI,MspI \
  -min 300 \
  -max 600 \
  -bed fragments.bed \
  -fragments-tsv fragments.tsv \
  -json run.json
```

When the pair is still to be chosen, screen candidates first:

```bash
radigest-design \
  --ref ref.fa \
  --enzymes candidate_enzymes.txt \
  --pct 2.5 \
  --depth 10 \
  --samples 96 \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85 \
  --min 300 \
  --max 600 \
  --score-min 1 \
  --score-max 2000 \
  --size-model normal \
  --size-mean 275 \
  --size-sd 85 \
  --out-dir radigest_design
```

Inspect the recommendation:

```bash
column -ts $'\t' radigest_design/design.summary.tsv | less -S
cat radigest_design/design.report.txt
```

Then digest the selected pair:

```bash
radigest \
  -fasta ref.fa \
  -enzymes PstI,MspI \
  -min 300 \
  -max 600 \
  -bed final_fragments.bed \
  -fragments-tsv final_fragments.tsv \
  -json final_digest.json
```

`radigest` models:

- recognition sites
- cut coordinates
- single- and double-digest fragment rules
- hard size windows
- optional size-selection weights
- weighted recovered genome percentage
- mean read-pair depth per recovered locus
It does not model:
- methylation sensitivity
- partial digestion
- star activity
- enzyme efficiency
- buffer compatibility
- empirical digestion rates
- per-locus depth dispersion
Enzymes with the same recognition motif and cut coordinate are treated identically by the sequence-level model.
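
As an illustration of that equivalence, the sketch below groups enzymes by motif and cut offset. MspI and HpaII both cut C^CGG and differ in methylation sensitivity, which is outside a sequence-level model, so they fall into the same group. The table layout and the name `group_equivalent` are hypothetical and do not reflect radigest's internal enzyme database.

```python
from collections import defaultdict

# (recognition motif, cut offset from motif start) per enzyme; layout is illustrative.
ENZYMES = {
    "MspI": ("CCGG", 1),
    "HpaII": ("CCGG", 1),
    "EcoRI": ("GAATTC", 1),
}


def group_equivalent(enzymes):
    """Group enzyme names that share the same motif and cut offset."""
    groups = defaultdict(list)
    for name, key in enzymes.items():
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items()}


for key, names in group_equivalent(ENZYMES).items():
    print(key, names)
```

Any two enzymes in the same group would produce identical in-silico fragments, so the choice between them has to rest on wet-lab properties.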
Use radigest for fast, reproducible in-silico screening. Validate final enzyme choices against wet-lab constraints.

```bash
radigest --help
radigest-design --help
radigest-fit-size-model --help
radigest -list-enzymes
```

If you use radigest, please cite the DOI listed at the top of this README.