ericksamera/radigest


radigest

CI DOI Go License: MIT

radigest is a fast in-silico restriction digest and enzyme-pair screening toolkit for genomics.

It does two main things:

  1. Digest known enzymes quickly. Use radigest when you already know the enzyme or enzyme pair.

  2. Screen enzyme pairs for a design target. Use radigest-design when you know the genome fraction, sample count, read budget, and depth target, but still need to choose enzymes.

The model is deterministic and sequence-level. It finds recognition sites in a reference FASTA, applies digest rules, applies optional size-selection weights, and writes reproducible outputs.
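As a rough illustration of what "sequence-level" means here, the sketch below locates recognition sites in a string and converts them to cut coordinates. It is not radigest's code: the function names are hypothetical, only two enzymes with unambiguous motifs are included (EcoRI, G^AATTC; MseI, T^TAA), and IUPAC ambiguity codes are not handled.

```python
# Illustrative sketch only; names are hypothetical and not part of radigest.
# Each entry is (recognition motif, top-strand cut offset within the motif).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "MseI": ("TTAA", 1),     # T^TAA
}


def find_cuts(seq, enzyme):
    """Return sorted 0-based top-strand cut coordinates for one enzyme."""
    motif, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(motif)
    while start != -1:
        cuts.append(start + offset)
        # Advance by one base so overlapping sites are not skipped.
        start = seq.find(motif, start + 1)
    return cuts


def single_digest(seq, enzyme):
    """Return internal fragments between consecutive cuts (0-based half-open)."""
    cuts = find_cuts(seq, enzyme)
    return list(zip(cuts, cuts[1:]))


seq = "AAGAATTCCCCCTTAACCGAATTCAA"
ecori_cuts = find_cuts(seq, "EcoRI")
msei_cuts = find_cuts(seq, "MseI")
ecori_fragments = single_digest(seq, "EcoRI")
```

Because both motifs are palindromic, scanning the top strand alone finds every site in this toy example.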


Install

make build
make install

This installs:

radigest
radigest-design
radigest-fit-size-model

For development/helper commands:

make build-dev
make install-dev

Which command should I use?

| Situation | Command |
| --- | --- |
| I already know my enzyme or enzyme pair | radigest |
| I want BED/GFF/TSV/FASTA fragment outputs | radigest |
| I want to screen enzyme pairs against a design target | radigest-design |
| I want to fit a size-selection model from observed inserts | radigest-fit-size-model |

1. Fast digestion with radigest

Use radigest when the enzyme choice is already known.

Minimal digest

radigest -fasta ref.fa -enzymes EcoRI,MseI

With no output flags, radigest writes a JSON run summary to stdout.

Save the summary:

radigest -fasta ref.fa -enzymes EcoRI,MseI \
  -json run.json

Size-select fragments

radigest \
  -fasta ref.fa \
  -enzymes EcoRI,MseI \
  -min 300 \
  -max 600 \
  -json run.json

Write fragment files

radigest \
  -fasta ref.fa \
  -enzymes EcoRI,MseI \
  -min 300 \
  -max 600 \
  -bed fragments.bed \
  -gff fragments.gff3 \
  -fragments-tsv fragments.tsv \
  -json run.json

Common outputs:

| Output | Flag |
| --- | --- |
| JSON run summary | -json run.json |
| BED6 fragments | -bed fragments.bed |
| GFF3 fragments | -gff fragments.gff3 |
| Per-fragment TSV | -fragments-tsv fragments.tsv |
| Fragment sequences | -fragments-fasta fragments.fa |

Coordinates:

| Format | Coordinates |
| --- | --- |
| GFF3 | 1-based closed |
| BED | 0-based half-open |
| TSV | 0-based half-open |
| FASTA metadata | 0-based half-open |
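When comparing BED or TSV output against GFF3 output, the two coordinate conventions differ only in the start position. The helper names below are hypothetical and shown only to make the conversion explicit.

```python
# Hypothetical helpers; not part of radigest.

def bed_to_gff3(start, end):
    """Convert a 0-based half-open interval to a 1-based closed interval."""
    return start + 1, end


def gff3_to_bed(start, end):
    """Convert a 1-based closed interval to a 0-based half-open interval."""
    return start - 1, end


def bed_length(start, end):
    """Length of a 0-based half-open interval."""
    return end - start


def gff3_length(start, end):
    """Length of a 1-based closed interval."""
    return end - start + 1


gff_start, gff_end = bed_to_gff3(3, 19)
```

The same fragment has the same length under either convention, which is a quick sanity check when joining files.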

Double-digest behavior

By default, double-digest mode keeps only fragments bounded by adjacent cuts from different enzymes (AB or BA):

radigest -fasta ref.fa -enzymes EcoRI,MseI

To also keep AA/BB adjacent fragments:

radigest -fasta ref.fa -enzymes EcoRI,MseI -allow-same

Terminal contig-end fragments are omitted by default. Include them with:

radigest -fasta ref.fa -enzymes EcoRI,MseI -include-ends
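The effect of these two flags can be sketched in a few lines. This is an assumption-laden illustration, not radigest's implementation: the function name, the fragment labels, and the treatment of contig ends as labelled fragments are all hypothetical.

```python
# Illustrative sketch only; not radigest's implementation.

def double_digest(cuts_a, cuts_b, contig_len, allow_same=False, include_ends=False):
    """Return (start, end, label) fragments between adjacent cut sites.

    Coordinates are 0-based half-open. Labels such as "AB" name the enzymes
    at the left and right ends; "-" marks a contig end.
    """
    sites = sorted([(p, "A") for p in cuts_a] + [(p, "B") for p in cuts_b])
    frags = []
    for (start, left), (end, right) in zip(sites, sites[1:]):
        # Keep mixed-enzyme fragments always; same-enzyme ones only on request.
        if end > start and (left != right or allow_same):
            frags.append((start, end, left + right))
    if include_ends and sites:
        first_pos, first_enz = sites[0]
        last_pos, last_enz = sites[-1]
        if first_pos > 0:
            frags.insert(0, (0, first_pos, "-" + first_enz))
        if last_pos < contig_len:
            frags.append((last_pos, contig_len, last_enz + "-"))
    return frags


default_frags = double_digest([3, 19], [13], contig_len=26)
with_ends = double_digest([3, 19], [13], contig_len=26, include_ends=True)
same_only = double_digest([3, 19], [], contig_len=26, allow_same=True)
```

With cuts for enzyme A at 3 and 19 and for enzyme B at 13, the default call keeps one AB and one BA fragment; a contig cut only by A yields nothing unless same-enzyme fragments are allowed.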

Size-selection models

The hard size window controls which fragments are retained:

-min 300 -max 600

For weighted recovery modeling, score a broader range:

radigest \
  -fasta ref.fa \
  -enzymes PstI,MspI \
  -min 300 \
  -max 600 \
  -score-min 1 \
  -score-max 2000 \
  -size-model normal \
  -size-mean 275 \
  -size-sd 85 \
  -fragments-tsv fragments.tsv \
  -json run.json

Supported models:

hard
normal
triangular
soft-window

Use hard for a strict size window. Use the other models when size recovery is expected to be gradual rather than perfectly sharp.
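To make the difference between a strict and a gradual model concrete, the sketch below contrasts a hard window with a Gaussian-shaped weight. The exact formulas radigest uses are not stated here, so both functions are assumptions: the hard window is taken to be inclusive at both bounds, and the normal weight is scaled so that a fragment at the mean scores 1.0.

```python
import math

# Assumed forms for illustration; radigest's actual weighting may differ.

def hard_weight(length, lo, hi):
    """1.0 inside the window (assumed inclusive at both ends), else 0.0."""
    return 1.0 if lo <= length <= hi else 0.0


def normal_weight(length, mean, sd):
    """Gaussian-shaped weight, scaled to 1.0 at the mean."""
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z)


lengths = [150, 275, 360, 900]
hard = [hard_weight(n, 300, 600) for n in lengths]
soft = [normal_weight(n, mean=275, sd=85) for n in lengths]
```

Under the hard window only the 360 bp fragment counts; under the Gaussian-shaped weight every fragment contributes, with weight falling off smoothly away from 275 bp.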


2. Enzyme-pair screening with radigest-design

Use radigest-design when the experimental target is known and the enzyme pair is the unknown.

Typical question:

Which enzyme pair best matches my target genome fraction and sequencing budget?

Minimal design run

radigest-design \
  --ref ref.fa \
  --enzymes EcoRI,MseI,PstI,ApeKI,NlaIII,MspI \
  --pct 2.5 \
  --depth 10 \
  --samples 96 \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85

Aliases:

| Alias | Full flag |
| --- | --- |
| --ref | --fasta |
| --pct | --target-genome-pct |
| --depth | --desired-depth |

Use an enzyme list

cat > candidate_enzymes.txt <<'EOF'
EcoRI
MseI
PstI
MspI
ApeKI
NlaIII
MluCI
BfaI
EOF
radigest-design \
  --ref ref.fa \
  --enzymes candidate_enzymes.txt \
  --pct 2.5 \
  --depth 10 \
  --samples 96 \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85 \
  --out-dir radigest_design

For broad exploration:

radigest-design ... --enzymes all

Review final enzyme choices manually before wet-lab use.

Add size-selection assumptions

radigest-design \
  --ref ref.fa \
  --enzymes candidate_enzymes.txt \
  --pct 2.5 \
  --coverage-tolerance-pct 0.25 \
  --depth 10 \
  --samples 96 \
  --read-layout pe \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85 \
  --min 300 \
  --max 600 \
  --score-min 1 \
  --score-max 2000 \
  --size-model normal \
  --size-mean 275 \
  --size-sd 85 \
  --out-dir radigest_design

What radigest-design reports

radigest-design writes:

| File | Use |
| --- | --- |
| design.summary.tsv | Compact ranked table for human review |
| design.tsv | Full machine-readable ranked table |
| design.json | Full provenance and reproducibility record |
| design.report.txt | Simple key-value report |

It also prints a recommendation-first terminal summary:

Recommendation:
Recommended pair: PstI,MspI
Status: feasible
Why: predicted 2.43% genome vs target 2.50%; predicted 12.1x mean locus depth vs target 10x
Budget: 96 samples, 300M read pairs, 0.85 usable fraction
Main caution: mean insert 247 bp is below 2x150 bp; paired-end overlap likely
Files: radigest_design/design.summary.tsv, radigest_design/design.report.txt
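The caution line in the example summary follows from simple arithmetic: two reads of length L sequenced from opposite ends of an insert shorter than 2L must overlap. A minimal sketch, with a hypothetical function name:

```python
# Hypothetical helper; not part of radigest.

def expected_overlap(insert_len, read_length):
    """Bases covered by both mates of a read pair; 0 if the reads do not meet."""
    return max(0, 2 * read_length - insert_len)


overlap_short = expected_overlap(247, 150)
overlap_long = expected_overlap(400, 150)
```

For the 247 bp mean insert shown above and 150 bp reads, the mates overlap by about 53 bp on average; a 400 bp insert leaves no overlap.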

Start with:

column -ts $'\t' radigest_design/design.summary.tsv | less -S

Key design terms

| Term | Meaning |
| --- | --- |
| --pct | Target weighted recovered genome percentage |
| --depth | Mean read-pair depth per recovered locus |
| weighted_fragments | Modeled recovered loci competing for reads |
| predicted_depth | Read pairs per sample divided by weighted fragments |
| usable_read_fraction | Fraction of read pairs expected to remain useful after demultiplexing/QC/deduplication |

--depth is not basewise WGS depth. It is a mean read-pair depth per recovered locus.
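The depth arithmetic can be worked through by hand. The sketch below assumes that the usable fraction is applied to the flowcell total before dividing evenly among samples, and it uses a made-up weighted fragment count of 220,000; neither the function names nor that count come from radigest.

```python
# Illustrative arithmetic only; function names and the fragment count are assumptions.

def read_pairs_per_sample(flowcell_read_pairs, usable_read_fraction, samples):
    """Usable read pairs available to each sample, assuming an even split."""
    return flowcell_read_pairs * usable_read_fraction / samples


def predicted_depth(pairs_per_sample, weighted_fragments):
    """Mean read pairs per recovered locus."""
    return pairs_per_sample / weighted_fragments


pairs = read_pairs_per_sample(300_000_000, 0.85, 96)
depth = predicted_depth(pairs, weighted_fragments=220_000)
meets_target = depth >= 10
```

With 300M read pairs, a 0.85 usable fraction, and 96 samples, each sample receives about 2.66 million usable pairs; spread over 220,000 weighted loci, that is roughly 12 read pairs per locus, which would meet a 10x target.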

Ranking objectives

Default:

--objective balanced

Other options:

closest-coverage
depth-first
feasible-lowest-coverage
max-depth

Use balanced first. Rerun with another objective for sensitivity checks.


Practical analysis workflow

A. Digest a known pair

radigest \
  -fasta ref.fa \
  -enzymes PstI,MspI \
  -min 300 \
  -max 600 \
  -bed fragments.bed \
  -fragments-tsv fragments.tsv \
  -json run.json

B. Choose a pair, then digest it

radigest-design \
  --ref ref.fa \
  --enzymes candidate_enzymes.txt \
  --pct 2.5 \
  --depth 10 \
  --samples 96 \
  --read-length 150 \
  --flowcell-read-pairs 300M \
  --usable-read-fraction 0.85 \
  --min 300 \
  --max 600 \
  --score-min 1 \
  --score-max 2000 \
  --size-model normal \
  --size-mean 275 \
  --size-sd 85 \
  --out-dir radigest_design

Inspect the recommendation:

column -ts $'\t' radigest_design/design.summary.tsv | less -S
cat radigest_design/design.report.txt

Then digest the selected pair:

radigest \
  -fasta ref.fa \
  -enzymes PstI,MspI \
  -min 300 \
  -max 600 \
  -bed final_fragments.bed \
  -fragments-tsv final_fragments.tsv \
  -json final_digest.json

Model scope

radigest models:

  • recognition sites
  • cut coordinates
  • single- and double-digest fragment rules
  • hard size windows
  • optional size-selection weights
  • weighted recovered genome percentage
  • mean read-pair depth per recovered locus

It does not model:

  • methylation sensitivity
  • partial digestion
  • star activity
  • enzyme efficiency
  • buffer compatibility
  • empirical digestion rates
  • per-locus depth dispersion

Enzymes with the same recognition motif and cut coordinate are treated identically by the sequence-level model.

Use radigest for fast, reproducible in-silico screening. Validate final enzyme choices against wet-lab constraints.


Help

radigest --help
radigest-design --help
radigest-fit-size-model --help
radigest -list-enzymes

Citation

If you use radigest, please cite the DOI listed at the top of this README.
