Bygul: Amplicon & Metagenomics Read Simulator

Bygul is a Python 3 tool designed for simulating sequencing reads in wastewater surveillance and other metagenomic applications. It allows users to simulate complex multi-sample datasets with customizable proportions using industry-standard backends like wgsim and mason.

🏗 Installation

Bygul requires Python 3. Because it relies on external simulators (wgsim and mason), we recommend using Conda to manage dependencies. For details on wgsim and mason, see their respective documentation.

Option 1: Via Conda (Recommended)

conda create -n bygul bioconda::bygul

Option 2: Via PyPI

pip install bygul

Note: Some binary dependencies (wgsim/mason) may need to be installed manually or built from source if using this method.

Option 3: Local Build from Source

git clone https://github.com/andersen-lab/Bygul
cd Bygul
pip install -e .

🧬 Usage: Amplicon Sequencing Mode

Use this mode when simulating specific genomic regions defined by a primer set.

Basic Command

bygul simulate-proportions [SAMPLE1.fasta,SAMPLE2.fasta] --primers [primer.bed] --reference [reference.fasta] --proportions [0.8,0.2] --outdir [output_dir]
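For scripted runs, the basic command can be assembled programmatically. The following is a minimal sketch, not part of Bygul: the helper name build_amplicon_command is hypothetical, and the check that proportions sum to 1 is an assumption about what a sensible input looks like. It uses only the flags shown in the basic command.

```python
import math
import shlex


def build_amplicon_command(samples, primers, reference, outdir, proportions=None):
    """Assemble a `bygul simulate-proportions` argument list (hypothetical helper)."""
    cmd = [
        "bygul", "simulate-proportions", ",".join(samples),
        "--primers", primers,
        "--reference", reference,
        "--outdir", outdir,
    ]
    if proportions is not None:
        if len(proportions) != len(samples):
            raise ValueError("need exactly one proportion per sample")
        # Assumption: proportions are fractions that should sum to 1.
        if not math.isclose(sum(proportions), 1.0, abs_tol=1e-9):
            raise ValueError("proportions should sum to 1")
        cmd += ["--proportions", ",".join(str(p) for p in proportions)]
    return cmd


cmd = build_amplicon_command(
    ["SAMPLE1.fasta", "SAMPLE2.fasta"],
    primers="primer.bed",
    reference="reference.fasta",
    outdir="output_dir",
    proportions=[0.8, 0.2],
)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) once Bygul is installed and on the PATH.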

Advanced Examples

Random Proportions & Mismatches: Omit --proportions to have proportions assigned at random, and allow up to 2 SNP mismatches in primer regions.

bygul simulate-proportions sample1.fasta,sample2.fasta --primers primer.bed --reference reference.fasta --outdir results/ --maxmismatch 2
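To illustrate what "random proportions" means, the sketch below draws one positive weight per sample and normalizes the weights so they sum to 1. This is one common approach and an assumption for illustration only; how Bygul itself samples proportions is not described here, and the function name random_proportions is hypothetical.

```python
import random


def random_proportions(n_samples, seed=None):
    """Draw n_samples positive fractions that sum to 1 (illustrative only)."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so every weight is positive and the
    # total can never be zero.
    draws = [1.0 - rng.random() for _ in range(n_samples)]
    total = sum(draws)
    return [d / total for d in draws]


props = random_proportions(2, seed=42)
print(props)
```

Passing explicit values through --proportions, as in the basic command, fixes the mixture instead of leaving it to chance.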

Switching Simulators: Use mason instead of the default wgsim.

bygul simulate-proportions sample1.fasta,sample2.fasta --primers primer.bed --reference reference.fasta --simulator mason

Custom Error Rates & Lengths: Pass simulator-specific parameters (e.g. wgsim's indel fraction, -R) directly.

bygul simulate-proportions sample1.fasta,sample2.fasta --primers primer.bed --reference reference.fasta -R 0.01

🌍 Usage: Metagenomics Mode

Simulate reads from entire samples without requiring a primer BED file or a reference sequence.

Basic Metagenomics Simulation

bygul simulate-proportions sample1.fasta,sample2.fasta --outdir results/ --simulation_mode metagenomics

Metagenomics with Specific Parameters

bygul simulate-proportions sample1.fasta,sample2.fasta --proportions 0.5,0.5 --outdir results/ --simulation_mode metagenomics --simulator mason --illumina-read-length 200

📝 Technical Notes

Parameter Handling

Bygul acts as a wrapper. Most flags are passed directly to the underlying simulators, but the following are managed by Bygul itself to produce more realistic simulations (amplicon simulation mode only):

--readcnt: Number of reads per amplicon.
--wgsim_insert_size: Insert size for wgsim.
--wgsim_read_length / --wgsim_error_rate: Read length and error rate for wgsim.

To see all available backend flags, run:

wgsim --help
mason_simulator --help

Best Practices

Read Counts: Set --readcnt higher than the number of contigs in your amplicon file. Too few reads can result in empty files for certain amplicons.
Primer Files: The BED file must include a column with the primer sequence. Bygul allows 1 SNP mismatch by default; use --maxmismatch to change this.
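Before a long run, it can help to confirm that every row of the primer BED file carries a sequence. The sketch below is a rough pre-flight check under stated assumptions: the file is tab-separated, the sequence may sit in any column after the first three (chrom, start, end), and a field counts as a sequence when it is at least min_len characters long and made up only of IUPAC nucleotide codes. The function name and the min_len default are hypothetical, not part of Bygul.

```python
import re

# IUPAC nucleotide codes; a primer field should contain nothing else.
IUPAC_SEQ = re.compile(r"[ACGTURYSWKMBDHVN]+", re.IGNORECASE)


def rows_missing_primer_sequence(bed_text, min_len=10):
    """Return 1-based line numbers of BED rows with no sequence-like column."""
    missing = []
    for lineno, line in enumerate(bed_text.splitlines(), start=1):
        # Ignore blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip chrom/start/end and look for a sequence in the remaining columns.
        has_seq = any(
            len(f) >= min_len and IUPAC_SEQ.fullmatch(f) for f in fields[3:]
        )
        if not has_seq:
            missing.append(lineno)
    return missing


example = (
    "ref\t30\t54\tamp_1_LEFT\t1\t+\tACCAACCAACTTTCGATCTCTTGT\n"
    "ref\t385\t410\tamp_1_RIGHT\t1\t-\n"
)
print(rows_missing_primer_sequence(example))  # → [2]
```

In the example, the second row has no sequence column, so its line number is reported.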

Output Files

Consolidated Reads: Simulated reads from all samples are at outdir/reads.fastq.
Proportions: Assigned proportions are recorded in outdir/sample_proportions.txt.
Quality Metrics: Check outdir/[sample_name]/amplicon_stats.csv for information on amplicon dropouts, mismatches, and ambiguous bases.
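After a run, a quick check can confirm that the files listed above were produced. This is a minimal sketch with a hypothetical helper name. It assumes the proportions file sits in the output directory alongside the reads, and it takes the sample directory names from the caller, since how Bygul derives [sample_name] from the input FASTA files is not stated here.

```python
from pathlib import Path


def expected_outputs(outdir, sample_names):
    """Map a label to each output path described above (hypothetical helper)."""
    outdir = Path(outdir)
    paths = {
        "reads": outdir / "reads.fastq",
        "proportions": outdir / "sample_proportions.txt",
    }
    # Assumption: one subdirectory per sample, named by the caller.
    for name in sample_names:
        paths[f"stats:{name}"] = outdir / name / "amplicon_stats.csv"
    return paths


for label, path in expected_outputs("results", ["sample1", "sample2"]).items():
    status = "found" if path.exists() else "missing"
    print(f"{label}\t{path}\t{status}")
```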

🎓 Citation

If you use this workflow in a paper, please cite the original repository: https://github.com/andersen-lab/Bygul
