2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -50,7 +50,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## v2.0.2

- Drop unsed params
- Drop unused params
- Set aligner to `star` for RNA-seq
- Finetune resources
- Fix some bugs for different input tags
51 changes: 18 additions & 33 deletions README.md
@@ -18,17 +18,23 @@ It also performs basic QC and coverage analysis.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.

Steps inlcude:
Steps include:

1. Demultiplexing using [`BCLconvert`](https://emea.support.illumina.com/sequencing/sequencing_software/bcl-convert.html)
2. Read QC and trimming using [`fastp`](https://github.com/OpenGene/fastp)
3. Alignment using either [`bwa`](https://github.com/lh3/bwa), [`bwa-mem2`](https://github.com/bwa-mem2/bwa-mem2), [`bowtie2`](https://github.com/BenLangmead/bowtie2), [`dragmap`](https://github.com/Illumina/DRAGMAP), [`snap`](https://github.com/amplab/snap) or [`strobe`](https://github.com/ksahlin/strobealign) for DNA-seq and [`STAR`](https://github.com/alexdobin/STAR) for RNA-seq
4. Duplicate marking using [`bamsormadup`](https://gitlab.com/german.tischler/biobambam2) or [`samtools markdup`](http://www.htslib.org/doc/samtools-markdup.html)
5. Coverage analysis using [`mosdepth`](https://github.com/brentp/mosdepth) and [`samtools coverage`](http://www.htslib.org/doc/samtools-coverage.html)
6. Alignment QC using [`samtools flagstat`](http://www.htslib.org/doc/samtools-flagstat.html), [`samtools stats`](http://www.htslib.org/doc/samtools-stats.html), [`samtools idxstats`](http://www.htslib.org/doc/samtools-idxstats.html) and [`picard CollecHsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics), [`picard CollectWgsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectWgsMetrics), [`picard CollectMultipleMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics)
7. QC aggregation using [`multiqc`](https://multiqc.info/)
- Demultiplexing using [`BCLconvert`](https://emea.support.illumina.com/sequencing/sequencing_software/bcl-convert.html)
- Run QC using [`MultiQC SAV`](https://github.com/MultiQC/MultiQC_SAV)
- Read QC and trimming using [`fastp`](https://github.com/OpenGene/fastp) or [`falco`](https://github.com/smithlabcode/falco)
- Alignment using either [`bwa`](https://github.com/lh3/bwa), [`bwa-mem2`](https://github.com/bwa-mem2/bwa-mem2), [`bowtie2`](https://github.com/BenLangmead/bowtie2), [`dragmap`](https://github.com/Illumina/DRAGMAP), [`snap`](https://github.com/amplab/snap) or [`strobe`](https://github.com/ksahlin/strobealign) for DNA-seq and [`STAR`](https://github.com/alexdobin/STAR) for RNA-seq
- Duplicate marking using [`bamsormadup`](https://gitlab.com/german.tischler/biobambam2) or [`samtools markdup`](http://www.htslib.org/doc/samtools-markdup.html)
- Coverage analysis using [`mosdepth`](https://github.com/brentp/mosdepth) and [`samtools coverage`](http://www.htslib.org/doc/samtools-coverage.html)
- Alignment QC using [`samtools flagstat`](http://www.htslib.org/doc/samtools-flagstat.html), [`samtools stats`](http://www.htslib.org/doc/samtools-stats.html), [`samtools idxstats`](http://www.htslib.org/doc/samtools-idxstats.html) and [`picard CollectHsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics), [`picard CollectWgsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectWgsMetrics), [`picard CollectMultipleMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics)
- QC aggregation using [`multiqc`](https://multiqc.info/)

![metro map](docs/images/metro_map.png)
<picture>

<source media="(prefers-color-scheme: dark)" srcset="docs/images/metro_map_dark.svg">
<source media="(prefers-color-scheme: light)" srcset="docs/images/metro_map_light.svg">
<img alt="Fallback image description" src="docs/images/metro_map_light.svg">
</picture>

## Usage

@@ -37,36 +43,15 @@ Steps inlcude:

The full documentation can be found [here](docs/README.md)

First, prepare a samplesheet with your input data that looks as follows:

`samplesheet.csv` for fastq inputs:

```csv
id,samplename,organism,library,aligner,fastq_1,fastq_2
sample1,sample1,Homo sapiens,Library_Name,bwamem,reads1.fq.gz,reads2.fq.gz
```

`samplesheet.csv` for flowcell inputs:

```csv
id,samplesheet,lane,flowcell,sample_info
flowcell_id,/path/to/illumina_samplesheet.csv,1,/path/to/sequencer_uploaddir,/path/to/sampleinfo.csv
```

`sampleinfo.csv` for use with flowcell inputs:

```csv
samplename,library,organism,tag,aligner
fc_sample1,test,Homo sapiens,WES,bwamem
```
First, prepare a samplesheet with your input data. Check the [usage docs](docs/usage.md) for details on the required format and example files.

Now, you can run the pipeline using:

```bash
nextflow run nf-cmgg/preprocessing \
-profile <docker/singularity/.../institute> \
-profile <docker/singularity/...> \
--igenomes_base /path/to/genomes \
--input samplesheet.csv \
--input samplesheet.<csv|yaml|json> \
--outdir <OUTDIR>
```

4 changes: 2 additions & 2 deletions assets/schema_input.json
@@ -160,10 +160,10 @@
},
"anyOf": [
{
"required": ["id", "samplename", "organism", "aligner", "tag", "fastq_1", "fastq_2"]
"required": ["id", "samplename", "organism", "aligner", "fastq_1", "fastq_2"]
},
{
"required": ["id", "samplename", "genome", "aligner", "tag", "fastq_1", "fastq_2"]
"required": ["id", "samplename", "genome", "aligner", "fastq_1", "fastq_2"]
},
{
"required": ["id", "samplesheet", "sample_info", "flowcell"]
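
As a rough illustration of what the `anyOf` change above means for a fastq samplesheet row, the sketch below checks a row against the three `required` sets visible in this hunk, before and after `tag` was dropped. It is a simplified stand-in for JSON Schema validation (key presence only, no type or pattern checks), the helper name `matches_any` is hypothetical, and the example row is the one from the samplesheet example removed from the README in this PR. The schema may contain further `anyOf` entries that are not shown in this hunk.

```python
# Simplified sketch of JSON Schema "anyOf" + "required" semantics.
# Only the three required-sets visible in the diff hunk are modelled.

REQUIRED_OLD = [
    ["id", "samplename", "organism", "aligner", "tag", "fastq_1", "fastq_2"],
    ["id", "samplename", "genome", "aligner", "tag", "fastq_1", "fastq_2"],
    ["id", "samplesheet", "sample_info", "flowcell"],
]

REQUIRED_NEW = [
    ["id", "samplename", "organism", "aligner", "fastq_1", "fastq_2"],
    ["id", "samplename", "genome", "aligner", "fastq_1", "fastq_2"],
    ["id", "samplesheet", "sample_info", "flowcell"],
]


def matches_any(row, required_sets):
    """Return True if the row contains every key of at least one required-set."""
    return any(all(key in row for key in required) for required in required_sets)


# Row taken from the fastq samplesheet example removed from README.md (no "tag").
row = {
    "id": "sample1",
    "samplename": "sample1",
    "organism": "Homo sapiens",
    "library": "Library_Name",
    "aligner": "bwamem",
    "fastq_1": "reads1.fq.gz",
    "fastq_2": "reads2.fq.gz",
}

old_ok = matches_any(row, REQUIRED_OLD)
new_ok = matches_any(row, REQUIRED_NEW)
print(f"valid under old schema: {old_ok}, valid under new schema: {new_ok}")
```

Under these assumptions, a fastq row without a `tag` column fails all three old alternatives but satisfies the first new one, which is consistent with the README example never listing `tag`.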
Binary file removed docs/images/metro_map.png
Binary file not shown.