diff --git a/CHANGELOG.md b/CHANGELOG.md
index c062310d..34cc5393 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -50,7 +50,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## v2.0.2
-- Drop unsed params
+- Drop unused params
- Set aligner to `star` for RNA-seq
- Finetune resources
- Fix some bugs for different input tags
diff --git a/README.md b/README.md
index 119479bf..2d6849f9 100644
--- a/README.md
+++ b/README.md
@@ -18,17 +18,23 @@ It also performs basic QC and coverage analysis.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.
-Steps inlcude:
+Steps include:
-1. Demultiplexing using [`BCLconvert`](https://emea.support.illumina.com/sequencing/sequencing_software/bcl-convert.html)
-2. Read QC and trimming using [`fastp`](https://github.com/OpenGene/fastp)
-3. Alignment using either [`bwa`](https://github.com/lh3/bwa), [`bwa-mem2`](https://github.com/bwa-mem2/bwa-mem2), [`bowtie2`](https://github.com/BenLangmead/bowtie2), [`dragmap`](https://github.com/Illumina/DRAGMAP), [`snap`](https://github.com/amplab/snap) or [`strobe`](https://github.com/ksahlin/strobealign) for DNA-seq and [`STAR`](https://github.com/alexdobin/STAR) for RNA-seq
-4. Duplicate marking using [`bamsormadup`](https://gitlab.com/german.tischler/biobambam2) or [`samtools markdup`](http://www.htslib.org/doc/samtools-markdup.html)
-5. Coverage analysis using [`mosdepth`](https://github.com/brentp/mosdepth) and [`samtools coverage`](http://www.htslib.org/doc/samtools-coverage.html)
-6. Alignment QC using [`samtools flagstat`](http://www.htslib.org/doc/samtools-flagstat.html), [`samtools stats`](http://www.htslib.org/doc/samtools-stats.html), [`samtools idxstats`](http://www.htslib.org/doc/samtools-idxstats.html) and [`picard CollecHsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics), [`picard CollectWgsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectWgsMetrics), [`picard CollectMultipleMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics)
-7. QC aggregation using [`multiqc`](https://multiqc.info/)
+- Demultiplexing using [`BCLconvert`](https://emea.support.illumina.com/sequencing/sequencing_software/bcl-convert.html)
+- Run QC using [`MultiQC SAV`](https://github.com/MultiQC/MultiQC_SAV)
+- Read QC and trimming using [`fastp`](https://github.com/OpenGene/fastp) or [`falco`](https://github.com/smithlabcode/falco)
+- Alignment using either [`bwa`](https://github.com/lh3/bwa), [`bwa-mem2`](https://github.com/bwa-mem2/bwa-mem2), [`bowtie2`](https://github.com/BenLangmead/bowtie2), [`dragmap`](https://github.com/Illumina/DRAGMAP), [`snap`](https://github.com/amplab/snap) or [`strobe`](https://github.com/ksahlin/strobealign) for DNA-seq and [`STAR`](https://github.com/alexdobin/STAR) for RNA-seq
+- Duplicate marking using [`bamsormadup`](https://gitlab.com/german.tischler/biobambam2) or [`samtools markdup`](http://www.htslib.org/doc/samtools-markdup.html)
+- Coverage analysis using [`mosdepth`](https://github.com/brentp/mosdepth) and [`samtools coverage`](http://www.htslib.org/doc/samtools-coverage.html)
+- Alignment QC using [`samtools flagstat`](http://www.htslib.org/doc/samtools-flagstat.html), [`samtools stats`](http://www.htslib.org/doc/samtools-stats.html), [`samtools idxstats`](http://www.htslib.org/doc/samtools-idxstats.html) and [`picard CollectHsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics), [`picard CollectWgsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectWgsMetrics), [`picard CollectMultipleMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics)
+- QC aggregation using [`multiqc`](https://multiqc.info/)
-
+
+
+
+
+
+
## Usage
@@ -37,36 +43,15 @@ Steps inlcude:
The full documentation can be found [here](docs/README.md).
-First, prepare a samplesheet with your input data that looks as follows:
-
-`samplesheet.csv` for fastq inputs:
-
-```csv
-id,samplename,organism,library,aligner,fastq_1,fastq_2
-sample1,sample1,Homo sapiens,Library_Name,bwamem,reads1.fq.gz,reads2.fq.gz
-```
-
-`samplesheet.csv` for flowcell inputs:
-
-```csv
-id,samplesheet,lane,flowcell,sample_info
-flowcell_id,/path/to/illumina_samplesheet.csv,1,/path/to/sequencer_uploaddir,/path/to/sampleinfo.csv
-```
-
-`sampleinfo.csv` for use with flowcell inputs:
-
-```csv
-samplename,library,organism,tag,aligner
-fc_sample1,test,Homo sapiens,WES,bwamem
-```
+First, prepare a samplesheet with your input data. Check the [usage docs](docs/usage.md) for details on the required format and example files.
Now, you can run the pipeline using:
```bash
nextflow run nf-cmgg/preprocessing \
- -profile \
+ -profile \
--igenomes_base /path/to/genomes \
- --input samplesheet.csv \
+ --input samplesheet. \
--outdir
```
diff --git a/assets/schema_input.json b/assets/schema_input.json
index a6cd4941..4a7bd527 100644
--- a/assets/schema_input.json
+++ b/assets/schema_input.json
@@ -160,10 +160,10 @@
},
"anyOf": [
{
- "required": ["id", "samplename", "organism", "aligner", "tag", "fastq_1", "fastq_2"]
+ "required": ["id", "samplename", "organism", "aligner", "fastq_1", "fastq_2"]
},
{
- "required": ["id", "samplename", "genome", "aligner", "tag", "fastq_1", "fastq_2"]
+ "required": ["id", "samplename", "genome", "aligner", "fastq_1", "fastq_2"]
},
{
"required": ["id", "samplesheet", "sample_info", "flowcell"]
diff --git a/docs/images/metro_map.png b/docs/images/metro_map.png
deleted file mode 100644
index f2057abd..00000000
Binary files a/docs/images/metro_map.png and /dev/null differ
diff --git a/docs/images/metro_map.svg b/docs/images/metro_map.svg
deleted file mode 100644
index cfd9b083..00000000
--- a/docs/images/metro_map.svg
+++ /dev/null
@@ -1,1263 +0,0 @@
-
-
-
-
diff --git a/docs/images/metro_map_dark.md b/docs/images/metro_map_dark.md
new file mode 100644
index 00000000..65349f39
--- /dev/null
+++ b/docs/images/metro_map_dark.md
@@ -0,0 +1,45 @@
+```mermaid
+%%metro logo: ./nf-cmgg-preprocessing_logo_dark.png
+%%metro style: dark
+%%metro line: main | Alignment and Postprocessing | #00ff00
+%%metro line: qc | Quality control | #ff0000
+%%metro file: BCL_IN | BCL
+%%metro file: FASTQ_IN | FASTQ
+%%metro file: CRAM_OUT | CRAM
+%%metro file: MULTIQC_LIBRARY | HTML
+%%metro file: MULTIQC_SAV | HTML
+%%metro compact_offsets: true
+
+graph TD
+
+ BCL_IN[]
+ FASTQ_IN[]
+ MULTIQC_SAV[]
+ CRAM_OUT[]
+ MULTIQC_LIBRARY[]
+
+ BCL_IN -->|main | BCLCONVERT
+ BCL_IN -->|qc| MULTIQC_SAV
+ BCLCONVERT -->|qc| MULTIQC_SAV
+
+ FASTQ_IN[]
+ FASTQ_IN -->|qc,main| FASTP
+ BCLCONVERT -->|qc| FALCO
+ BCLCONVERT -->|qc,main| FASTP
+ FALCO -->|qc| MULTIQC_LIBRARY
+ FASTP -->|qc| MULTIQC_LIBRARY
+
+ FASTP -->|main| ALIGN
+ ALIGN -->|main| MARKDUP
+ MARKDUP -->|main| CRAM_OUT
+
+ CRAM_OUT -->|qc| MOSDEPTH
+ CRAM_OUT -->|qc| SAMTOOLS_COV
+ MOSDEPTH -->|qc| MULTIQC_LIBRARY
+ SAMTOOLS_COV -->|qc| MULTIQC_LIBRARY
+
+ CRAM_OUT -->|qc| SAMTOOLS_QC
+ CRAM_OUT -->|qc| PICARD
+ SAMTOOLS_QC -->|qc| MULTIQC_LIBRARY
+ PICARD -->|qc| MULTIQC_LIBRARY
+```
diff --git a/docs/images/metro_map_dark.svg b/docs/images/metro_map_dark.svg
new file mode 100644
index 00000000..0f452bc5
--- /dev/null
+++ b/docs/images/metro_map_dark.svg
@@ -0,0 +1,99 @@
+
+
diff --git a/docs/images/metro_map_light.md b/docs/images/metro_map_light.md
new file mode 100644
index 00000000..5cd6a5b0
--- /dev/null
+++ b/docs/images/metro_map_light.md
@@ -0,0 +1,45 @@
+```mermaid
+%%metro logo: ./nf-cmgg-preprocessing_logo_light.png
+%%metro style: light
+%%metro line: main | Alignment and Postprocessing | #00ff00
+%%metro line: qc | Quality control | #ff0000
+%%metro file: BCL_IN | BCL
+%%metro file: FASTQ_IN | FASTQ
+%%metro file: CRAM_OUT | CRAM
+%%metro file: MULTIQC_LIBRARY | HTML
+%%metro file: MULTIQC_SAV | HTML
+%%metro compact_offsets: true
+
+graph TD
+
+ BCL_IN[]
+ FASTQ_IN[]
+ MULTIQC_SAV[]
+ CRAM_OUT[]
+ MULTIQC_LIBRARY[]
+
+ BCL_IN -->|main | BCLCONVERT
+ BCL_IN -->|qc| MULTIQC_SAV
+ BCLCONVERT -->|qc| MULTIQC_SAV
+
+ FASTQ_IN[]
+ FASTQ_IN -->|qc,main| FASTP
+ BCLCONVERT -->|qc| FALCO
+ BCLCONVERT -->|qc,main| FASTP
+ FALCO -->|qc| MULTIQC_LIBRARY
+ FASTP -->|qc| MULTIQC_LIBRARY
+
+ FASTP -->|main| ALIGN
+ ALIGN -->|main| MARKDUP
+ MARKDUP -->|main| CRAM_OUT
+
+ CRAM_OUT -->|qc| MOSDEPTH
+ CRAM_OUT -->|qc| SAMTOOLS_COV
+ MOSDEPTH -->|qc| MULTIQC_LIBRARY
+ SAMTOOLS_COV -->|qc| MULTIQC_LIBRARY
+
+ CRAM_OUT -->|qc| SAMTOOLS_QC
+ CRAM_OUT -->|qc| PICARD
+ SAMTOOLS_QC -->|qc| MULTIQC_LIBRARY
+ PICARD -->|qc| MULTIQC_LIBRARY
+```
diff --git a/docs/images/metro_map_light.svg b/docs/images/metro_map_light.svg
new file mode 100644
index 00000000..ac3be697
--- /dev/null
+++ b/docs/images/metro_map_light.svg
@@ -0,0 +1,104 @@
+
+
diff --git a/docs/images/nf-cmgg-preprocessing_logo_dark.png b/docs/images/nf-cmgg-preprocessing_logo_dark.png
index 2f4bd13f..d91e6379 100644
Binary files a/docs/images/nf-cmgg-preprocessing_logo_dark.png and b/docs/images/nf-cmgg-preprocessing_logo_dark.png differ
diff --git a/docs/images/nf-cmgg-preprocessing_logo_dark.svg b/docs/images/nf-cmgg-preprocessing_logo_dark.svg
index 4fc66fb4..dd5620b1 100644
--- a/docs/images/nf-cmgg-preprocessing_logo_dark.svg
+++ b/docs/images/nf-cmgg-preprocessing_logo_dark.svg
@@ -5,7 +5,7 @@
width="1456.784"
height="522.443"
version="1.1"
- sodipodi:docname="nf-core-preprocessing_logo_dark.svg"
+ sodipodi:docname="nf-cmgg-preprocessing_logo_dark.svg"
inkscape:version="1.2 (dc2aeda, 2022-05-15)"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
@@ -54,7 +54,7 @@
id="path9"
d="m280.17 136.33-21.5-21.584h61v21.584z" />.st0{fill:#24af63}.st1{font-family:Arial, Helvetica, sans-serif;;font-weight:"bold"}.st2{font-size:209.8672px}.st4{fill:#ecdc86}.st7{fill:#396e35}nf-nf-cmgg/preprocessingpreproc.st0{fill:#24af63}.st1{font-family:Arial, Helvetica, sans-serif;font-weight:bold;}.st2{font-size:209.8672px}.st4{fill:#ecdc86}.st7{fill:#396e35}nf-nf-cmgg/preprocessingpreproc
diff --git a/docs/parameters.md b/docs/parameters.md
index 700a2292..0f7220fc 100644
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -16,10 +16,10 @@ Define where the pipeline should find input data and save output data.
## Pipeline options
-| Parameter | Description | Type | Default | Required | Hidden |
-| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------- | --------- | -------- | ------ |
-| `split_fastq` | Specify how many reads each split of a FastQ file contains. Set 0 to turn off splitting at all. HelpUse the the tool FastP to split FASTQ file by number of reads. This parallelizes across fastq file shards speeding up mapping. Note although the minimum value is 250 reads, if you have fewer than 250 reads a single FASTQ shard will still be created. | `integer` | 100000000 | | |
-| `genelists` | Directory containing gene list bed files for granular coverage analysis | `string` | None | | |
+| Parameter | Description | Type | Default | Required | Hidden |
+| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | --------- | -------- | ------ |
+| `split_fastq` | Specify how many reads each split of a FASTQ file contains. Set to 0 to turn off splitting entirely. Help: fastp is used to split FASTQ files by read count; this parallelizes mapping across FASTQ shards. Note that although the minimum value is 250 reads, a single FASTQ shard will still be created if you have fewer than 250 reads. | `integer` | 100000000 | | |
+| `genelists` | Directory containing gene list bed files for granular coverage analysis | `string` | None | | |
## Institutional config options
diff --git a/docs/usage.md b/docs/usage.md
index c44bccca..f59c809a 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -14,42 +14,56 @@ You will need to create a samplesheet with information about the samples you wou
The pipeline supports two types of samplesheets to be used as input: [`fastq`](#fastq-samplesheet) and [`flowcell`](#flowcell-samplesheet) samplesheets. The type will be automatically detected and applied by the pipeline. The pipeline will also auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire.
-### Common samplesheet fields
-
-This table shows all samplesheet fields that can be used by both the [`fastq`](#fastq-samplesheet) and the [`flowcell`](#flowcell-samplesheet) samplesheet types.
-
-| Column | Description | Required for Fastq | Required for Flowcell |
-| ------ | ---------------------------------------------------------------------------------- | ------------------ | --------------------- |
-| `id` | Unique samplesheet/flowcell ID. Can only contain letters, numbers and underscores. | :heavy_check_mark: | :heavy_check_mark: |
-
### Fastq samplesheet
-A `fastq` samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
-
-```csv title="samplesheet.csv"
-id,samplename,fastq_1,fastq_2,genome,tag
-CONTROL_REP1,CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,GRCh38,WES
-CONTROL_REP2,CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz,GRCh38,WES
-CONTROL_REP3,CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz,GRCh38,WES
-TREATMENT_REP1,TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,,GRCh38,WES
-TREATMENT_REP2,TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,,GRCh38,WES
-TREATMENT_REP3,TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,,GRCh38,WES
-TREATMENT_REP3,TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,,GRCh38,WES
+A `fastq` samplesheet file consisting of paired-end data may look something like the one below.
+
+```yml
+- id: DNA1_L001
+ samplename: DNA_paired1
+ library: test_library
+ genome: GRCh38
+ aligner: bwamem
+ markdup: bamsormadup
+ umi_aware: false
+ skip_trimming: false
+ trim_front: 0
+ trim_tail: 0
+ adapter_R1: AGATCGGAAGAGCACACGTCTGAACTCCTTA
+ adapter_R2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
+ run_coverage: true
+ disable_picard_metrics: false
+ roi: null
+ tag: WES
+ sample_type: DNA
+ fastq_1: https://github.com/nf-cmgg/test-datasets/raw/preprocessing/data/genomics/homo_sapiens/illumina/fastq/sample1_R1.fastq.gz
+ fastq_2: https://github.com/nf-cmgg/test-datasets/raw/preprocessing/data/genomics/homo_sapiens/illumina/fastq/sample1_R2.fastq.gz
```
The following table shows the fields used by the `fastq` samplesheet:
-| Column | Description | Required |
-| ------------ | -------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------- |
-| `fastq_1` | FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz' | :heavy_check_mark: |
-| `fastq_2` | FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz' | :x: |
-| `samplename` | The sample name corresponding to the sample in the Fastq file(s) | :heavy_check_mark: |
-| `genome` | The genome build to use for the analysis. Currently supports GRCh38, GRCm39 and GRCz11 | :heavy_check_mark: (unless `organism` is given) |
-| `organism` | Full name of the organism. Currently supports "Homo sapiens", "Mus musculus" and "Danio rerio" | :heavy_check_mark: (unless `genome` is given) |
-| `library` | Sample library name | :x: |
-| `tag` | The tag used by the sample. Can be one of WES, WGS or coPGT-M | :heavy_check_mark: |
-| `roi` | The path to a BED file containing Regions Of Interest for coverage analysis | :x: |
-| `aligner` | The aligner to use for this sample. Can be one of these: bowtie2, bwamem, bwamem2, dragmap, strobe and snap. set to `false` to output fastq. | :x: |
+| Column | Description | Required |
+| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------- |
+| `id` | Unique sample identifier | :heavy_check_mark: |
+| `samplename` | The sample name corresponding to the sample in the Fastq file(s) | :heavy_check_mark: |
+| `genome` | The genome build to use for the analysis. Currently supports `GRCh38`, `GRCm39` and `GRCz11` | :heavy_check_mark: (unless `organism` is given) |
+| `organism` | Full name of the organism. Currently supports `Homo sapiens`, `Mus musculus` and `Danio rerio` | :heavy_check_mark: (unless `genome` is given) |
+| `library` | Sample library name | :x: |
+| `tag` | The tag used by the sample. Can be one of `WES`, `WGS`, `SeqCap` and `coPGT-M` | :x: |
+| `aligner` | The aligner to use for this sample. Can be one of these: `bowtie2`, `bwamem`, `bwamem2`, `dragmap`, `strobe` and `snap`. Set to `false` to output fastq. | :heavy_check_mark: |
+| `markdup` | Markdup algorithm to use for duplicate marking. Can be set to `bamsormadup`, `samtools` or `false` | :x: |
+| `umi_aware` | Whether UMI-aware processing should be used. Only applies when `markdup` is set to `samtools` | :x: |
+| `skip_trimming` | Skip adapter trimming step | :x: |
+| `trim_front` | Number of bases to trim from the front of reads | :x: |
+| `trim_tail` | Number of bases to trim from the tail of reads | :x: |
+| `adapter_R1` | Adapter sequence for read 1 | :x: |
+| `adapter_R2` | Adapter sequence for read 2 | :x: |
+| `run_coverage` | Run coverage analysis | :x: |
+| `disable_picard_metrics` | Disable Picard metrics collection | :x: |
+| `roi` | The path to a BED file containing Regions Of Interest for coverage analysis | :x: |
+| `sample_type` | Sample type (e.g., `DNA`, `RNA`) | :x: |
+| `fastq_1`                | FastQ file for read 1. Must be provided, cannot contain spaces and must have extension `.fq.gz` or `.fastq.gz`                                           | :heavy_check_mark:                              |
+| `fastq_2`                | FastQ file for read 2. Cannot contain spaces and must have extension `.fq.gz` or `.fastq.gz`                                                             | :x:                                             |
An [example samplesheet](../tests/inputs/test.yml) has been provided with the pipeline.
@@ -57,9 +71,12 @@ An [example samplesheet](../tests/inputs/test.yml) has been provided with the pi
A `flowcell` samplesheet file consisting of one sequencing run may look something like the one below.
-```csv title="samplesheet.csv"
-id,samplesheet,sample_info,flowcell
-RUN_NAME,RUN_NAME_samplesheet.csv,RUN_NAME_sampleinfo.csv,RUN_NAME_flowcell/
+```yml
+- id: 200624_A00834_0183_BHMTFYDRXX
+ samplesheet: https://github.com/nf-cmgg/test-datasets/raw/refs/heads/preprocessing/data/genomics/homo_sapiens/illumina/flowcell/SampleSheet_2.csv
+ lane: 1
+ flowcell: s3://test-data/genomics/homo_sapiens/illumina/bcl/
+ sample_info: https://github.com/nf-cmgg/test-datasets/raw/refs/heads/preprocessing/data/genomics/homo_sapiens/illumina/flowcell/SampleInfo_2.json
```
The following table shows the fields used by the `flowcell` samplesheet:
@@ -77,38 +94,24 @@ An [example samplesheet](../tests/inputs/test.yml) has been provided with the pi
A `flowcell` sample info JSON/YML file for one sequencing run may look something like the one below.
-```json title="sample_info.json"
-{
- "samplename": "Sample1",
- "library": "test",
- "organism": "Homo sapiens",
- "tag": "WES"
-}
-```
-
-Following table shows the fields that are used by the `flowcell` samplesheet:
-
-| Column | Description | Required |
-| --------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
-| `samplename` | The sample name | :heavy_check_mark: |
-| `library` | The library name | :x: |
-| `tag` | Sample tag. Has to be one of these: WES, WGS, coPGT-M | :heavy_check_mark: |
-| `organism` | The organism of the sample. Has to be one of these: "Homo sapiens", "Mus musculus" or "Danio rerio" | :heavy_check_mark: |
-| `vivar_project` | The vivar project name (currently not used by the pipeline) | :x: |
-| `binsize` | The binsize for CNV analysis (currently not used by the pipeline) | :x: |
-| `panels` | A list of panels for coverage analysis | :x: |
-| `roi` | Region of interest BED file for coverage analysis | :x: |
-| `aligner` | The aligner to use for this sample. Can be one of these: bowtie2, bwamem, bwamem2, dragmap, strobe and snap. Set to `false` to output fastq. | :x: |
-
-### Multiple runs of the same sample
-
-The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
-
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
+```yml
+- id: DNA1_L001
+ samplename: DNA_paired1
+ library: test_library
+ genome: GRCh38
+ aligner: bwamem
+ markdup: bamsormadup
+ umi_aware: false
+ skip_trimming: false
+ trim_front: 0
+ trim_tail: 0
+ adapter_R1: AGATCGGAAGAGCACACGTCTGAACTCCTTA
+ adapter_R2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
+ run_coverage: true
+ disable_picard_metrics: false
+ roi: null
+ tag: WES
+ sample_type: DNA
```
## Running the pipeline
@@ -170,7 +173,7 @@ First, go to the [nf-cmgg/preprocessing releases page](https://github.com/nf-cmg
This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For example, at the bottom of the MultiQC reports.
-To further assist in reproducbility, you can use share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter.
+To further assist in reproducibility, you can share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter.
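+
+A minimal parameter file, supplied with Nextflow's `-params-file` option, could look like this (a sketch built from the parameters in the usage example; adjust values to your setup):
+
+```yaml
+input: samplesheet.csv
+outdir: results
+igenomes_base: /path/to/genomes
+```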
:::tip
If you wish to share such a profile (for example, uploading it as supplementary material for an academic publication), make sure NOT to include cluster-specific file paths or institution-specific profiles.
@@ -193,7 +196,7 @@ The pipeline also dynamically loads configurations from [https://github.com/nf-c
Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!
They are loaded in sequence, so later profiles can overwrite earlier profiles.
-If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer enviroment.
+If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer environment.
- `debug`
- A generic profile with settings to help with debugging the pipeline. It will use more verbose logging.
@@ -237,7 +240,7 @@ To change the resource requests, please see the [max resources](https://nf-co.re
### Custom Containers
-In some cases you may wish to change which container a step of the pipeline uses for a particular tool. By default nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However in some cases the pipeline specified version maybe out of date.
+In some cases you may wish to change which container a step of the pipeline uses for a particular tool. By default, nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However, in some cases the pipeline-specified version may be out of date.
To use a different container from the default container specified in a pipeline, please see the [updating tool versions](https://nf-co.re/docs/usage/configuration#updating-tool-versions) section of the nf-core website.
@@ -262,7 +265,7 @@ Nextflow handles job submissions and supervises the running jobs. The Nextflow p
The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file.
Alternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time.
-Some HPC setups also allow you to run nextflow within a cluster job submitted your job scheduler (from where it submits more jobs).
+Some HPC setups also allow you to run nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs).
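+
+For example, to launch the pipeline in the background and capture the launcher output (illustrative; substitute your own profile and paths):
+
+```bash
+nextflow -bg run nf-cmgg/preprocessing -profile docker --input samplesheet.csv --outdir results > nextflow.log
+```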
## Nextflow memory requirements