A hands-on tutorial for learning Nextflow, a workflow management system for bioinformatics pipelines. This tutorial progresses from basic "Hello World" examples to a complete parallel BLAST workflow.
- Access to an HPC environment with Nextflow and BLAST+ installed
- Basic knowledge of command-line operations
- Understanding of FASTA format and BLAST basics (for later examples)
- Load the Nextflow module:
  module load nextflow
- Set the Nextflow home directory:
  export NXF_HOME='/work/idoerg/<netid>/.nextflow'
  Replace <netid> with your actual network ID.
- Verify the installation:
  nextflow -h
This tutorial demonstrates key Nextflow concepts through seven progressively complex examples:
- Basic output - Writing to stdout
- File output - Creating output files
- Publishing results - Using the publishDir directive
- Multi-language support - Running Python scripts
- Multiple processes - Chaining multiple processes
- Real-world pipeline - Complete BLAST workflow
- Parallel processing - Splitting work across chunks
Concepts: Process definition, stdout, basic workflow
A simple introduction to Nextflow that prints a message to standard output.
nextflow run code/01_hello.nf
Key concepts:
- nextflow.enable.dsl=2 - Enables DSL2 syntax
- process blocks define computational tasks
- output directive specifies what the process produces
- workflow block orchestrates process execution
- .view() displays output to console
Concepts: File outputs, output paths
Demonstrates how to write process output to a file instead of stdout.
nextflow run code/02_hellowrite2file.nf
Key concepts:
- path 'result.txt' specifies file output
- Output files are stored in Nextflow's work/ directory by default
Concepts: publishDir, output management
Shows how to copy results to a designated output directory.
nextflow run code/03_hellofile.nf
Key concepts:
- publishDir directive copies results to specified location
- mode: 'copy' creates a copy (alternatives: 'symlink', 'move')
- Results appear in the output/ directory
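As a rough analogy for the three publishDir modes, the shell sketch below mimics each one with plain file commands. The helper name publish and its argument order are hypothetical, and this is not how Nextflow implements publishing; it only shows what ends up on disk in each case.

```shell
#!/bin/sh
# Analogy only: mimic the copy / symlink / move publishing modes.
# publish <mode> <source-file> <dest-dir>   (hypothetical helper)
publish() {
    pub_mode=$1
    pub_src=$2
    pub_dest=$3
    mkdir -p "$pub_dest" || return 1
    # Absolute path of the source, so a symlink stays valid from any directory.
    pub_abs="$(cd "$(dirname "$pub_src")" && pwd)/$(basename "$pub_src")"
    case "$pub_mode" in
        copy)    cp "$pub_src" "$pub_dest/" ;;      # independent duplicate
        symlink) ln -s "$pub_abs" "$pub_dest/" ;;   # pointer back to the source file
        move)    mv "$pub_src" "$pub_dest/" ;;      # source location is emptied
        *)       echo "unknown mode: $pub_mode" >&2; return 1 ;;
    esac
}

# Example (paths are placeholders):
#   publish copy work/ab/123456/result.txt output
```

One practical consequence: a symlinked result stops working once the file it points to is deleted, for example by rm -rf work/, whereas a copied result does not.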
Concepts: Shebang, Python in Nextflow
Demonstrates running Python code within a Nextflow process.
nextflow run code/04_hellopython.nf
Key concepts:
- #!/usr/bin/python shebang specifies interpreter
- Any scripting language can be used (Python, R, Perl, etc.)
- Script content goes in the triple-quoted string
Concepts: Process composition, parallel execution
Shows how to define and run multiple independent processes.
nextflow run code/05_hellomultiprocess.nf
Key concepts:
- Multiple processes can be defined in one workflow
- Processes without dependencies run in parallel
- Each process has its own execution environment
Concepts: Input parameters, process chaining, data dependencies
A real bioinformatics pipeline that downloads a database, creates BLAST indices, and performs sequence alignment.
nextflow run code/06_blast.nf -profile hpc_modules
Key concepts:
- params define configurable workflow parameters
- input and output directives chain processes together
- Processes execute only when their inputs are ready
- Profile configuration (in nextflow.config) manages environment
Pipeline steps:
- DOWNLOAD_UNIPROT_FASTA - Downloads yeast proteins from UniProt
- MAKE_BLAST_DB - Creates BLAST database from downloaded sequences
- BLASTP - Runs protein BLAST search with query sequences
Concepts: Channel operations, data parallelization, result aggregation
Extends the BLAST pipeline with parallel processing by splitting the query file into chunks.
nextflow run code/07_blastparallel.nf -profile hpc_modules
Key concepts:
- Channel.fromPath() creates channels from files
- .splitFasta() divides FASTA files into chunks
- .map() transforms channel data
- .collect() aggregates results from parallel processes
- tuple passes multiple values together
Pipeline steps:
- DOWNLOAD_UNIPROT_FASTA - Downloads database
- MAKE_BLAST_DB - Creates BLAST indices
- BLASTP - Runs BLAST on each chunk in parallel
- MERGE_BLAST - Combines all chunk results into single file
Performance benefit: Splitting the query into chunks of 5 sequences each (configurable via params.chunkSize) lets BLASTP run on several chunks at the same time, which can shorten the total run time.
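To make the split, run, and merge idea concrete, here is a minimal shell sketch of the same pattern outside Nextflow. The function names split_fasta and run_chunks are hypothetical, and the per-chunk step is a stand-in that only counts records; a real run would call blastp there. It is meant as an illustration of the pattern, not a reproduction of 07_blastparallel.nf.

```shell
#!/bin/sh
# split_fasta <fasta> <chunk-size> <out-dir>   (hypothetical helper)
# Writes chunk_1.fasta, chunk_2.fasta, ... with at most <chunk-size> records each.
split_fasta() {
    mkdir -p "$3" || return 1
    awk -v size="$2" -v dir="$3" '
        /^>/ {                        # a header line starts a new record
            if (n % size == 0) {      # previous chunk is full (or this is the first record)
                if (out) close(out)
                out = sprintf("%s/chunk_%d.fasta", dir, int(n / size) + 1)
            }
            n++
        }
        out { print > out }           # copy header and sequence lines into the current chunk
    ' "$1"
}

# run_chunks <chunk-dir>   (hypothetical helper)
# Processes every chunk as a background job, waits for all of them, then merges.
run_chunks() {
    for chunk in "$1"/chunk_*.fasta; do
        # Stand-in for the per-chunk BLASTP step: report the record count.
        ( printf '%s\t%s\n' "$(basename "$chunk")" "$(grep -c '^>' "$chunk")" > "$chunk.out" ) &
    done
    wait                              # continue only once every chunk has finished
    cat "$1"/chunk_*.fasta.out        # merge step: concatenate per-chunk results
}

# Example, using the query file from this repository layout:
#   split_fasta data/query.fasta 5 chunks && run_chunks chunks
```

Nextflow handles the bookkeeping that this sketch does by hand: chunk naming, job scheduling, waiting for completion, and caching of finished chunks for -resume.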
# Run without profile (for simple examples)
nextflow run code/01_hello.nf
# Run with HPC modules profile (for BLAST examples)
nextflow run code/06_blast.nf -profile hpc_modules
# Resume from last successful step
nextflow run code/07_blastparallel.nf -resume -profile hpc_modules
# Override parameters
nextflow run code/07_blastparallel.nf --chunkSize 10 -profile hpc_modules
# View execution timeline
nextflow run code/07_blastparallel.nf -profile hpc_modules -with-timeline timeline.html
# Remove work directory after successful run
rm -rf work/
# Clean up Nextflow cache
nextflow clean -f
To compare execution times between Nextflow and traditional bash scripts, you can use the time command:
# Time the bash script execution
time ./code/06_blast.sh
# Time the Nextflow pipeline
time nextflow run code/06_blast.nf -profile hpc_modules
# For subsequent runs (using cache)
time nextflow run code/06_blast.nf -profile hpc_modules -resume
The time command shows three values:
- real - Total wall clock time (actual elapsed time)
- user - CPU time spent in user mode
- sys - CPU time spent in kernel mode
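When several runs need to be compared, it can be handy to capture the wall-clock ("real") value in a variable instead of reading it off the terminal. The sketch below is one way to do that at one-second resolution; elapsed_seconds and speedup are hypothetical helper names, not part of Nextflow or this repository.

```shell
#!/bin/sh
# elapsed_seconds <command> [args...]   (hypothetical helper)
# Runs the command and prints its wall-clock duration in whole seconds.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" >&2                  # send the command's own output to stderr so stdout holds only the number
    es_status=$?
    es_end=$(date +%s)
    echo $((es_end - es_start))
    return "$es_status"       # pass the command's exit status through
}

# speedup <baseline-seconds> <new-seconds>   (hypothetical helper)
# Prints baseline / new with two decimals, or "n/a" if the new time is zero.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (b > 0) printf "%.2f\n", a / b; else print "n/a" }'
}

# Example:
#   seq_time=$(elapsed_seconds nextflow run code/06_blast.nf -profile hpc_modules)
#   par_time=$(elapsed_seconds nextflow run code/07_blastparallel.nf -profile hpc_modules)
#   speedup "$seq_time" "$par_time"
```

Whole-second resolution is coarse for short runs; for those, the built-in time command remains the better tool.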
# Clean run (no cached results)
rm -rf work/ work_bash/ results/
# Run bash version
echo "=== Bash Script ==="
time ./code/06_blast.sh
# Run Nextflow version
echo "=== Nextflow Pipeline ==="
time nextflow run code/06_blast.nf -profile hpc_modules
Key Observations:
- First run: Nextflow has overhead for workflow management
- Cached runs: Nextflow's -resume skips completed steps (huge time saver!)
- Parallel workflows (07_blastparallel.nf): Nextflow shows real advantage with parallelization
- Reproducibility: Nextflow tracks all intermediate steps automatically
# Compare sequential vs parallel BLAST
echo "=== Sequential (06_blast.nf) ==="
time nextflow run code/06_blast.nf -profile hpc_modules
echo "=== Parallel (07_blastparallel.nf) ==="
time nextflow run code/07_blastparallel.nf -profile hpc_modules --chunkSize 5
nextflow_tutorial/
├── README.md # This file
├── nextflow.config # Configuration profiles
├── code/ # Tutorial scripts
│ ├── 01_hello.nf # Basic output
│ ├── 02_hellowrite2file.nf # File output
│ ├── 03_hellofile.nf # Publishing results
│ ├── 04_hellopython.nf # Python integration
│ ├── 05_hellomultiprocess.nf # Multiple processes
│ ├── 06_blast.nf # BLAST pipeline (Nextflow)
│ ├── 06_blast.sh # BLAST pipeline (bash - for comparison)
│ └── 07_blastparallel.nf # Parallel BLAST
├── data/ # Input data
│ └── query.fasta # Query sequences
├── output/ # Simple example outputs
├── results/ # BLAST results
├── work/ # Nextflow working directory
└── work_bash/ # Bash script working directory
The nextflow.config file defines two profiles:
For HPC environments using environment modules:
nextflow run code/06_blast.nf -profile hpc_modules
- Loads BLAST+ module before each task
- Uses local executor
For local execution without modules:
nextflow run code/01_hello.nf -profile local
- Basic local execution
- Assumes tools are in PATH
Processes:
- Self-contained computational tasks
- Define inputs, outputs, and script to execute
- Can use any scripting language
Channels:
- Asynchronous queues that connect processes
- Pass data between processes
- Support operations like map, filter, split, collect
Workflows:
- Define the execution logic
- Connect processes through channels
- Can be nested and modularized
Directives:
- publishDir - Where to save results
- input - What data the process receives
- output - What data the process produces
- script - The command(s) to execute
- Nextflow Documentation
- Nextflow Patterns
- nf-core - Community curated pipelines
Issue: "command not found" errors
- Solution: Ensure modules are loaded or tools are in PATH. Use -profile hpc_modules for BLAST examples.
Issue: "work directory too large"
- Solution: Run nextflow clean -f to remove cached work files.
Issue: Pipeline fails partway through
- Solution: Use the -resume flag to continue from the last successful step.
Issue: Permission denied errors
- Solution: Check file permissions and ensure NXF_HOME is writable.
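A quick way to test the NXF_HOME part of this advice is a small check like the one below. The function name check_nxf_home is hypothetical, and note that it creates the directory if it does not exist yet, which is an assumption about what you want.

```shell
#!/bin/sh
# check_nxf_home   (hypothetical helper)
# Reports whether the directory named by NXF_HOME exists (creating it if needed)
# and is writable by the current user.
check_nxf_home() {
    if [ -z "${NXF_HOME:-}" ]; then
        echo "NXF_HOME is not set" >&2
        return 1
    fi
    mkdir -p "$NXF_HOME" 2>/dev/null || true    # side effect: creates the directory when possible
    if [ -d "$NXF_HOME" ] && [ -w "$NXF_HOME" ]; then
        echo "NXF_HOME is writable: $NXF_HOME"
    else
        echo "NXF_HOME is not writable: $NXF_HOME" >&2
        return 1
    fi
}

# Example:
#   export NXF_HOME='/work/idoerg/<netid>/.nextflow'   # with <netid> replaced
#   check_nxf_home
```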
- Always use -resume - Saves time by skipping completed steps
- Start simple - Master basic examples before complex pipelines
- Check work directory - Inspect intermediate files in work/ for debugging
- Use .view() - Add to channels to see what data is flowing through
- Read error messages - Nextflow provides detailed logs and error traces
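For the work directory tip, one possible starting point is to look for tasks that exited with an error. Nextflow stores each task's exit status in a hidden .exitcode file inside its task directory, next to files such as .command.sh and .command.err. The sketch below lists the task directories with a non-zero status; list_failed_tasks is a hypothetical helper name.

```shell
#!/bin/sh
# list_failed_tasks [work-dir]   (hypothetical helper; defaults to ./work)
# Prints one line per task directory whose .exitcode holds a non-zero status.
list_failed_tasks() {
    find "${1:-work}" -name .exitcode -type f 2>/dev/null | sort |
    while IFS= read -r lf_file; do
        lf_code=$(cat "$lf_file")
        if [ "$lf_code" != "0" ]; then
            echo "exit $lf_code: $(dirname "$lf_file")"
        fi
    done
}

# Example:
#   list_failed_tasks work
#   cat <task-dir>/.command.err    # then read the error output of a failing task
```

Tasks that were killed before finishing may have no .exitcode file at all, so an empty result does not guarantee that every task succeeded.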
After completing this tutorial:
- Explore nf-core pipelines for production-ready workflows
- Learn about containers (Docker/Singularity) for reproducibility
- Study executor configuration for HPC clusters
- Build your own pipeline for your research needs