Skip to content

ghosal06/RNA-seq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

RNA-Seq Pipeline in R

Welcome to my RNA-Seq Pipeline! :) This pipeline is designed to take you from raw sequencing data all the way to biological insights using R. It's streamlined, easy to use, and packed with tips and tricks to make your life easier. Inspired by the nf-core standards, this pipeline ensures high-quality, reproducible results.

Table of Contents

  1. Biological Overview
  2. Pipeline Steps
  3. Usage Instructions
  4. Dependencies
  5. License
  6. Contact

Biological Overview

RNA sequencing (RNA-Seq) is a powerful technique that allows researchers to capture a snapshot of the entire transcriptome—the complete set of RNA transcripts—in a cell at a given moment. By analyzing RNA-Seq data, we can uncover gene expression patterns, identify novel transcripts, and understand the molecular mechanisms underlying various biological processes and diseases.

This pipeline guides you through each critical step of RNA-Seq data analysis, ensuring that your results are both accurate and biologically meaningful.


Pipeline Steps

1. Quality Control (QC)

Biological Meaning:
Before diving into analysis, it's essential to ensure that your raw sequencing data is of high quality. Poor-quality data can lead to misleading results, so QC helps identify any issues early on.

Overview:
This step assesses the quality of your raw sequencing reads, checking for issues like low-quality bases, adapter contamination, and uneven base composition. High-quality data is crucial for reliable downstream analyses.

Pro Tip:
After running QC, take a peek at the reports to spot any glaring issues like adapter contamination or low-quality reads. It saves headaches later!


2. Read Trimming and Filtering

Biological Meaning:
Trimming removes low-quality bases and adapter sequences from your reads. This step enhances the accuracy of downstream analyses like alignment and quantification.

Overview:
By trimming and filtering your reads, you ensure that only high-quality data proceeds through the pipeline. This improves the reliability of alignment and the accuracy of gene expression measurements.

Pro Tip:
Setting the right quality threshold is crucial. If you're unsure, start with a common threshold and adjust based on your QC reports.


3. Alignment/Mapping

Biological Meaning:
Mapping aligns your trimmed reads to a reference genome or transcriptome. This step is foundational for quantifying gene expression accurately.

Overview:
Accurate alignment is essential for determining the origin of each read, which directly impacts the quantification of gene expression levels. Proper alignment ensures that reads are correctly assigned to their respective genes.

Pro Tip:
Ensure your reference genome is up-to-date. Using an outdated genome can lead to mismatches and lower alignment rates.


4. Quantification

Biological Meaning:
Quantification counts the number of reads mapping to each gene or transcript. This data is pivotal for understanding gene expression levels across your samples.

Overview:
By counting reads per gene, you obtain a quantitative measure of gene expression. This forms the basis for comparing expression levels between different conditions or treatments.

Pro Tip:
Double-check your annotation files for consistency with your reference genome to avoid counting errors.


5. Normalization

Biological Meaning:
Normalization adjusts for technical variations like differing sequencing depths, ensuring that gene expression comparisons across samples are accurate.

Overview:
Normalization accounts for factors such as varying sequencing depths and library sizes, allowing for fair comparisons of gene expression levels between samples.

Pro Tip:
Always visualize normalized counts to ensure proper scaling and to detect any remaining technical biases.


6. Differential Expression Analysis

Biological Meaning:
This step identifies genes that are significantly up or downregulated between different conditions, shedding light on biological processes and pathways involved.

Overview:
Differential expression analysis helps pinpoint genes that show meaningful changes in expression, which can be linked to specific biological functions or disease states.

Pro Tip:
Adjust the fold change and p-value thresholds based on your biological questions to focus on the most relevant genes.


7. Functional Analysis and Visualization

Biological Meaning:
Functional analysis interprets the biological significance of differentially expressed genes by identifying enriched pathways and gene ontology (GO) terms. Visualization helps in effectively communicating your findings.

Overview:
This final step translates your differential expression results into biological insights, highlighting pathways and processes that are most affected under your experimental conditions.

Pro Tip:
Customize your plots with different themes to match your presentation style and make your results more impactful.


Usage Instructions

  1. Clone the Repository:

    git clone https://github.com/Mehdi-kn/RNAseq-Pipeline.git
    cd RNAseq-Pipeline
  2. Prepare Your Data:

    • Place your raw FASTQ files in the data/raw/ directory.
    • Ensure you have a reference genome (genome.fa) and annotation file (annotation.gtf) in the reference/ directory.
  3. Install R and Required Packages:

    Make sure you have R installed (version 4.0 or higher). Open R and run:

    install.packages("BiocManager")
    BiocManager::install(c("fastqcr", "ShortRead", "Rsubread", "DESeq2", "clusterProfiler", "org.Hs.eg.db"))
  4. Run the Pipeline Scripts:

    Execute each script in the scripts/ directory in the following order:

    Rscript scripts/01_qc.R
    Rscript scripts/02_trimming.R
    Rscript scripts/03_alignment.R
    Rscript scripts/04_quantification.R
    Rscript scripts/05_normalization.R
    Rscript scripts/06_differential_expression.R
    Rscript scripts/07_functional_analysis.R

    Alternatively, you can source the scripts within an R session:

    source("scripts/01_qc.R")
    source("scripts/02_trimming.R")
    # ... and so on
  5. Review Results:

    • QC Reports: results/qc_reports/
    • Aligned BAM Files: results/alignment/
    • Count Matrices: results/counts/
    • Differential Expression Results: results/differential_expression/
    • Functional Analysis Results and Plots: results/functional_analysis/

Dependencies

  • R (>= 4.0)

  • Bioconductor Packages:

    • fastqcr
    • ShortRead
    • Rsubread
    • DESeq2
    • clusterProfiler
    • org.Hs.eg.db (Replace with appropriate organism package if necessary)
  • External Tools:

    • FastQC: For initial quality control (automated via fastqcr package)
    • Bowtie2/SAMtools: Optional, used internally by Rsubread for alignment

Installation Notes:

  • Ensure that external tools like FastQC, Bowtie2, and SAMtools are installed and accessible in your system's PATH if you choose to run them outside of R.
  • The provided R scripts utilize packages that interface with these tools, simplifying the installation process.

License

This project is licensed under the MIT License.


Contact

For questions, suggestions, or contributions, please reach out to:

Mehdi-kn
Email: m.knidiri70@gmail.com


Final Notes

  • Customization: Adjust sample information and thresholds based on your experimental design and specific needs.
  • Organism: Ensure you use the correct organism annotation package (e.g., org.Mm.eg.db for mouse) in the functional analysis step.
  • Error Handling: For a more robust pipeline, consider adding error checks and handling unexpected issues gracefully.
  • Further Enhancements: You can expand the pipeline to include additional analyses like pathway enrichment, alternative splicing, or integration with other omics data.

By following this guide and utilizing the provided scripts, you can establish a robust RNA-Seq pipeline tailored to your research needs. Keep experimenting, and feel free to tweak the scripts to better fit your workflow!

🧬✨

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages