samuelfabel/r-from-scratch

r-from-scratch

R is a language built for statistics. That means it thinks differently from most programming languages — and that difference is worth understanding from the ground up.
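One common illustration of that difference is vectorization, offered here as an assumption about what the sentence above has in mind rather than as the repository's own example. The variable names are hypothetical.

```r
# A numeric vector is the basic unit of data in R.
heights_cm <- c(150, 162, 175)

# Arithmetic applies to every element at once; no explicit loop is needed.
heights_m <- heights_cm / 100

# Even a single number is a vector of length one.
single_length <- length(42)

# Summary functions take the whole vector as their input.
average_cm <- mean(heights_cm)
```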

This repository covers R systematically, from the basics of the language to applied machine learning. The code is organized in sections that build on each other. You can read them in order or jump to what you need.

The applied section at the end uses the National Survey of Children's Health (NSCH) 2023 — public microdata from the U.S. Census Bureau — to demonstrate the full analytical workflow in a real context.


Structure

r-from-scratch/
├── data/                             # Sample files + NSCH download instructions
├── install_packages.R                # Install all CRAN dependencies at once
├── 01_basics/                        # Data types, operators, coercion
├── 02_data_structures/               # Vectors, matrices, lists, data frames, factors
├── 03_control_flow/                  # Conditionals, loops
├── 04_functions/                     # Basics, lexical scope, closures, recursion
├── 05_apply_family/                  # apply, lapply, sapply, tapply, mapply, purrr
├── 06_strings_and_dates/             # stringr, regex, lubridate
├── 07_io_and_data_import/            # CSV, Excel, JSON, XML, databases, web scraping
├── 08_data_manipulation/             # Base R, dplyr, tidyr, data.table
├── 09_visualization/                 # Base graphics, ggplot2, plotly
├── 10_statistics/                    # Descriptive stats, distributions, hypothesis testing, regression
├── 11_debugging_and_performance/     # Debugging, profiling, benchmarking, parallel computing
├── 12_oop/                           # S3, S4, R5 reference classes
├── 13_functional_programming/        # Higher-order functions, memoization, pipe operators
├── 14_environment_and_packages/      # Namespaces, package creation, renv
├── 15_reporting/                     # R Markdown, parameterized reports, Quarto
├── 16_ml_supervised/                 # KNN, decision trees, random forest, SVM, naive Bayes
├── 17_ml_unsupervised/               # K-means, hierarchical clustering, DBSCAN
├── 18_dimensionality_reduction/      # PCA, t-SNE, UMAP
├── 19_model_evaluation/              # Confusion matrix, cross-validation, ROC/AUC, benchmarking
└── 20_applied_project/               # End-to-end analysis on NSCH 2023 (Florida subset)

Suggested Learning Paths

Language fundamentals (start here if R is new)

01_basics → 02_data_structures → 03_control_flow → 04_functions → 05_apply_family

Data science workflow

06_strings_and_dates → 07_io_and_data_import → 08_data_manipulation → 09_visualization → 10_statistics

Machine learning

16_ml_supervised → 17_ml_unsupervised → 18_dimensionality_reduction → 19_model_evaluation → 20_applied_project

Full curriculum: follow sections 01–20 in order. Each folder runs on its own, but later sections assume familiarity with the concepts introduced in earlier ones.


File Formats

.R files contain code and inline comments. They run as-is in R or RStudio.

.Rmd files combine code, output, and explanation in a single document. Render them with:

rmarkdown::render("file.Rmd")

.qmd files use Quarto. Render with:

quarto render file.qmd

How to Use This Repository

Clone the repository and open it in RStudio or any R environment.

git clone https://github.com/samuelfabel/r-from-scratch.git
cd r-from-scratch

Install all dependencies once:

source("install_packages.R")

Each section is self-contained. Dependencies are loaded at the top of each file. If a single package is missing, install it with:

install.packages("package_name")
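To find out which packages are missing before running a section, a check along these lines may help. It is a minimal sketch: the `required` vector is a hypothetical stand-in (the real list lives in install_packages.R), and the last entry is a deliberately nonexistent name so the check has something to report.

```r
# Hypothetical package list; install_packages.R defines the real one.
required <- c("stats", "utils", "notARealPackage123")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(missing_pkgs)
}
```

The install call is left commented out so the check itself has no side effects.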

For reproducible package management with pinned versions, see 14_environment_and_packages/renv_reproducibility.R and run renv::init() in the project root.

Sample data for section 07 is in data/. The applied project requires a separate NSCH download; see data/README.md.

Validate syntax

From the project root, after npm install:

npm run check

This runs markdown lint on all .md files and R syntax checks on all .R, .Rmd, and .qmd files. With R installed, you can also run:

Rscript scripts/check_syntax.R

GitHub Actions runs the same checks on every push (see .github/workflows/check-syntax.yml).
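For a sense of what an R syntax check can look like, the sketch below uses base R's `parse()`, which reads a file into expressions without evaluating them. This is an assumption about the general approach, not the contents of scripts/check_syntax.R. The helper name `parses_ok` and the two temporary files are hypothetical.

```r
# Return TRUE if the file parses as R code, FALSE otherwise.
# parse() only builds the expressions; it never evaluates the code.
parses_ok <- function(path) {
  tryCatch({
    parse(file = path)
    TRUE
  }, error = function(e) {
    message(path, ": ", conditionMessage(e))
    FALSE
  })
}

# Demo on two temporary files standing in for repository scripts.
good <- tempfile(fileext = ".R")
bad <- tempfile(fileext = ".R")
writeLines("x <- c(1, 2, 3)", good)
writeLines("x <- c(1, 2,", bad)  # unclosed call: a syntax error

results <- vapply(c(good, bad), parses_ok, logical(1), USE.NAMES = FALSE)
```

A plain `parse()` call applies to .R files only. For .Rmd and .qmd files the code chunks would have to be extracted first, for example with `knitr::purl()`.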


Applied Project

Section 20_applied_project/ applies the techniques from sections 16–19 to a real dataset.

Data source: National Survey of Children's Health (NSCH) 2023, U.S. Census Bureau / HRSA Maternal and Child Health Bureau.

The analysis filters to Florida and uses demographic and socioeconomic predictors to model parent-reported autism spectrum disorder (ASD) diagnosis. It runs through data exploration, preprocessing, dimensionality reduction, and comparison of supervised classification models.

See 20_applied_project/README.md for the full pipeline, variables, and constraints.


Requirements

  • R >= 4.1.0
  • RStudio (recommended) or any R environment
  • Quarto CLI (for .qmd files in section 15)

Origin

This repository started as a fork of the Johns Hopkins R Programming assignment on Coursera (cachematrix.R, 2015). That file now lives in 13_functional_programming/memoization.R, which is where it conceptually belongs.
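The idea behind that assignment is to cache a matrix inverse inside a closure so it is computed only once. The sketch below is an independent illustration of the technique, not the code in 13_functional_programming/memoization.R. The names `make_cache_matrix` and `cache_solve` are hypothetical.

```r
# Wrap a matrix in a closure that can remember its inverse.
make_cache_matrix <- function(m = matrix()) {
  inverse <- NULL
  list(
    get = function() m,
    set = function(value) {
      m <<- value
      inverse <<- NULL  # a new matrix invalidates the cached inverse
    },
    get_inverse = function() inverse,
    set_inverse = function(value) inverse <<- value
  )
}

# Return the cached inverse if present; otherwise compute and store it.
cache_solve <- function(cm) {
  cached <- cm$get_inverse()
  if (!is.null(cached)) {
    return(cached)
  }
  result <- solve(cm$get())
  cm$set_inverse(result)
  result
}

cm <- make_cache_matrix(matrix(c(2, 0, 0, 2), nrow = 2))
first <- cache_solve(cm)   # computed with solve()
second <- cache_solve(cm)  # served from the cache
```

The `<<-` operator assigns in the enclosing function's environment, which is what lets the cached value persist between calls.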

References

  • R Programming for Data Science, Roger D. Peng — Chapter 4: R Nuts and Bolts
  • Advanced R, Hadley Wickham — Chapter 3: Vectors
  • R for Data Science, Wickham et al. — https://r4ds.hadley.nz
  • An Introduction to Statistical Learning, James et al. — https://www.statlearning.com

License

MIT — see LICENSE.
