r-from-scratch

R is a language built for statistics. That means it thinks differently from most programming languages — and that difference is worth understanding from the ground up.

This repository covers R systematically, from the basics of the language to applied machine learning. The code is organized in sections that build on each other. You can read them in order or jump to what you need.

The applied section at the end uses the National Survey of Children's Health (NSCH) 2023 — public microdata from the U.S. Census Bureau — to demonstrate the full analytical workflow in a real context.

Structure

r-from-scratch/
├── data/                             # Sample files + NSCH download instructions
├── install_packages.R                # Install all CRAN dependencies at once
├── 01_basics/                        # Data types, operators, coercion
├── 02_data_structures/               # Vectors, matrices, lists, data frames, factors
├── 03_control_flow/                  # Conditionals, loops
├── 04_functions/                     # Basics, lexical scope, closures, recursion
├── 05_apply_family/                  # apply, lapply, sapply, tapply, mapply, purrr
├── 06_strings_and_dates/             # stringr, regex, lubridate
├── 07_io_and_data_import/            # CSV, Excel, JSON, XML, databases, web scraping
├── 08_data_manipulation/             # Base R, dplyr, tidyr, data.table
├── 09_visualization/                 # Base graphics, ggplot2, plotly
├── 10_statistics/                    # Descriptive stats, distributions, hypothesis testing, regression
├── 11_debugging_and_performance/     # Debugging, profiling, benchmarking, parallel computing
├── 12_oop/                           # S3, S4, Reference Classes (RC, informally "R5")
├── 13_functional_programming/        # Higher-order functions, memoization, pipe operators
├── 14_environment_and_packages/      # Namespaces, package creation, renv
├── 15_reporting/                     # R Markdown, parameterized reports, Quarto
├── 16_ml_supervised/                 # KNN, decision trees, random forest, SVM, naive Bayes
├── 17_ml_unsupervised/               # K-means, hierarchical clustering, DBSCAN
├── 18_dimensionality_reduction/      # PCA, t-SNE, UMAP
├── 19_model_evaluation/              # Confusion matrix, cross-validation, ROC/AUC, benchmarking
└── 20_applied_project/               # End-to-end analysis on NSCH 2023 (Florida subset)

Suggested Learning Paths

Language fundamentals (start here if R is new)

01_basics → 02_data_structures → 03_control_flow → 04_functions → 05_apply_family

Data science workflow

06_strings_and_dates → 07_io_and_data_import → 08_data_manipulation → 09_visualization → 10_statistics

Machine learning

16_ml_supervised → 17_ml_unsupervised → 18_dimensionality_reduction → 19_model_evaluation → 20_applied_project

Full curriculum: follow sections 01–20 in order. Each folder runs on its own, but later sections assume familiarity with the concepts introduced in earlier ones.

File Formats

.R files contain code and inline comments. They run as-is in R or RStudio.

.Rmd files combine code, output, and explanation in a single document. Render them with:

rmarkdown::render("file.Rmd")

.qmd files use Quarto. Render with:

quarto render file.qmd

How to Use This Repository

Clone the repository and open it in RStudio or any R environment.

git clone https://github.com/samuelfabel/r-from-scratch.git
cd r-from-scratch

Install all dependencies once:

source("install_packages.R")

Each section is self-contained. Dependencies are loaded at the top of each file. If a single package is missing, install it with:

install.packages("package_name")
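When more than one package might be missing, checking first avoids reinstalling what is already there. A minimal sketch, assuming a hypothetical helper named `missing_packages` (not part of the repository) and an illustrative package list:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
# requireNamespace() loads a namespace if it is available and returns
# TRUE/FALSE instead of raising an error.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Illustrative list only; the real dependencies are in install_packages.R.
needed <- c("stats", "utils")
to_install <- missing_packages(needed)

# Install only what is actually absent.
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Replace `needed` with the packages loaded at the top of the file you want to run.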

For reproducible package management with pinned versions, see 14_environment_and_packages/renv_reproducibility.R and run renv::init() in the project root.

Sample data for section 07 is in data/. The applied project requires a separate NSCH download; see data/README.md.

Validate syntax

From the project root, after npm install:

npm run check

This runs markdown lint on all .md files and R syntax checks on all .R, .Rmd, and .qmd files. With R installed, you can also run:

Rscript scripts/check_syntax.R
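The contents of scripts/check_syntax.R are not reproduced here. As a rough illustration of what an R syntax check can look like, the sketch below uses base R's `parse()`, which reads a file without executing it. The helper name `parses_ok` and the temporary files are assumptions made for the example, not the repository's implementation.

```r
# Hypothetical helper: TRUE if `path` parses as valid R code, FALSE otherwise.
# parse() only checks syntax; it does not run the code.
parses_ok <- function(path) {
  tryCatch({
    parse(file = path)
    TRUE
  }, error = function(e) FALSE)
}

# Demonstration on two temporary files: one valid, one with an unclosed call.
good <- tempfile(fileext = ".R")
bad <- tempfile(fileext = ".R")
writeLines("x <- c(1, 2, 3)", good)
writeLines("x <- c(1, 2,", bad)

results <- vapply(c(good = good, bad = bad), parses_ok, logical(1))
print(results)
```

This approach applies to plain .R files only. Checking .Rmd or .qmd files would first require extracting their code chunks, for example with `knitr::purl()`.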

GitHub Actions runs the same checks on every push (see .github/workflows/check-syntax.yml).

Applied Project

Section 20_applied_project/ applies the techniques from sections 16–19 to a real dataset.

Data source: National Survey of Children's Health (NSCH) 2023, U.S. Census Bureau / HRSA Maternal and Child Health Bureau.

Download: https://www.census.gov/programs-surveys/nsch/data/datasets/nsch2023.html
Format: SAS (.sas7bdat) — read with haven::read_sas()
Place at: data/nsch_2023_topical.sas7bdat

The analysis filters to Florida and uses demographic and socioeconomic predictors to model parent-reported ASD diagnosis. It runs through data exploration, preprocessing, dimensionality reduction, and comparison of supervised classification models.
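The load-and-filter step might look roughly like the sketch below. It assumes the state identifier column is named `FIPSST` and holds state FIPS codes (Florida is 12); confirm both against the NSCH codebook before relying on them. The helper `filter_state`, the column `child_id`, and the toy fallback data are hypothetical and exist only so the sketch runs without the download.

```r
# Hypothetical helper: keep the rows of `df` whose state FIPS code matches.
# Assumes the state column is called FIPSST; verify against the NSCH codebook.
filter_state <- function(df, fips, state_col = "FIPSST") {
  codes <- as.integer(as.character(df[[state_col]]))
  df[!is.na(codes) & codes == as.integer(fips), , drop = FALSE]
}

path <- "data/nsch_2023_topical.sas7bdat"

if (file.exists(path) && requireNamespace("haven", quietly = TRUE)) {
  # Real data: read the SAS file placed at the path given above.
  nsch <- haven::read_sas(path)
} else {
  # Toy stand-in so the sketch runs without the download; values are made up.
  nsch <- data.frame(FIPSST = c("12", "06", "12", "48"), child_id = 1:4)
}

florida <- filter_state(nsch, fips = 12)
nrow(florida)
```

Converting the codes to integers before comparing means the filter works whether the column is stored as text ("12") or as a number.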

See 20_applied_project/README.md for the full pipeline, variables, and constraints.

Requirements

R >= 4.1.0
RStudio (recommended) or any R environment
Quarto CLI (for .qmd files in section 15)

Origin

This repository started as a fork of the Johns Hopkins R Programming assignment on Coursera (cachematrix.R, 2015). That file now lives in 13_functional_programming/memoization.R, which is where it conceptually belongs.
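For readers unfamiliar with that assignment, the idea is to cache a matrix inverse inside a closure so it is computed only once. The sketch below is an illustration of that pattern, not the contents of memoization.R; the function names `make_cache_matrix` and `cache_solve` are chosen for this example.

```r
# Wrap a matrix together with a cached copy of its inverse.
make_cache_matrix <- function(x = matrix()) {
  inverse <- NULL
  set <- function(value) {
    x <<- value
    inverse <<- NULL  # a new matrix invalidates the cached inverse
  }
  get <- function() x
  set_inverse <- function(value) inverse <<- value
  get_inverse <- function() inverse
  list(set = set, get = get,
       set_inverse = set_inverse, get_inverse = get_inverse)
}

# Return the inverse, computing it only when no cached value exists.
cache_solve <- function(cm, ...) {
  inverse <- cm$get_inverse()
  if (is.null(inverse)) {
    inverse <- solve(cm$get(), ...)
    cm$set_inverse(inverse)
  }
  inverse
}

m <- make_cache_matrix(matrix(c(2, 0, 0, 2), nrow = 2))
first <- cache_solve(m)   # computed with solve()
second <- cache_solve(m)  # read from the cache
identical(first, second)
```

The `<<-` operator assigns in the enclosing environment, which is what lets the cached value persist between calls.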

References

R Programming for Data Science, Roger D. Peng — Chapter 4: R Nuts and Bolts
Advanced R, Hadley Wickham — Chapter 3: Vectors
R for Data Science, Wickham et al. — https://r4ds.hadley.nz
An Introduction to Statistical Learning, James et al. — https://www.statlearning.com

License

MIT — see LICENSE.
