This repository is a concept-first exploration of gradients, automatic differentiation, optimisation, and surrogate-based decision-making using PyTorch.
Rather than teaching PyTorch only as a high-level deep-learning framework, the material uses it as a numerical and analytical tool for understanding a much broader progression:
- tensors and linear algebra,
- computation graphs and automatic differentiation,
- gradient structure and optimisation dynamics,
- modelling unknown objective functions,
- uncertainty-aware surrogate models,
- and finally Bayesian Optimisation through BoTorch.
The goal is not just to show how to compute gradients or train models, but to build a coherent path from low-level PyTorch mechanics to modern data-efficient optimisation.
In other words, this repository is designed as a bridge:
from PyTorch fundamentals to Bayesian Optimisation.
Most tutorials either:
- stop at PyTorch basics,
- jump quickly into neural-network training,
- or treat Bayesian Optimisation as a separate black-box topic.
This repository takes a different route.
It deliberately slows down and builds the ideas in sequence.
PyTorch is treated not just as a framework for fitting models, but as a flexible environment for understanding:
- gradients as mathematical objects,
- autograd as a computational mechanism,
- optimisation as a dynamical process,
- and surrogate modelling as a way to reason about expensive unknown functions.
By the time Bayesian Optimisation is introduced, it should feel like the natural outcome of ideas already developed:
- first understand gradients,
- then understand optimisation,
- then understand why optimisation alone is not enough,
- then build models of unknown functions,
- and finally use those models to guide intelligent search.
The aim is therefore not just to teach isolated tools, but to build a conceptual pathway:
PyTorch → gradients → optimisation → surrogate modelling → Bayesian Optimisation
The material is organised into Parts, each forming a coherent conceptual unit.
├── part_1/
│ ├── worked/ #(worked and exploratory versions)
│ ├── README.md
│ ├── tutorial_01_tensor_fundamentals.ipynb
│ ├── tutorial_02_common_pytorch_tensor_operations.ipynb
│ ├── tutorial_03_minimal_learning_problem.ipynb
│ ├── tutorial_04_autograd_and_graphs.ipynb
│ └── tutorial_05_tensor_gradients_and_vjp.ipynb
├── part_2/
│ ├── worked/ #(worked and exploratory versions)
│ ├── README.md
│ ├── tutorial_01_gradient_descent_as_dynamical_system.ipynb
│ ├── tutorial_02_geometry_and_conditioning_of_optimisation.ipynb
│ ├── tutorial_03_momentum_as_a_dynamical_system.ipynb
│ └── tutorial_04_optimisaiton_beyond_convexity.ipynb
├── part_3/
│ ├── worked/ #(worked and exploratory versions)
│ ├── README.md
│ ├── tutorial_01_why_model_an_unknown_function.ipynb
│ ├── tutorial_02_prediction_uncertainty_and_confidence.ipynb
│ ├── tutorial_03_gaussian_processes_as_surrogate_models.ipynb
│ └── tutorial_04_choosing_where_to_evaluate_next.ipynb
├── part_4/
│ ├── worked/ #(worked and exploratory versions)
│ ├── README.md
│ ├── tutorial_01_from_gaussian_processes_to_botorch_models.ipynb
│ ├── tutorial_02_standard_acquisition_functions_in_botorch.ipynb
│ ├── tutorial_03_full_single_loop_bo_workflow.ipynb
│ └── tutorial_04_practical_modelling_choices_in_botorch.ipynb
├── part_5/
│ ├── worked/ #(worked and exploratory versions)
│ ├── README.md
│ ├── tutorial_01_higher_dimensional_custom_bo_for_experimental_design_spaces.ipynb
│ ├── tutorial_02_batch_bo_for_parallel_experimentation.ipynb
│ ├── tutorial_03_mixed_variable_and_constrained_bo.ipynb
│ └── tutorial_04_budget_aware_and_human_in_the_loop_bo_workflows.ipynb
├── part_6/
│ ├── worked/ #(worked and exploratory versions)
│ ├── README.md
│ ├── tutorial_01_noisy_and_replication_aware_bo.ipynb
│ ├── tutorial_02_multi_objective_bo_and_pareto_optimal_dicision_making.ipynb
│ ├── tutorial_03_multi_fidelity_and_contextual_bo.ipynb
│ └── tutorial_04_structured_bo_for_hierarchical_experimental_workflows.ipynb
├── LICENSE
└── README.md
Each notebook is self-contained and can be read independently, but the intended experience is sequential.
🌱 The repository is still growing.
Part 1 builds the conceptual foundations needed to understand gradients before optimisation algorithms are introduced.
It covers:
- tensor mechanics and numerical structure,
- how autograd builds and traverses computation graphs,
- scalar vs tensor-valued differentiation,
- vector–Jacobian products as the core object of `backward`,
- interpreting `.grad` as sensitivity,
- and visualising gradient structure in controlled experiments.
Part 1 concludes by connecting gradient structure to local optimisation intuition, without yet introducing optimisers or training pipelines.
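As a small illustration of the vector–Jacobian product idea listed above, the following sketch calls `backward` on a vector-valued output. The function `y = x ** 2` and the vector `v` are hypothetical choices made for this example, not taken from the notebooks.

```python
import torch

# Hypothetical toy function: y_i = x_i^2, so the Jacobian is J = diag(2 * x).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2

# backward on a non-scalar output needs a vector v.
# Autograd then accumulates v^T J into x.grad without ever forming J.
v = torch.tensor([1.0, 0.5, 0.0])
y.backward(gradient=v)
vjp = x.grad

# Cross-check against the explicitly materialised Jacobian.
J = torch.autograd.functional.jacobian(lambda t: t ** 2, x.detach())
assert torch.allclose(vjp, v @ J)
```

Reading `x.grad` as a sensitivity: each entry says how much the weighted sum `(v * y).sum()` changes per unit change in the matching entry of `x`.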
📂 See part_1/README.md for full details.
Part 2 builds directly on the gradient intuition developed in Part 1 and studies how optimisation behaviour emerges over time.
It covers:
- gradient descent as a discrete dynamical system,
- learning rates, stability, and contraction,
- geometry, conditioning, and narrow valleys,
- momentum and inertia,
- and the challenges of optimisation beyond convexity.
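The link between learning rate, stability, and conditioning can be sketched on a two-dimensional quadratic. The curvatures, step sizes, and the helper `run_gd` below are assumptions chosen for illustration. For `f(x) = 0.5 * sum(lam * x**2)`, each coordinate evolves as `x_{k+1} = (1 - lr * lam) * x_k`, so the iteration contracts only when `lr < 2 / max(lam)`.

```python
import torch

def run_gd(lr, steps=50):
    """Iterate x_{k+1} = x_k - lr * grad f(x_k) and return the final distance to the minimum."""
    lam = torch.tensor([1.0, 10.0])  # Hessian eigenvalues; condition number 10
    x = torch.tensor([1.0, 1.0], requires_grad=True)
    for _ in range(steps):
        loss = 0.5 * (lam * x ** 2).sum()
        (grad,) = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x -= lr * grad  # one step of the discrete dynamical system
    return x.detach().norm().item()

stable = run_gd(lr=0.15)    # 0.15 < 2 / 10: both coordinates contract
unstable = run_gd(lr=0.25)  # 0.25 > 2 / 10: the stiff coordinate grows each step

print(f"lr=0.15 -> |x| = {stable:.2e}")
print(f"lr=0.25 -> |x| = {unstable:.2e}")
```

The stiff direction sets the stability limit while the flat direction sets the convergence speed, which is one way to see why poorly conditioned problems are slow.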
📂 See part_2/README.md for full details.
Part 3 is the conceptual bridge from optimisation dynamics to Bayesian Optimisation.
We now study what happens when the objective function is:
- expensive to evaluate,
- only partially observed,
- and better handled through a learned surrogate than through brute-force search.
This part develops the core ideas needed before using modern Bayesian Optimisation libraries.
It introduces:
- why expensive objectives require modelling,
- how surrogate models approximate unknown functions,
- why prediction alone is not enough without uncertainty,
- Gaussian Processes as principled probabilistic surrogates,
- and acquisition functions for deciding where to evaluate next.
This prepares the ground for the next stage of the repository, where these ideas are implemented more practically using BoTorch.
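To make the surrogate-plus-acquisition idea concrete, here is a minimal sketch of a Gaussian Process posterior written directly in PyTorch. Several things are assumptions for illustration only: the RBF kernel with a fixed lengthscale of 0.2 and unit prior variance, the stand-in objective `sin(6x)`, and the upper-confidence-bound rule with a weight of 2. No hyperparameters are fitted.

```python
import torch

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D input sets a (n,) and b (m,)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return torch.exp(-0.5 * d2 / lengthscale ** 2)

f = lambda x: torch.sin(6 * x)  # stand-in for an expensive objective

X = torch.tensor([0.1, 0.4, 0.9], dtype=torch.float64)  # points evaluated so far
y = f(X)
Xs = torch.linspace(0, 1, 101, dtype=torch.float64)     # candidate grid

jitter = 1e-6
K = rbf(X, X) + jitter * torch.eye(len(X), dtype=torch.float64)
Ks = rbf(X, Xs)                                          # shape (n, m)

# Posterior mean and variance via a Cholesky factorisation of K.
L = torch.linalg.cholesky(K)
alpha = torch.cholesky_solve(y[:, None], L)              # K^{-1} y
mean = (Ks.T @ alpha).squeeze(-1)
v = torch.linalg.solve_triangular(L, Ks, upper=False)
var = (1.0 - (v ** 2).sum(dim=0)).clamp_min(0.0)         # prior variance k(x, x) = 1

# Upper confidence bound: prefer points that look good or are still uncertain.
ucb = mean + 2.0 * var.sqrt()
x_next = Xs[ucb.argmax()]
```

The variance collapses at the observed inputs and grows away from them, which is the property an acquisition function exploits when balancing exploitation against exploration.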
📂 See part_3/README.md for full details.
Part 4 turns the conceptual foundations of Part 3 into practical workflows using BoTorch.
Rather than building Gaussian Processes and acquisition logic entirely from first principles, we now study how those same ideas are implemented in a modern Bayesian Optimisation library.
It covers:
- fitting Gaussian Process surrogates in BoTorch,
- working with BoTorch posterior objects,
- acquisition functions such as EI, PI, and UCB,
- optimising acquisition functions to propose new candidates,
- and building the standard sequential Bayesian Optimisation loop in practice.
Part 4 is still focused on standard single-loop Bayesian Optimisation. Its purpose is to make the transition from theory to implementation clear and interpretable, before moving on to more advanced BO strategies.
📂 See part_4/README.md for full details.
Part 5 extends the standard BoTorch workflows from Part 4 into more realistic experimental optimisation settings.
It covers:
- higher-dimensional BO for experimental design spaces,
- batch BO for parallel experimentation,
- mixed-variable and constrained BO,
- decode–repair–evaluate workflows,
- budget-aware BO under unequal experiment costs,
- and human-in-the-loop BO with simple decision rules.
Part 5 focuses on the idea that practical BO is not only about maximising an acquisition function. In realistic workflows, candidate selection may also depend on feasibility, cost, budget, and human judgement.
This part therefore bridges standard single-loop BO and more realistic scientific optimisation campaigns.
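One simple way to fold feasibility, cost, and budget into candidate selection is sketched below. The candidate values, the budget, and the "acquisition per unit cost" rule are all hypothetical. The rule is a common heuristic, not necessarily the one the notebooks use.

```python
import torch

# Hypothetical candidate pool: acquisition value, experiment cost, feasibility flag.
acq = torch.tensor([0.80, 0.95, 0.60, 0.90])
cost = torch.tensor([1.0, 4.0, 0.5, 2.0])
feasible = torch.tensor([True, True, True, False])
remaining_budget = 3.0

# Keep only candidates that are feasible and affordable.
allowed = feasible & (cost <= remaining_budget)

# Among those, maximise acquisition value per unit cost.
score = torch.where(allowed, acq / cost, torch.tensor(float("-inf")))
choice = int(score.argmax())

print(f"pure acquisition would pick {int(acq.argmax())}, cost-aware rule picks {choice}")
```

Here the candidate with the highest raw acquisition value is skipped because it exceeds the remaining budget, which shows how the selected experiment can differ from the acquisition maximiser.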
📂 See part_5/README.md for full details.
Part 6 extends the realism-oriented BO workflows of Part 5 into settings where the optimisation problem itself has richer statistical or workflow structure.
It covers:
- noisy BO and replication-aware decision-making,
- multi-objective BO and Pareto-optimal decision-making,
- multi-fidelity BO under unequal evaluation cost and accuracy,
- contextual BO where the best design depends on external conditions,
- and structured BO for hierarchical experimental workflows.
Part 6 focuses on the idea that realistic BO is often not just about choosing the next design point in a flat space. In many scientific problems, the optimiser must also reason about noise, repeated measurements, trade-offs between objectives, cheaper and more expensive evaluations, context-dependent recommendations, or multi-stage experimental structure.
This part therefore bridges practical workflow-aware BO and more advanced BO settings where the meaning of the input variables, the observations, or the decision process itself becomes more structured.
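For the multi-objective case, the notion of Pareto optimality can be sketched with a small non-dominated filter in plain PyTorch. The helper `pareto_mask` and the sample points are hypothetical, and the sketch assumes every objective is to be maximised.

```python
import torch

def pareto_mask(Y):
    """Boolean mask of non-dominated rows of Y (n, m), assuming all objectives are maximised."""
    # ge[i, j]: point j is at least as good as point i in every objective.
    ge = (Y[None, :, :] >= Y[:, None, :]).all(dim=-1)
    # gt[i, j]: point j is strictly better than point i in at least one objective.
    gt = (Y[None, :, :] > Y[:, None, :]).any(dim=-1)
    # Point i is dominated if some j satisfies both conditions.
    dominated = (ge & gt).any(dim=1)
    return ~dominated

Y = torch.tensor([
    [1.0, 5.0],
    [2.0, 4.0],
    [1.5, 3.5],  # dominated by [2.0, 4.0]
    [3.0, 1.0],
    [0.5, 0.5],  # dominated by every other point
])
mask = pareto_mask(Y)
front = Y[mask]
```

The surviving rows form the Pareto front: no remaining point can be improved in one objective without giving something up in another, so choosing among them is a decision about trade-offs rather than a pure optimisation step.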
📂 See part_6/README.md for full details.
Install the tutorial runtime dependencies with:
pip install -r requirements.txt
The notebooks assume a Python 3 Jupyter environment with PyTorch, BoTorch, GPyTorch, NumPy, pandas, and matplotlib available.
For most notebooks, two versions exist:
- Fresh: clean learner-facing versions intended for reading, teaching, or first-pass study. These notebooks are in the main folder.
- Worked: executed reference versions containing outputs, figures, and numerical results. These notebooks are in each part's `worked/` folder.
This separation keeps the main narrative clear while preserving the full reasoning process.
This repository is standalone; BO Forge can be treated as an optional downstream project for applying these ideas in a fuller optimisation workflow.
This repository is suitable for:
- advanced undergraduates,
- master’s students,
- PhD students,
- or practitioners who want a deeper understanding of gradients and optimisation.
A background in linear algebra and basic calculus is assumed, but no prior deep-learning experience is required.
Author: Angze Li
Status: Actively developed