diff --git a/doc/4_path_tracking/4_2_mpc_controller.md b/doc/4_path_tracking/4_2_mpc_controller.md
new file mode 100644
index 00000000..5f68bca5
--- /dev/null
+++ b/doc/4_path_tracking/4_2_mpc_controller.md
@@ -0,0 +1,587 @@
+# 5. MPC Controller
+
+In this chapter, the MPC (Model Predictive Control) path tracking controller class is implemented. This class implements the MPC path tracking algorithm, which computes a steering angle and acceleration command by solving an **optimization problem** over a finite prediction horizon at every time step.
+
+Before getting into the code, let's get some basic understanding behind the algorithm:
+
+### **Model Predictive Control (MPC)**
+
+- Also known as **Receding Horizon Control**.
+- Widely used in autonomous vehicles, robotics, and industrial process control.
+- At every time step, MPC solves an Optimal Control Problem (OCP) over a future horizon of **N steps**, then applies only the **first control action** and repeats the process at the next step.
+
+**The key idea is**: instead of reacting to the current error (like Stanley or PID), MPC **predicts** where the vehicle will be over the next N steps and finds the control sequence that minimises a cost function while respecting the hard constraints.
+
+---
+
+**Advantages over Stanley / Pure Pursuit / PID :**
+
+- Handles **hard constraints** natively - steering limits, acceleration limits, and speed bounds are enforced by the solver, not violated and then clipped(like in geometric controllers).
+- Plans **N steps ahead** - can anticipate future curves before the vehicle reaches them and adjust the control input according to that.
+- Combines tracking error, control effort, and smoothness of control inputs in one unified cost function.
+- Natively handles variable speed profiles along the course.
+
+**Disadvantages:**
+
+- Computationally heavier : requires solving an NLP (Non-Linear Program) at every step via IPOPT.
+- Requires tuning of multiple cost weights.
+- Sensitive to model mismatch - if the internal vehicle model differs from the real vehicle, tracking performance degrades.
+
+---
+
+#### Libraries used:
+
+1. **CasADi** is a symbolic math library. When you write $p_x + v * cos(\theta) * dt$ in CasADi, it does not compute a number, it stores the formula as an expression tree, like algebra. The reason this matters is that IPOPT needs the derivative of every equation at every iteration. CasADi computes those derivatives analytically and exactly (automatic differentiation), the same way we'd differentiate by hand. Without this, we'd have to approximate derivatives numerically, which is slow and inaccurate. In the MPC code, every `model.set_rhs()` call is for handing CasADi a symbolic expression to work with.
+
+2. **do-mpc** is the MPC framework that sits on top of CasADi. It handles all the boilerplate - assembling the horizon loop, managing the decision variable layout, calling the `tvp_fun()` before each solve to inject the latest reference trajectory, passing everything to *IPOPT*, reading the solution, and shifting the warm start for next time. Without do-mpc we'd write all of that by hand. With do-mpc, `make_step(x0)` does all of it in one call.
+
+---
+
+#### Important terms:
+
+1. **IPOPT** (Interior Point OPTimizer) is the actual numerical solver - the thing that finds the answer. It receives a cost value and its gradient, constraint violations and their Jacobians, from CasADi at each iteration. It then takes a step using the interior point method (imagine a ball rolling down a valley while staying inside a fence), and repeats until the gradient is near zero ($\epsilon$) and all constraints are satisfied. It returns the optimal [steer, accel] sequence.
+
+2. A Nonlinear Program (NLP) is the class of math problem MPC belongs to. It is *"nonlinear"* because the bicycle model contains tan(steer) and cos(yaw) curved functions. That makes it harder than a linear problem (which has exactly one minimum you can jump to directly). NLPs can have multiple local minima, which is why warm-starting from the previous solution matters so much, and why **MPPI** (which samples globally) can sometimes find better solutions than MPC on tricky(back to back tight curves) courses.
+
+---
+
+## 5.1 MpcController Class
+
+The controller class is located at:
+[mpc_controller.py](/src/components/control/mpc/mpc_controller.py)
+
+```python
+"""
+mpc_controller.py
+"""
+
+import math
+import time
+import warnings
+import numpy as np
+import do_mpc
+import casadi as ca
+
+
+class MpcController:
+ """
+ do-mpc-based MPC path-tracking controller.
+ Public interface identical to MppiController.
+ """
+```
+
+This class uses **do-mpc** as the MPC framework and **CasADi** as the symbolic algebra backend. CasADi automatically computes the exact gradients and Hessians that the IPOPT solver needs and so, no finite-difference approximations are required.
+
+---
+
+### 5.1.1 Constructor
+
+```python
+def __init__(self, spec, course=None, color="r",
+ delta_t=0.05, horizon_step_T=15,
+ stage_cost_weight=None,
+ terminal_cost_weight=None,
+ control_cost_weight=None,
+ smoothness_cost_weight=None,
+ max_steer_abs=0.523,
+ max_accel_abs=2.0,
+ v_min=-1.0, v_max=20.0,
+ ipopt_max_iter=80,
+ ipopt_print_level=0,
+ visualize_optimal_traj=True):
+
+ self.WHEEL_BASE_M = spec.wheel_base_m
+ self.course = course
+ self.delta_t = delta_t
+ self.T = horizon_step_T
+
+ self._sw = np.asarray(stage_cost_weight or [50., 50., 1., 20.])
+ self._tw = np.asarray(terminal_cost_weight or self._sw * 2.)
+ self._cw = np.asarray(control_cost_weight or [1.0, 0.5])
+ self._smw = np.asarray(smoothness_cost_weight or [5.0, 2.0])
+
+ self.max_steer = max_steer_abs
+ self.max_accel = max_accel_abs
+ self.v_min = v_min
+ self.v_max = v_max
+
+ self.target_accel_mps2 = 0.0
+ self.target_steer_rad = 0.0
+ self.target_yaw_rate_rps = 0.0
+ self._prev_waypoint_idx = 0
+
+ self._build_arclength_table()
+ self._model = self._build_model()
+ self._mpc = self._build_mpc(self._model)
+```
+
+The constructor takes a `VehicleSpecification` object and an optional `CubicSplineCourse` object. The key member variables are:
+
+| Variable | Default | Description |
+|---|---|---|
+| `WHEEL_BASE_M` | from spec | Distance between front and rear axles [m] |
+| `T` | 15 | Prediction horizon - number of steps looked ahead |
+| `delta_t` | 0.05 | Time step for each horizon step [s] |
+| `_sw` | [50, 50, 1, 20] | Stage cost weights `[w_x, w_y, w_ψ, w_v]` |
+| `_tw` | 2 × `_sw` | Terminal cost weights — heavier than stage |
+| `_cw` | [1.0, 0.5] | Control effort weights `[w_δ, w_a]` |
+| `_smw` | [5.0, 2.0] | Smoothness weights `[w_Δδ, w_Δa]` |
+| `max_steer` | 0.523 rad | Hard steering bound (~30°) |
+| `max_accel` | 2.0 m/s² | Hard acceleration bound |
+| `v_min / v_max` | -1 / 20 m/s | Hard speed bounds |
+| `target_accel_mps2` | 0.0 | Computed acceleration command [m/s²] |
+| `target_steer_rad` | 0.0 | Computed steering angle command [rad] |
+| `target_yaw_rate_rps` | 0.0 | Computed yaw rate command [rad/s] |
+
+---
+
+## 5.2 Algorithm Background
+
+### 5.2.1 State and Control Vectors
+
+MPC operates on a **state vector** $x$ and a **control vector** $u$:
+
+$$
+x_t =
+\begin{bmatrix}
+p_x & p_y & \psi & v
+\end{bmatrix}^{T}
+$$
+
+$$
+u_t =
+\begin{bmatrix}
+\delta & a
+\end{bmatrix}^{T}
+$$
+
+
+
+At each time step, MPC finds the sequence $U = {u_0, u_1, …, u_{N-1}}$ that minimises the total cost $J$ over the horizon.
+
+---
+
+### 5.2.2 Vehicle Model - Discrete Bicycle Kinematics
+
+MPC uses an internal motion model to predict how the vehicle will move in response to each control. The **kinematic bicycle model** is used:
+
+$$
+\begin{aligned}
+p_{x,t+1} &= p_{x,t} + v_t \cos(\psi_t)\ \Delta t \\
+p_{y,t+1} &= p_{y,t} + v_t \sin(\psi_t)\ \Delta t \\
+\psi_{t+1} &= \psi_t + \frac{v_t}{L}\tan(\delta_t)\ \Delta t \\
+v_{t+1} &= v_t + a_t\,\Delta t
+\end{aligned}
+$$
+
+where $L$ is the wheelbase and $Δt$ is the time step. This is the same kinematic relationship used in the other controller also, only the key difference is that MPC applies it **N times symbolically** to predict the full future trajectory.
+
+CasADi writes these equations as symbolic expressions, meaning it can automatically compute exact derivatives, the same way you would differentiate by hand, but done by the computer.
+
+---
+
+### 5.2.3 The Optimization Problem
+
+At every time step, MPC solves the optimization problem:
+
+$$
+\min_{U}\quad J =
+\sum_{t=0}^{N-1}
+\ell(x_t,u_t,u_{t-1})
++
+\phi(x_N)
+$$
+
+subject to
+
+$$
+x_{t+1}=f(x_t,u_t)
+$$
+
+$$
+|\delta_t| \le \delta_{\max}
+$$
+
+$$
+|a_t| \le a_{\max}
+$$
+
+$$
+v_{\min} \le v_t \le v_{\max}
+$$
+
+where:
+
+- $x_{t+1}=f(x_t,u_t)$ enforces the bicycle model dynamics at every prediction step.
+- $|\delta_t| \le \delta_{\max}$ imposes a hard steering constraint.
+- $|a_t| \le a_{\max}$ imposes a hard acceleration constraint.
+- $v_{\min} \le v_t \le v_{\max}$ imposes hard speed limits.
+
+The constraints are **hard constraints**, meaning the optimizer (e.g., IPOPT) enforces them directly during optimization rather than clipping the control output afterward.
+
+---
+
+### 5.2.4 Stage Cost $\ell(x_t, u_t, u_{t-1})$
+
+The stage cost is evaluated at every step $t = 0,\ldots,N-1$ of the horizon. It has three parts:
+
+#### 1. Tracking Error
+
+Penalises deviation from the reference trajectory:
+
+$$
+\ell_{\mathrm{track}} = w_x (p_{x,t}-p_{x,r})^2 + w_y (p_{y,t}-p_{y,r})^2 + w_\psi e_\psi^2 + w_v (v_t-v_r)^2
+$$
+
+where
+
+$$
+e_\psi =
+\mathrm{atan2}\left(\sin(\psi_t-\psi_r),\cos(\psi_t-\psi_r)\right).
+$$
+
+The heading error uses **$\text{atan2}$** rather than a direct subtraction. This maps the error into $(-\pi,\pi]$, and remains smooth when the angle wraps around $\pm\pi$.
+
+---
+
+#### 2. Control Effort
+
+Penalises large control inputs:
+
+$$
+\ell_{\text{effort}} = w_\delta \,\delta_t^2 + w_a\,a_t^2
+$$
+
+---
+
+#### 3. Smoothness
+
+Penalises rapid changes between consecutive control inputs:
+
+$$
+\ell_{\text{smooth}} = w_{\Delta\delta}(\delta_t-\delta_{t-1})^2 + w_{\Delta a} (a_t-a_{t-1})^2
+$$
+
+This is implemented via do-mpc's `set_rterm()` which automatically adds this term at every horizon step. It prevents the controller from producing jerky steering commands.
+
+In code, effort and smoothness are combined into a single `rterm` weight per channel:
+
+```python
+mpc.set_rterm(
+ steer = control_cost_weight[0] + smoothness_cost_weight[0],
+ accel = control_cost_weight[1] + smoothness_cost_weight[1],
+)
+```
+
+---
+
+### 5.2.5 Terminal Cost $\phi(x_N)$
+
+The terminal cost is evaluated only at the **last step** $t = N$ of the horizon:
+
+$$
+\phi(x_N) = W_x (p_{x,N} - p_{x,rN})^2 + W_y (p_{y,N} - p_{y,rN})^2 + W_\psi\left[\mathrm{atan2}\left(\sin(\psi_N-\psi_{rN}),
+\cos(\psi_N-\psi_{rN})
+\right)
+\right]^2
++
+W_v (v_N-v_{rN})^2
+$$
+
+The terminal weights $W$ are set heavier than the stage weights (default: $W = 2 × w$). Without a terminal cost, the optimizer has no incentive to reach a good state at the end of the horizon, it can "borrow" performance from future steps that it cannot see.
+
+---
+
+### 5.2.6 The Receding Horizon Principle
+
+MPC solves for the full sequence $\{u_0, u_1, \ldots, u_{N-1}\}$ but **only applies $u_0$** - the first control action. At the next time step, the horizon shifts forward by one step, the current state is re-measured, and the problem is solved again from scratch. This is why it is called a **receding horizon controller**.
+
+At each control cycle, MPC solves an optimization problem over the prediction horizon, but only the first control action is applied:
+
+$$
+U^{\ast} = u_0^{\ast},\ u_1^{\ast},\ \ldots,\ u_{N-1}^{\ast}
+$$
+
+**but** only $u(k)=u_0^*$ is applied to the vehicle.
+
+At the next sampling instant, the optimization is solved again using the updated state estimate $x(k+1)$ resulting in a new optimal control sequence.
+
+This strategy is known as the **receding horizon** (or **moving horizon**) principle.
+
+**Warm starting**: the previous solution is shifted by one step and used as the initial guess for the next solve. This means *IPOPT* starts close to the true optimum and converges in far fewer iterations.
+
+---
+
+## 5.3 Private Methods
+
+### 5.3.1 `_build_arclength_table()`
+
+```python
+def _build_arclength_table(self):
+ xs, ys, yaws, speeds = [], [], [], []
+ n = 0
+ while True:
+ try:
+ xs.append(self.course.point_x_m(n))
+ ...
+ n += 1
+ except Exception:
+ break
+
+ seg = np.hypot(np.diff(xs), np.diff(ys))
+ arc = np.concatenate([[0.0], np.cumsum(seg)])
+ total = arc[-1]
+
+ self._ext_x = np.tile(xs, 3)
+ self._ext_arc = np.concatenate([arc, arc + total, arc + 2 * total])
+```
+
+This method is called once in `__init__()`. It samples every course point, computes the **cumulative arc-length** (total distance along the curve from the start), and tiles the arrays three times.
+
+The tiling handles closed or looping courses: when a vehicle near the end of the course needs to look $T$ steps ahead, the indices wrap around to the beginning. Without tiling, you would need modular arithmetic at every reference lookup; with tiling, the arrays are simply longer and you index linearly.
+
+---
+
+### 5.3.2 `_build_model()`
+
+```python
+def _build_model(self):
+ model = do_mpc.model.Model("discrete")
+
+ model.set_variable("_x", "px")
+ model.set_variable("_x", "py")
+ model.set_variable("_x", "yaw")
+ model.set_variable("_x", "vel")
+ model.set_variable("_u", "steer")
+ model.set_variable("_u", "accel")
+ model.set_variable("_tvp", "ref_x")
+ model.set_variable("_tvp", "ref_y")
+ model.set_variable("_tvp", "ref_yaw")
+ model.set_variable("_tvp", "ref_vel")
+
+ yaw_err = ca.atan2(
+ ca.sin(model.x["yaw"] - model.tvp["ref_yaw"]),
+ ca.cos(model.x["yaw"] - model.tvp["ref_yaw"]),
+ )
+ model.set_expression("yaw_err", yaw_err)
+
+ model.set_rhs("px", model.x["px"] + model.x["vel"] * ca.cos(model.x["yaw"]) * dt)
+ model.set_rhs("py", model.x["py"] + model.x["vel"] * ca.sin(model.x["yaw"]) * dt)
+ model.set_rhs("yaw", model.x["yaw"] + model.x["vel"] / L * ca.tan(model.u["steer"]) * dt)
+ model.set_rhs("vel", model.x["vel"] + model.u["accel"] * dt)
+
+ model.setup()
+ return model
+```
+
+This method declares the **symbolic bicycle model** using do-mpc's `Model` class. Every variable registered here becomes a CasADi symbolic expression - the same algebra that a mathematician would write, but executable and automatically differentiable.
+
+Three types of symbolic variables are registered:
+
+| Variable type | Names | Description |
+|---|---|---|
+| `_x` (state) | `px, py, yaw, vel` | The four vehicle states |
+| `_u` (control) | `steer, accel` | The two control inputs |
+| `_tvp` (time-varying) | `ref_x, ref_y, ref_yaw, ref_vel` | Reference trajectory changes each step |
+
+The **time-varying parameter (TVP)** is the key do-mpc feature that allows the reference trajectory to be updated every step. do-mpc calls `tvp_fun()` automatically before each solve and fills in the reference values for all `N+1` horizon steps.
+
+The `yaw_err` auxiliary expression is registered separately so it can be referenced cleanly in the cost function. It uses `atan2(sin(·), cos(·))` for the same reason as in the cost formulation above.
+
+---
+
+### 5.3.3 `_build_mpc(model)`
+
+```python
+def _build_mpc(self, model):
+ mpc = do_mpc.controller.MPC(model)
+ mpc.set_param(n_horizon=self.T, t_step=self.delta_t,
+ n_robust=0, store_full_solution=True,
+ nlpsol_opts=self._ipopt_opts)
+
+ # TVP function — closure over self._current_ref
+ tvp_template = mpc.get_tvp_template()
+ def tvp_fun(t_now):
+ for k in range(self.T + 1):
+ tvp_template["_tvp", k, "ref_x"] = float(self._current_ref[0, k])
+ tvp_template["_tvp", k, "ref_y"] = float(self._current_ref[1, k])
+ tvp_template["_tvp", k, "ref_yaw"] = float(self._current_ref[2, k])
+ tvp_template["_tvp", k, "ref_vel"] = float(self._current_ref[3, k])
+ return tvp_template
+ mpc.set_tvp_fun(tvp_fun)
+
+ lterm = (sw[0]*(x["px"]-tvp["ref_x"])**2 + sw[1]*(x["py"]-tvp["ref_y"])**2
+ + sw[2]*model.aux["yaw_err"]**2 + sw[3]*(x["vel"]-tvp["ref_vel"])**2)
+ mterm = (tw[0]*(x["px"]-tvp["ref_x"])**2 + ...)
+
+ mpc.set_objective(lterm=lterm, mterm=mterm)
+ mpc.set_rterm(steer=cw[0]+smw[0], accel=cw[1]+smw[1])
+
+ mpc.bounds["lower", "_u", "steer"] = -self.max_steer
+ mpc.bounds["upper", "_u", "steer"] = self.max_steer
+ mpc.bounds["lower", "_u", "accel"] = -self.max_accel
+ mpc.bounds["upper", "_u", "accel"] = self.max_accel
+ mpc.bounds["lower", "_x", "vel"] = self.v_min
+ mpc.bounds["upper", "_x", "vel"] = self.v_max
+
+ mpc.setup()
+ return mpc
+```
+
+This method assembles the full MPC optimization problem using the do-mpc API:
+
+- `set_param()` - sets the horizon length, time step, and IPOPT options.
+- `set_tvp_fun()` - registers the TVP closure so do-mpc calls it automatically before each solve.
+- `set_objective(lterm, mterm)` - symbolic stage and terminal costs. `lterm` is evaluated at every step $t = 0…N-1$; `mterm` only at step $t = N$.
+- `set_rterm()` - registers the smoothness/effort Δu penalty. do-mpc adds $r_i·(u_{i,t} − u_{i,t−1})²$ at every horizon step automatically.
+- `bounds[]` - hard box constraints on steering, acceleration, and speed.
+
+---
+
+### 5.3.4 `_get_reference_trajectory(x0, y0, current_speed)`
+
+```python
+def _get_reference_trajectory(self, x0, y0, current_speed):
+ v_ref = max(abs(current_speed), 1.0)
+ step_len = v_ref * self.delta_t # arc-length between ref points
+
+ # Forward search: find nearest index ahead of previous position
+ WINDOW = 60
+ best_idx = self._prev_waypoint_idx
+ for off in range(WINDOW):
+ ci = self._prev_waypoint_idx + off
+ dist = (self._ext_x[ci] - x0)**2 + (self._ext_y[ci] - y0)**2
+ if dist < best_dist:
+ best_idx = ci
+ self._prev_waypoint_idx = best_idx
+
+ # Arc-length spaced reference: ref[k] is at distance k × v × Δt ahead
+ s0 = self._ext_arc[best_idx]
+ for k in range(self.T + 1):
+ target_s = s0 + k * step_len
+ ki = argmin |ext_arc[best_idx:] - target_s|
+ ref[:, k] = ext_x[ki], ext_y[ki], ext_yaw[ki], ext_spd[ki]
+ return ref
+```
+
+This method builds the $(4, T+1)$ reference array that is injected into the NLP via the TVP function. It has two important design decisions:
+
+**Forward-only search**: the nearest index is found by searching only *ahead* of `_prev_waypoint_idx`, never behind it. This prevents jumping to the wrong arc on a closed or figure-8 course.
+
+**Arc-length spaced reference**: reference point $k$ is placed at arc-length distance $k × v × Δt$ along the course ahead of the vehicle. This ensures the reference horizon always spans exactly the distance the vehicle will travel in $N$ steps - no more, no less. Without this, the reference would "run ahead" of the vehicle on tight curves and produce a horizon that points in the wrong direction.
+
+The formula for each reference step is:
+
+$$
+s_{\text{target}} = s_0 + k\,v\,\Delta t
+$$
+
+$$
+\mathrm{ref}[k] = \arg\min_{p_i}\left|s_i - s_{\text{target}}\right|
+$$
+
+where $s_0$ is the cumulative arc-length at the vehicle's current nearest point, and $v × Δt$ is the distance travelled per prediction step.
+
+---
+
+## 5.4 Public Methods
+
+### 5.4.1 `update`
+
+```python
+def update(self, state, time_s):
+ if not self.course: return
+
+ px = float(state.get_x_m())
+ py = float(state.get_y_m())
+ yaw = float(state.get_yaw_rad())
+ vel = float(state.get_speed_mps())
+ x0 = np.array([[px], [py], [yaw], [vel]]) # (4, 1) column vector
+
+ self._current_ref = self._get_reference_trajectory(px, py, vel)
+
+ if not self._initialized:
+ self._mpc.x0 = x0
+ self._mpc.set_initial_guess()
+ self._initialized = True
+
+ u0 = self._mpc.make_step(x0) # returns [[steer], [accel]]
+
+ steer0 = float(np.clip(np.asarray(u0).flat[0], -self.max_steer, self.max_steer))
+ accel0 = float(np.clip(np.asarray(u0).flat[1], -self.max_accel, self.max_accel))
+
+ self.target_steer_rad = steer0
+ self.target_accel_mps2 = accel0
+ self.target_yaw_rate_rps = vel / self.WHEEL_BASE_M * tan(steer0)
+```
+
+This is the main entry point called every simulation frame by `FourWheelsVehicle`. It orchestrates all private methods in order:
+
+```
+update()
+ 1. Extract px, py, yaw, vel from state getters
+ 2. Build x0 as (4,1) column vector — do-mpc's required shape
+ 3. Build reference trajectory via _get_reference_trajectory()
+ 4. First call only: set initial guess so IPOPT starts feasibly
+ 5. mpc.make_step(x0) — do-mpc calls tvp_fun(), then solves the NLP
+ 6. Extract steer0, accel0 from solution u0
+ 7. Compute yaw_rate = v / L * tan(steer0) (bicycle model formula)
+ 8. Extract predicted trajectory from opt_x_num for draw()
+```
+
+If no course is set, the method returns early without computing anything.
+
+**A note on $u_0$ extraction**: do-mpc returns $u_0$ as a $(2, 1)$ column vector, but the shape can vary depending on the version. Using `.flat[0]` and `.flat[1]` (rather than `[0,0]` and `[1,0]`) is robust to both `(2,1)` and `(2,)` shapes.
+
+The yaw rate is then derived from the steering angle using the kinematic bicycle model.
+
+$$
+\omega = v * tan(δ) / L
+$$
+
+---
+
+### 5.4.2 Getter Methods
+
+```python
+def get_target_accel_mps2(self):
+ return self.target_accel_mps2
+
+def get_target_yaw_rate_rps(self):
+ return self.target_yaw_rate_rps
+
+def get_target_steer_rad(self):
+ return self.target_steer_rad
+```
+
+These three getter methods expose the computed control outputs. They are called by `FourWheelsVehicle` after each `update()` to apply the commands to the vehicle's state. The interface is identical to `StanleyController`.
+
+---
+
+### 5.4.3 `draw`
+
+```python
+def draw(self, axes, elems):
+ if self.visualize_optimal_traj and self.optimal_trajectory:
+ x_list, y_list = self.optimal_trajectory
+ (line,) = axes.plot(x_list, y_list,
+ color=self.DRAW_COLOR,
+ linewidth=2.0,
+ linestyle="--",
+ alpha=0.9,
+ label="MPC trajectory")
+ elems.append(line)
+```
+
+MPC controller draws the **predicted trajectory** - the optimal state sequence ${x₀*, x₁*, …, x_N*}$ from the last NLP solution. This red dashed line shows exactly where the MPC controller plans the vehicle to go over the next $N × Δt$ seconds. It is a direct visualisation of what the optimizer computed, which makes debugging tuning much easier.
+
+The trajectory is extracted from do-mpc's `opt_x_num` structure by name:
+
+```python
+px_pred = [opt_x_num["_x", k, 0, "px"] for k in range(T + 1)]
+py_pred = [opt_x_num["_x", k, 0, "py"] for k in range(T + 1)]
+```
+
+---
+
+**Author**: Mohit Kumar
diff --git a/doc/4_path_tracking/4_3_mppi_controller.md b/doc/4_path_tracking/4_3_mppi_controller.md
new file mode 100644
index 00000000..9fcbe57c
--- /dev/null
+++ b/doc/4_path_tracking/4_3_mppi_controller.md
@@ -0,0 +1,629 @@
+# 6. MPPI Controller
+
+In this chapter, the MPPI (Model Predictive Path Integral) path tracking controller class is implemented. This class implements the MPPI path tracking algorithm, which computes a steering angle and acceleration command by **sampling thousands of random control sequences**, rolling each one forward through the vehicle dynamics, scoring them by cost, and computing a weighted average update.
+
+Before getting into the code, let's get some basic understanding behind the algorithm:
+
+### **Model Predictive Path Integral (MPPI)**
+
+- Introduced by Williams et al. (2017) at Georgia Tech.
+- Belongs to the family of **stochastic optimal control** methods.
+- Instead of solving an optimization problem algebraically (like MPC), MPPI samples K random perturbations around a nominal control sequence, simulates each one, and combines them using **information-theoretic weighting**.
+
+The key idea is: run $K$ imagined futures in parallel, score each by how well it tracks the path, then take a weighted combination - low-cost futures get high weight, high-cost futures get low weight.
+
+1. **Sampling** - draw K random noise sequences from a Gaussian distribution
+2. **Rollout** - simulate the vehicle forward T steps for each sample
+3. **Weighting** - assign higher weight to samples with lower trajectory cost
+4. **Update** - shift the nominal control sequence toward the best samples
+
+
+
+
+
+Ref: https://dilithjay.com/blog/mppi
+
+**Advantages over MPC:**
+
+- No solver required - the update is a simple weighted sum.
+- Naturally parallelisable over K - runs efficiently on a GPU.
+- Handles **non-convex, non-smooth** cost landscapes where gradient-based solvers struggle.
+- Global exploration through stochastic sampling - can escape local minima.
+
+**Disadvantages:**
+
+- Constraints are **soft only** - satisfied by increasing cost, never hard-blocked (like MPC where hard constraints are enforced).
+- Solution quality depends on K - more samples give a better approximation but cost more compute.
+- Tuning parameters ($\lambda$, $K$, $\sigma$) requires experimentation with no systematic design method.
+- Can be noisy at low K values, producing jittery control.
+
+---
+
+## 6.1 MppiController Class
+
+The controller class is located at:
+[mppi_controller.py](/src/components/control/mppi/mppi_controller.py)
+
+```python
+"""
+mppi_controller.py
+
+Author: Shisato Yano
+"""
+
+import math, sys
+from pathlib import Path
+from math import atan2, cos, sin, tan
+import numpy as np
+
+from state import State
+
+
+class MppiController:
+ """
+ MPPI path-tracking controller aligned with standard formulation:
+ warm start (u_prev), exploitation/exploration sampling, stage + terminal cost,
+ control cost term (param_gamma * u.T @ inv(Sigma) @ v), and optional smoothing.
+ """
+```
+
+This class imports trigonometric functions from Python's `math` module and `numpy` for vectorised sampling and matrix operations. It also imports the `State` class to use its `motion_model` static method for simulating the vehicle forward during each sample rollout.
+
+---
+
+### 6.1.1 `_StateView` Helper Class
+
+```python
+class _StateView:
+ """Minimal state-like object for course methods (x, y, yaw, speed)."""
+
+ def __init__(self, x_m, y_m, yaw_rad, speed_mps):
+ self.x_m = x_m
+ self.y_m = y_m
+ ...
+
+ def get_x_m(self): return self.x_m
+ def get_y_m(self): return self.y_m
+ def get_yaw_rad(self): return self.yaw_rad
+ def get_speed_mps(self): return self.speed_mps
+```
+
+`_StateView` is a lightweight internal adapter class. The course object's `search_nearest_point_index()` method expects an object with four getter methods - `get_x_m()`, `get_y_m()`, `get_yaw_rad()`, and `get_speed_mps()` - matching the interface of the real `State` class.
+
+During rollouts, the simulated position of each sample is a plain numpy array, not a `State` object. `_StateView` wraps any `(x, y, yaw, speed)` tuple into the interface the course expects, without allocating a full `State` object. It is not a public class and is only used internally by `_get_nearest_waypoint()`.
+
+---
+
+### 6.1.2 Constructor
+
+```python
+def __init__(self, spec, course=None, color="g",
+ delta_t=0.05, horizon_step_T=20,
+ number_of_samples_K=256,
+ param_exploration=0.0,
+ param_lambda=50.0, param_alpha=1.0,
+ sigma_steer=0.1, sigma_accel=0.5,
+ max_steer_abs=0.523, max_accel_abs=2.0,
+ stage_cost_weight=None,
+ terminal_cost_weight=None,
+ moving_average_window=0,
+ visualize_optimal_traj=True,
+ visualize_sampled_trajs=True):
+
+ self.WHEEL_BASE_M = spec.wheel_base_m
+ self.T = horizon_step_T
+ self.K = number_of_samples_K
+ self.param_lambda = max(1e-6, param_lambda)
+ self.param_gamma = param_lambda * (1.0 - param_alpha)
+ self.Sigma = np.array([[sigma_steer**2, 0.0],
+ [0.0, sigma_accel**2]])
+ self.stage_cost_weight = np.asarray(stage_cost_weight
+ or [50.0, 50.0, 1.0, 20.0])
+ self.terminal_cost_weight = np.asarray(terminal_cost_weight
+ or self.stage_cost_weight.copy())
+ self.u_prev = np.zeros((self.T, 2))
+```
+
+The constructor takes a `VehicleSpecification` object and an optional `CubicSplineCourse`. The key member variables are:
+
+| Variable | Default | Description |
+|---|---|---|
+| `WHEEL_BASE_M` | from spec | Distance between front and rear axles [m] |
+| `T` | 20 | Prediction horizon - number of steps rolled out per sample |
+| `K` | 256 | Number of sample trajectories drawn each step |
+| `param_lambda` | 50.0 | Temperature $\lambda$ - controls sharpness of weighting |
+| `param_gamma` | $\lambda(1-\alpha)$ | Control cost scaling; default $\alpha=1.0 \Rightarrow \gamma=0$ (cost term off) |
+| `Sigma` | $\text{diag}(\sigma_\delta^2, \sigma_a^2)$ | Noise covariance matrix for sampling |
+| `stage_cost_weight` | [50, 50, 1, 20] | $[w_x, w_y, w_\psi, w_v]$ - stage tracking weights |
+| `terminal_cost_weight` | same as stage | $[w_x, w_y, w_\psi, w_v]$ - terminal tracking weights |
+| `u_prev` | zeros (T, 2) | Warm-start control sequence `[steer, accel]` per step |
+| `target_accel_mps2` | 0.0 | Computed acceleration command [m/s²] |
+| `target_steer_rad` | 0.0 | Computed steering angle command [rad] |
+| `target_yaw_rate_rps` | 0.0 | Computed yaw rate command [rad/s] |
+| `target_speed_mps` | 0.0 | Current speed echoed back (not a planned target - see §6.4.1) |
+
+> **Note on `param_gamma`**: With the default `param_alpha = 1.0`, `param_gamma = 0` and the control cost term in the trajectory cost vanishes entirely. This means the default configuration is pure tracking with no penalty for deviating from the warm-start control. Set `param_alpha < 1.0` only when you want to discourage large perturbations from the previous solution.
+
+---
+
+## 6.2 Algorithm Background
+
+### 6.2.1 State and Control Vectors
+
+MPPI operates on a state vector $x_t$ and control vector $u_t$:
+
+$$
+x_t = \begin{bmatrix} x \\ y \\ \psi \\ v \end{bmatrix}
+\qquad \text{(position [m], heading [rad], speed [m/s])}
+$$
+
+$$
+u_t = \begin{bmatrix} \delta \\ a \end{bmatrix}
+\qquad \text{(steering angle [rad], acceleration [m/s²])}
+$$
+
+The **nominal control sequence** (warm-started from the previous control cycle) is stored as:
+
+$$
+U = \{u_{\text{prev}}[0],\; u_{\text{prev}}[1],\; \ldots,\; u_{\text{prev}}[T-1] \} \quad \text{shape: } (T,\,2)
+$$
+
+---
+
+### 6.2.2 Sampling - Exploitation and Exploration
+
+At each step, K noise sequences are drawn from a zero-mean Gaussian:
+
+$$
+\varepsilon_{k,t} \;\sim\; \mathcal{N}(0, \Sigma) \qquad k = 1\ldots K,\quad t = 0\ldots T-1
+$$
+
+$$
+\Sigma = \mathrm{diag}(\sigma_{\mathrm{steer}}^2,\; \sigma_{\mathrm{accel}}^2)
+\qquad \text{(steer and accel noise are independent)}
+$$
+
+The K samples are split into two groups:
+
+```
+Exploitation samples (first (1 − param_exploration) × K):
+ v_{k,t} = clip( u_prev[t] + ε_{k,t} ) <- perturb warm start
+
+Exploration samples (last param_exploration × K):
+ v_{k,t} = clip( ε_{k,t} ) <- pure random, ignores warm start
+```
+
+The exploitation samples refine the previous solution. The exploration samples venture further afield and can discover better solutions when the warm start has drifted off-course. The `param_exploration` parameter controls the ratio.
+
+> **Important**: `v[k,t]` stores the **clipped** perturbed control used for rollout. The raw unclipped noise `epsilon[k,t]` is stored separately. The weighted update later uses `epsilon`, not `v` - this asymmetry is intentional and explained in section 6.2.5.
+
+---
+
+### 6.2.3 Trajectory Cost S(k)
+
+Each sample trajectory $k$ accumulates cost across all T steps plus a terminal cost:
+
+$$
+S(k) = \sum_{t=0}^{T-1} \left[ c(x_t^k) \;+\; \gamma \cdot u_{\text{prev}}[t]^\top \Sigma^{-1} v_{k,t} \right] \;+\; \phi(x_T^k)
+$$
+
+**Stage tracking cost** $c(x)$ - deviation from the reference path:
+
+$$
+c(x) = w_x(x - x_r)^2 + w_y(y - y_r)^2 + w_\psi\,\mathrm{atan2}(\sin(\psi - \psi_r), \cos(\psi - \psi_r))^2 + w_v(v - v_r)^2
+$$
+
+The heading error uses `atan2(sin(·), cos(·))` - it wraps the angle difference into $(-\pi, \pi]$ and avoids discontinuities near $\pm\pi$.
+
+**Control cost term** $\gamma \cdot u_{\text{prev}}[t]^\top \Sigma^{-1} v_{k,t}$ - penalises samples that deviate far from the warm-start control. The parameter $\gamma = \lambda(1 - \alpha)$ controls its strength:
+
+```
+param_alpha = 1.0 -> γ = 0 (control cost off - pure tracking, default)
+param_alpha = 0.0 -> γ = λ (full control cost - conservative)
+```
+
+**Terminal cost** $\phi(x_T^k)$ - same structure as $c(x)$ but evaluated only at the last rollout step, using `terminal_cost_weight` instead of `stage_cost_weight`.
+
+---
+
+### 6.2.4 Information-Theoretic Weighting
+
+Once all K trajectory costs are computed, the weights are calculated in three steps:
+
+**Step 1** - Subtract the minimum cost for numerical stability (prevents `exp` overflow):
+
+$$\rho = \min_k S(k)$$
+
+**Step 2** - Compute unnormalised softmin weights:
+
+$$
+\tilde{\eta}_k = \exp(-\frac{S(k)-\rho}{\lambda})
+$$
+
+**Step 3** - Normalise so weights sum to 1:
+
+$$\eta = \sum_k \tilde{\eta}_k \qquad w_k = \frac{\tilde{\eta}_k}{\eta}$$
+
+The subtraction of $\rho$ before the exponential is a **numerical stability trick** - not a mathematical change. Without it, $\exp(-S/\lambda)$ would underflow to $0$ for all samples when $S$ values are large, making all weights undefined. Subtracting $\rho$ ensures the best sample always contributes $\exp(0) = 1.0$ to the numerator.
+
+---
+
+### 6.2.5 Control Update - Why `epsilon` Not `v`
+
+The weighted perturbation sum is computed using the **raw unclipped noise** `epsilon`, not the clipped `v`:
+
+$$
+w_\varepsilon[t] = \sum_k w_k \cdot \varepsilon_{k,t} \qquad \text{shape: } (T, 2)
+$$
+
+**Why `epsilon` and not `v`?**
+If the weighted average used the clipped `v[k,t]` instead of the raw `epsilon[k,t]`, the clipping would introduce a systematic bias. Samples that hit the steering or acceleration limit would all contribute the same clipped value regardless of how far outside the limit their original noise was - pulling the update asymmetrically toward the constraint boundary. Using the raw `epsilon` means the weighting reflects the true stochastic intent of each sample without distortion from the clipping operation.
+
+**Optional smoothing** - a moving-average filter can be applied to $w_\varepsilon$ along the time axis:
+
+$$w_\varepsilon \;\leftarrow\; \mathrm{MovingAverage}(w_\varepsilon,\; \text{window})$$
+
+This reduces chatter in the control sequence at the cost of slightly slower response.
+
+**Final update and warm-start shift:**
+
+```
+u = clip( u_prev + w_ε ) <- updated control sequence
+
+apply u[0] → steer0, accel0 <- first action executed this step
+
+warm-start shift for next step:
+ u_prev[0..T-2] ← u[1..T-1] <- shift sequence forward by one
+ u_prev[T-1] ← u[T-1] <- hold last action (not zero)
+```
+
+The last element is **repeated rather than zeroed** because zeroing would force a sudden discontinuity in the control sequence at the horizon boundary - the vehicle would plan to abruptly stop accelerating or steering at step T. Holding the last value is a smoother and more conservative assumption about what happens beyond the horizon.
+
+---
+
+## 6.3 Private Methods
+
+### 6.3.1 `_get_nearest_waypoint(x, y, update_prev_idx)`
+
+```python
+def _get_nearest_waypoint(self, x, y, update_prev_idx=False):
+ view = _StateView(x, y, 0.0, 0.0)
+ nearest_idx = self.course.search_nearest_point_index(view)
+ ref_x = self.course.point_x_m(nearest_idx)
+ ref_y = self.course.point_y_m(nearest_idx)
+ ref_yaw = self.course.point_yaw_rad(nearest_idx)
+ ref_v = self.course.point_speed_mps(nearest_idx)
+ if update_prev_idx:
+ self.prev_waypoints_idx = nearest_idx
+ return ref_x, ref_y, ref_yaw, ref_v
+```
+
+This method queries the course for the nearest point to `(x, y)` using a global nearest-neighbour search over all course points. The `_StateView(x, y, 0.0, 0.0)` wrapper is required because `search_nearest_point_index()` expects an object with getter methods, not a plain tuple.
+
+It is called in two distinct contexts:
+
+- **Once per `update()` step** on the vehicle's actual position with `update_prev_idx=True` - anchors the stored index.
+- **K × T times during rollouts** on the simulated position of each sample - uses a fresh global search each time.
+
+> **Computational note**: Each call to `_get_nearest_waypoint()` performs a full O(N) scan over all N course points. With K=256 samples and T=20 steps, this method is called **5120 times per `update()` step**. On a course with N=500 points, that is 2.56 million distance comparisons per control cycle. This is the primary reason MPPI is slower than expected on CPU and why GPU batching (vectorising the rollout across K) reduces compute from O(K×T×N) sequential calls to a single matrix operation.
+
+---
+
+### 6.3.2 `_g(v)` - Control Clipping
+
+```python
+def _g(self, v):
+ steer = np.clip(v[0], -self.max_steer_abs, self.max_steer_abs)
+ accel = np.clip(v[1], -self.max_accel_abs, self.max_accel_abs)
+ return np.array([steer, accel])
+```
+
+After adding noise to the warm-start control, `_g()` clips the result to the physical limits of the vehicle. This is MPPI's only mechanism for enforcing control bounds - there are no hard constraints as in MPC. Samples that would require $|\delta| > \delta_{\max}$ are simply clipped to the limit before being rolled out.
+
+The clipped result is stored in `v[k,t]`. Critically, the raw unclipped noise `epsilon[k,t]` is preserved separately and is used in the weighted update - see section 6.2.6 for why this separation matters.
+
+---
+
+### 6.3.3 `_F(x_t, v_t)` - One-Step Dynamics Rollout
+
+```python
+def _F(self, x_t, v_t):
+ x, y, yaw, v = x_t[0,0], x_t[1,0], x_t[2,0], x_t[3,0]
+ steer, accel = float(v_t[0]), float(v_t[1])
+ if abs(v) < 1e-9:
+ yaw_rate = 0.0
+ else:
+ yaw_rate = v / self.WHEEL_BASE_M * tan(steer)
+ state_vec = np.array([[x], [y], [yaw], [v]])
+ input_vec = np.array([[accel], [yaw_rate]])
+ return State.motion_model(state_vec, input_vec, self.delta_t)
+```
+
+This method advances the vehicle state by one time step $\Delta t$ using the kinematic bicycle model. It exists as a bridge layer because `State.motion_model` takes `(accel, yaw_rate)` as its input, but MPPI works in `(steer, accel)` space. `_F` performs the conversion:
+
+$$\dot{\psi} = \frac{v}{L} \tan(\delta)$$
+
+> A guard against near-zero speed ($|v| < 10^{-9}$) prevents division by zero.
+
+This function is called $K \times T$ times per `update()` step - once for each sample at each horizon step - making it the **computational hotspot** of the algorithm alongside `_c()`.
+
+---
+
+### 6.3.4 `_c(x_t)` - Stage Cost
+
+```python
+def _c(self, x_t):
+ x, y, yaw, v = float(x_t[0,0]), float(x_t[1,0]), float(x_t[2,0]), float(x_t[3,0])
+ yaw = (yaw + 2.0 * np.pi) % (2.0 * np.pi)
+ ref_x, ref_y, ref_yaw, ref_v = self._get_nearest_waypoint(x, y)
+ ref_yaw = (ref_yaw + 2.0 * np.pi) % (2.0 * np.pi)
+ yaw_diff = np.arctan2(np.sin(yaw - ref_yaw), np.cos(yaw - ref_yaw))
+ cost = ( stage_cost_weight[0] * (x - ref_x)**2
+ + stage_cost_weight[1] * (y - ref_y)**2
+ + stage_cost_weight[2] * yaw_diff**2
+ + stage_cost_weight[3] * (v - ref_v)**2 )
+ return cost
+```
+
+The stage cost measures how far the **simulated** state $x_t^k$ deviates from the nearest reference point on the course. It is evaluated at every step of every rollout.
+
+**Two-step yaw normalisation**: the yaw is first mapped to $[0, 2\pi)$ using `% (2π)`, and then `atan2(sin(·), cos(·))` maps the difference back to $(-\pi, \pi]$. The two steps serve different purposes:
+
+1. `% (2π)` - makes both `yaw` and `ref_yaw` positive before subtraction, preventing a sign mismatch from wrap-around (e.g. vehicle at 350° and reference at 10° giving a −340° difference instead of +20°).
+2. `atan2(sin(·), cos(·))` - maps the already-consistent difference to the shortest angular path in $(-\pi, \pi]$.
+
+Without the first step, vehicles near $\pm\pi$ can receive arbitrarily large heading errors from a simple sign difference.
+
+---
+
+### 6.3.5 `_phi(x_T)` - Terminal Cost
+
+```python
+def _phi(self, x_T):
+ # Same structure as _c(), uses terminal_cost_weight
+ cost = ( terminal_cost_weight[0] * (x - ref_x)**2
+ + terminal_cost_weight[1] * (y - ref_y)**2
+ + terminal_cost_weight[2] * yaw_diff**2
+ + terminal_cost_weight[3] * (v - ref_v)**2 )
+ return cost
+```
+
+The terminal cost is evaluated only at the **last state** $x_T^k$ of each sample rollout. It uses `terminal_cost_weight` instead of `stage_cost_weight`. By default the weights are the same, but setting the terminal weights higher encourages samples to end up in a good state at the end of the horizon - analogous to the terminal cost in MPC.
+
+---
+
+### 6.3.6 `_calc_epsilon()`
+
+```python
+def _calc_epsilon(self):
+ mu = np.zeros(2)
+ epsilon = np.random.multivariate_normal(mu, self.Sigma, (self.K, self.T))
+ return epsilon
+```
+
+Samples the entire noise matrix $\varepsilon \in \mathbb{R}^{K \times T \times 2}$ in a single vectorised call. The shape $(K, T, 2)$ means: K samples, each of T steps, each with 2-dimensional noise $[\varepsilon_{\mathrm{steer}},\, \varepsilon_{\mathrm{accel}}]$.
+
+The covariance matrix is:
+
+$$
+\Sigma = \mathrm{diag}(\sigma_{\mathrm{steer}}^2,\; \sigma_{\mathrm{accel}}^2) = \begin{bmatrix} \sigma_{\mathrm{steer}}^2 & 0 \\ 0 & \sigma_{\mathrm{accel}}^2 \end{bmatrix}
+$$
+
+The off-diagonal zeros mean steering and acceleration noise are drawn independently. A single `multivariate_normal` call is used rather than two separate `normal` calls to stay consistent with the MPPI formulation, which always refers to a single noise vector $\varepsilon_t \in \mathbb{R}^2$.
+
+---
+
+### 6.3.7 `_compute_weights(S)`
+
+```python
+def _compute_weights(self, S):
+ rho = S.min()
+ eta = np.sum(np.exp((-1.0 / self.param_lambda) * (S - rho)))
+ w = (1.0 / eta) * np.exp((-1.0 / self.param_lambda) * (S - rho))
+ return w
+```
+
+Implements the information-theoretic weighting formula in three lines. The subtraction of $\rho = \min(S)$ before the exponential is a **numerical stability trick**, not a mathematical change. Without it, $\exp(-S/\lambda)$ would underflow to 0 for all samples when $S$ values are large, making all weights undefined. Subtracting $\rho$ ensures the best sample always contributes $\exp(0) = 1.0$ to the numerator.
+
+The output $w$ is a $(K,)$ array that sums to 1.0, acting as a proper probability distribution over the K samples. These weights are also stored as `self.weights = w.tolist()` for use by `draw()` to scale each sampled trajectory's transparency.
+
+---
+
+### 6.3.8 `_moving_average_filter(xx, window_size)`
+
+```python
+def _moving_average_filter(self, xx, window_size):
+ b = np.ones(window_size) / window_size
+ xx_mean = np.zeros_like(xx)
+ for d in range(xx.shape[1]):
+ xx_mean[:, d] = np.convolve(xx[:, d], b, mode="same")
+ n_conv = math.ceil(window_size / 2)
+ xx_mean[0, d] *= window_size / n_conv
+ for i in range(1, n_conv):
+ xx_mean[i, d] *= window_size / (i + n_conv)
+ xx_mean[-i, d] *= window_size / (i + n_conv - (window_size % 2))
+ return xx_mean
+```
+
+Applies a moving-average filter to each column of the $(T, 2)$ weighted perturbation array $w_\varepsilon$. The filter smooths the control sequence along the time dimension, reducing high-frequency chatter.
+
+**Edge correction**: `numpy`'s `mode="same"` pads the signal with zeros at the boundaries. For a window of size $W$, the first element is computed from only $\lceil W/2 \rceil$ real values instead of $W$, so `numpy` effectively divides by $W$ when only $\lceil W/2 \rceil$ values exist under-weighting the edges. The correction loop rescales each boundary element back up by $W / actual\_count$. The `window_size % 2` term handles the asymmetry between even and odd window sizes, where the left and right boundary element counts differ by one.
+
+---
+
+## 6.4 Public Methods
+
+### 6.4.1 `update`
+
+```python
+def update(self, state, time_s):
+ if not self.course:
+ self.target_accel_mps2 = 0.0
+ self.target_yaw_rate_rps = 0.0
+ self.target_steer_rad = 0.0
+ self.target_speed_mps = state.get_speed_mps()
+ self.optimal_trajectory = None
+ self.sampled_trajectories = []
+ self.weights = []
+ return
+```
+
+If no course is set, the method returns early and resets all outputs to safe defaults: acceleration and steering to zero, trajectories to empty lists, and weights to an empty list. Callers that inspect `self.weights` or `self.sampled_trajectories` after a no-course call will receive empty containers, not stale values from the previous step.
+
+```python
+ x0 = np.array([[state.get_x_m()],
+ [state.get_y_m()],
+ [state.get_yaw_rad()],
+ [state.get_speed_mps()]])
+
+ self._get_nearest_waypoint(x0[0,0], x0[1,0], update_prev_idx=True)
+ u = self.u_prev.copy()
+ epsilon = self._calc_epsilon()
+
+ n_exploit = int((1.0 - self.param_exploration) * self.K)
+ for k in range(self.K):
+ for t in range(self.T):
+ v[k,t] = self._g(u[t] + epsilon[k,t]) if k < n_exploit \
+ else self._g(epsilon[k,t])
+
+ Sigma_inv = np.linalg.inv(self.Sigma)
+ for k in range(self.K):
+ x = x0.copy()
+ for t in range(self.T):
+ S[k] += self._c(x) + self.param_gamma * (u[t].T @ Sigma_inv @ v[k,t])
+ x = self._F(x, v[k,t])
+ S[k] += self._phi(x)
+
+ w = self._compute_weights(S)
+ self.weights = w.tolist()
+
+ w_epsilon = np.zeros((self.T, 2))
+ for t in range(self.T):
+ for k in range(self.K):
+ w_epsilon[t] += w[k] * epsilon[k, t] # epsilon, not v
+
+ if self.moving_average_window >= 2:
+ w_epsilon = self._moving_average_filter(w_epsilon, self.moving_average_window)
+
+ u = np.clip(u + w_epsilon,
+ [-self.max_steer_abs, -self.max_accel_abs],
+ [ self.max_steer_abs, self.max_accel_abs])
+
+ self.target_steer_rad = float(u[0, 0])
+ self.target_accel_mps2 = float(u[0, 1])
+ v0 = state.get_speed_mps()
+ self.target_yaw_rate_rps = 0.0 if abs(v0) < 1e-9 \
+ else v0 / self.WHEEL_BASE_M * tan(self.target_steer_rad)
+ self.target_speed_mps = v0 # echoes current speed, not a planned target
+
+ self.u_prev[:-1] = u[1:]
+ self.u_prev[-1] = u[-1]
+```
+
+> **`target_speed_mps` note**: Unlike Stanley which runs a proportional speed controller (`Kp × speed_error`), MPPI does not plan a separate speed command. `target_speed_mps` is set to the current measured speed `v0`. Speed tracking is handled entirely through the cost function's $w_v$ weight - samples that deviate from the reference speed receive higher cost and lower weight, indirectly shaping the acceleration commands produced.
+
+This is the main entry point called every simulation frame by `FourWheelsVehicle`. It orchestrates all private methods in order:
+
+```
+update()
+ 1. If no course: zero all outputs, clear trajectories and weights, return
+ 2. Build x0 (4×1) from state getters
+ 3. Anchor nearest waypoint index (update_prev_idx=True)
+ 4. Copy u_prev as warm-start nominal sequence u
+ 5. Sample ε ∈ ℝ^{K×T×2} via _calc_epsilon()
+ 6. Build v_{k,t} - exploitation or exploration + clip via _g()
+ (v is clipped; raw epsilon is preserved separately)
+ 7. Compute Sigma_inv = inv(Sigma)
+ 8. For each sample k:
+ For each step t:
+ S[k] += _c(x) + γ · u[t]ᵀ Σ⁻¹ v[k,t]
+ x = _F(x, v[k,t])
+ S[k] += _phi(x)
+ 9. w = _compute_weights(S)
+ 10. self.weights = w.tolist() <- stored for draw()
+ 11. w_ε[t] = Σ_k w[k] · ε[k,t] <- raw epsilon, not v
+ 12. Optional: _moving_average_filter(w_ε)
+ 13. u = clip(u_prev + w_ε)
+ 14. target_steer, target_accel <- u[0]
+ 15. target_yaw_rate = v / L · tan(steer)
+ 16. target_speed_mps <- current speed (echoed, not planned)
+ 17. Rollout optimal trajectory using updated u -> self.optimal_trajectory
+ 18. Rollout each sample k using v[k] -> self.sampled_trajectories
+ 19. Shift warm start: u_prev[0..T-2] ← u[1..T-1], u_prev[T-1] <- u[T-1]
+```
+
+**Steps 17 and 18 produce different trajectories**: the optimal trajectory uses the updated `u` (after the weighted update), while the sampled trajectories use `v[k]` (the original perturbed controls before the update). They represent different things - the optimal trajectory is what the controller commits to; the sampled trajectories are the K explored futures used to compute it.
+
+---
+
+### 6.4.2 Getter Methods
+
+```python
+def get_target_accel_mps2(self):
+ return self.target_accel_mps2
+
+def get_target_yaw_rate_rps(self):
+ return self.target_yaw_rate_rps
+
+def get_target_steer_rad(self):
+ return self.target_steer_rad
+```
+
+These three getter methods expose the computed control outputs. They are called by `FourWheelsVehicle` after each `update()` to apply the commands to the vehicle's state.
+
+---
+
+### 6.4.3 `draw`
+
+```python
+def draw(self, axes, elems):
+ if self.visualize_sampled_trajs and self.sampled_trajectories:
+ for (x_list, y_list), w in zip(self.sampled_trajectories, self.weights):
+ alpha = 0.06 + 0.12 * min(1.0, float(w) * self.K)
+ (line,) = axes.plot(x_list, y_list, "b-", linewidth=0.35, alpha=alpha)
+ elems.append(line)
+
+ if self.visualize_optimal_traj and self.optimal_trajectory:
+ x_list, y_list = self.optimal_trajectory
+ (line,) = axes.plot(x_list, y_list, color=self.DRAW_COLOR,
+ linewidth=2.0, alpha=0.9, label="MPPI trajectory")
+ elems.append(line)
+```
+
+MPPI draws **two layers of visual information**:
+
+1. **Sampled trajectories** - all $K$ rollouts drawn as thin blue lines. Each line's transparency $\alpha$ is scaled by the sample's weight:
+
+$$
+\alpha = 0.06 + 0.12 \cdot \min(1.0,\; w_k \cdot K)
+$$
+
+A sample with average weight ($w_k = 1/K$) gets $\alpha \approx 0.18$. A sample with $5\times$ average weight gets $\alpha = 0.66$. This makes the most-influential samples visually prominent - you can literally see which trajectories the controller is paying attention to(getting more weights).
+
+The `self.weights` consumed here are set inside `update()` as `self.weights = w.tolist()` immediately after `_compute_weights()`. They are stored as an instance attribute rather than passed as a return value so `draw()` can access them without changing the `update()` call signature.
+
+2. **Optimal trajectory** - a single solid line in the constructor colour (default green) showing the updated control sequence `u` rolled out from the current state. This is distinct from the sampled trajectories - it represents the weighted-average result after all K samples are combined, not any individual sample.
+
+---
+
+## 6.5 Comparison with MPC
+
+| Aspect | MPC | MPPI |
+|---|---|---|
+| Optimization method | Deterministic nonlinear optimization (NLP) | Sampling-based stochastic optimization |
+| Error used | Heading + position + speed | Heading + position + speed |
+| Prediction horizon | N-step deterministic trajectory | T-step horizon with K sampled rollouts |
+| Constraints | Hard - enforced by IPOPT solver | Soft - clipping + cost penalty only |
+| Solver | IPOPT / interior point method | None - weighted rollout averaging |
+| Local minima | Can get trapped | Better exploration via stochastic sampling |
+| Tuning parameters | Cost weights + horizon length | $\lambda$, $K$, $\sigma$, $\alpha$, horizon length |
+| Dynamics requirement | Requires differentiable model (CasADi) | Arbitrary black-box dynamics via `_F()` |
+| Parallelisation | Limited | Highly parallelisable on GPUs - K rollouts are independent |
+| Robustness to model errors | Sensitive to model mismatch | More robust due to sampling-based exploration |
+| Real-time performance | Depends on solver convergence | Fixed budget - determined by K and T |
+| Speed control | Cost weight $w_v$ + horizon planning | Cost weight $w_v$ only; `target_speed_mps` echoes current speed |
+| Constraint mechanism | Box bounds passed to solver | `_g()` clips controls; raw `epsilon` used in update |
+
+---
+
+**Author**: Mohit Kumar
\ No newline at end of file
diff --git a/doc/4_path_tracking/mppi.png b/doc/4_path_tracking/mppi.png
new file mode 100644
index 00000000..1389446f
Binary files /dev/null and b/doc/4_path_tracking/mppi.png differ
diff --git a/src/components/control/mppi/mppi_controller.py b/src/components/control/mppi/mppi_controller.py
index 2d5963a8..0fd1857b 100644
--- a/src/components/control/mppi/mppi_controller.py
+++ b/src/components/control/mppi/mppi_controller.py
@@ -112,6 +112,7 @@ def __init__(
self.visualize_sampled_trajs = visualize_sampled_trajs
self.Sigma = np.array([[sigma_steer**2, 0.0], [0.0, sigma_accel**2]])
+ self.Sigma_inv = np.linalg.inv(self.Sigma) # precompute inverse for efficiency
self.stage_cost_weight = np.asarray(
stage_cost_weight
if stage_cost_weight is not None
@@ -272,13 +273,12 @@ def update(self, state, time_s):
v[k, t] = self._g(v[k, t])
S = np.zeros(self.K)
- Sigma_inv = np.linalg.inv(self.Sigma)
for k in range(self.K):
x = x0.copy()
for t in range(self.T):
u_t = u[t]
v_t = v[k, t]
- S[k] += self._c(x) + self.param_gamma * (u_t.T @ Sigma_inv @ v_t)
+ S[k] += self._c(x) + self.param_gamma * (u_t.T @ self.Sigma_inv @ v_t)
x = self._F(x, v_t)
S[k] += self._phi(x)
diff --git a/src/simulations/path_tracking/mpc_path_tracking/mpc_path_tracking.py b/src/simulations/path_tracking/mpc_path_tracking/mpc_path_tracking.py
index 3142f3e2..d96f2878 100644
--- a/src/simulations/path_tracking/mpc_path_tracking/mpc_path_tracking.py
+++ b/src/simulations/path_tracking/mpc_path_tracking/mpc_path_tracking.py
@@ -12,6 +12,9 @@
import sys
from pathlib import Path
+import warnings
+# to supress do-mpc warnings about the full version
+warnings.filterwarnings("ignore", category=UserWarning)
# ── Path setup (identical pattern to mppi_path_tracking.py) ───────────────
abs_dir_path = str(Path(__file__).absolute().parent)
@@ -29,7 +32,7 @@
from state import State
from four_wheels_vehicle import FourWheelsVehicle
from cubic_spline_course import CubicSplineCourse
-from mpc_controller import MPCController # ← do-mpc controller
+from mpc_controller import MPCController # do-mpc controller
show_plot = True
@@ -46,7 +49,7 @@ def main():
)
# ── Reference course ─────────────────────────────────────────────────
- # Identical waypoints to mppi_path_tracking.py so the comparison is fair.
+ # Identical waypoints to mppi_path_tracking.py for comparison.
course = CubicSplineCourse(
[0.0, 10.0, 25, 40, 50],
[0.0, 4.0, -12, 20, -13],
@@ -59,7 +62,7 @@ def main():
state = State(color=spec.color)
# ── MPC controller ───────────────────────────────────────────────────
- # Parameters are chosen to be comparable to the MPPI configuration:
+ # Parameters are chosen to be comparable to the MPPI configuration:
# delta_t and horizon give a similar prediction window (≈ 2 s).
# Stage / terminal cost weights mirror the MPPI weights exactly so
# both controllers optimise the same objective shape.