Labs from the Master MVA Computational Statistics class, taught by Prof. Stéphanie Allassonnière. Each lab assignment explores a core topic in statistical estimation and Bayesian inference, combining theoretical derivations with practical Python simulations.
In the first exercise, we compare two estimators for the uniform model.
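As an illustration of this kind of comparison, here is a minimal sketch. It assumes the model is U(0, θ) and that the two estimators are the maximum likelihood estimator (the sample maximum) and the method-of-moments estimator (twice the sample mean). Both choices are assumptions made for this example, since the estimators studied in the lab are not named here.

```python
import numpy as np

# Assumed setting: X_1, ..., X_n i.i.d. uniform on [0, theta].
rng = np.random.default_rng(0)
theta, n, n_rep = 1.0, 50, 5000

# Each row is one simulated dataset of size n.
samples = rng.uniform(0.0, theta, size=(n_rep, n))

theta_mle = samples.max(axis=1)         # maximum likelihood estimator
theta_mom = 2.0 * samples.mean(axis=1)  # method-of-moments estimator

# Monte Carlo estimates of the mean squared error of each estimator.
mse_mle = np.mean((theta_mle - theta) ** 2)
mse_mom = np.mean((theta_mom - theta) ** 2)
print(f"MSE (MLE): {mse_mle:.5f}, MSE (moments): {mse_mom:.5f}")
```

In this setting the sample maximum is biased downwards but has a much smaller mean squared error than the moment estimator.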
In the second part, we implement the Stochastic Gradient Descent (SGD) algorithm from scratch to learn a linear classifier, then study how observation noise degrades estimation quality.
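A minimal sketch of SGD for a linear classifier follows. It assumes the logistic loss, labels in {-1, +1}, and a step size decaying as 1/sqrt(t). The function name `sgd_linear_classifier` and the synthetic data are hypothetical, and the loss used in the lab may differ.

```python
import numpy as np

def sgd_linear_classifier(X, y, n_epochs=20, lr0=0.5, seed=0):
    """SGD on the logistic loss log(1 + exp(-y * <w, x>)), with y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # one pass over the shuffled data
            t += 1
            margin = y[i] * (X[i] @ w)
            # Gradient of the logistic loss at the single sample (X[i], y[i]).
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))
            w -= lr0 / np.sqrt(t) * grad  # decaying step size (assumed schedule)
    return w

# Synthetic, linearly separable data (hypothetical example).
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(500, 2)), np.ones((500, 1))])  # last column is a bias term
w_true = np.array([2.0, -1.0, 0.5])
y = np.sign(X @ w_true)

w_hat = sgd_linear_classifier(X, y)
accuracy = np.mean(np.sign(X @ w_hat) == y)
print(f"training accuracy: {accuracy:.3f}")
```

One way to study the effect of noise with this sketch is to perturb `X` with Gaussian noise before fitting and track how far `w_hat` drifts from the direction of `w_true`.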
Finally, the method is applied to the UCI Heart Disease dataset, reaching >70% accuracy.
This second practical work focuses on estimating the parameters of a Gaussian Mixture Model (GMM) with the Expectation-Maximization (EM) algorithm. We first implement a sampler for the GMM, then implement the EM algorithm in this particular setting.
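The two steps can be sketched as follows for a one-dimensional mixture. This is a simplified illustration, not the lab's implementation: the names `sample_gmm` and `em_gmm_1d`, the quantile-based initialisation, and the test parameters are all assumptions.

```python
import numpy as np

def sample_gmm(n, weights, means, stds, rng):
    """Draw n points: pick a component for each point, then sample from it."""
    z = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[z], stds[z]), z

def em_gmm_1d(x, k, n_iter=100):
    """EM for a 1D Gaussian mixture with k components."""
    n = x.size
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)  # deterministic initialisation
    stds = np.full(k, x.std())
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        # The constant -0.5 * log(2 * pi) cancels in the normalisation.
        log_p = np.log(weights) - np.log(stds) - 0.5 * ((x[:, None] - means) / stds) ** 2
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        stds = np.maximum(np.sqrt(var), 1e-6)  # guard against a collapsing component
    return weights, means, stds

rng = np.random.default_rng(0)
true_w = np.array([0.3, 0.7])
true_mu = np.array([-2.0, 3.0])
true_sd = np.array([0.5, 1.0])
x, _ = sample_gmm(2000, true_w, true_mu, true_sd, rng)
weights_hat, means_hat, stds_hat = em_gmm_1d(x, k=2)
print(weights_hat, means_hat, stds_hat)
```

EM only finds a local maximum of the likelihood, so the result depends on the initialisation. Several restarts are a common safeguard.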
In the first exercise, we study a hierarchical population model for longitudinal data, such as disease progression measurements, and we try to estimate the model's parameters. Because direct sampling from the posterior is not possible, we implement the Stochastic Approximation EM (SAEM) algorithm using a Metropolis-Hastings (MH) sampler for the latent variables.
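To make the structure of SAEM concrete, here is a sketch on a deliberately simple toy model, not the hierarchical model of the lab. We assume latent variables z_i ~ N(mu, 1), observations y_i = z_i + noise with unit variance, and a single unknown parameter mu. The function name `saem_toy` and the step-size schedule are also assumptions.

```python
import numpy as np

def saem_toy(y, n_iter=300, n_burn=100, step=1.0, seed=0):
    """SAEM for the toy model z_i ~ N(mu, 1), y_i | z_i ~ N(z_i, 1)."""
    rng = np.random.default_rng(seed)
    n = y.size
    z = y.copy()  # latent variables, initialised at the observations
    mu, s = 0.0, 0.0

    def log_post(z, mu):
        # log p(z_i | y_i, mu) up to an additive constant, elementwise.
        return -0.5 * (y - z) ** 2 - 0.5 * (z - mu) ** 2

    for k in range(1, n_iter + 1):
        # Simulation step: one random-walk MH move per latent variable.
        prop = z + step * rng.normal(size=n)
        log_alpha = log_post(prop, mu) - log_post(z, mu)
        accept = np.log(rng.uniform(size=n)) < log_alpha
        z = np.where(accept, prop, z)
        # Stochastic approximation of the sufficient statistic mean(z).
        gamma = 1.0 if k <= n_burn else 1.0 / (k - n_burn)
        s += gamma * (z.mean() - s)
        # Maximisation step: closed form for this toy model.
        mu = s
    return mu

rng = np.random.default_rng(1)
z_true = rng.normal(2.0, 1.0, size=500)
y = z_true + rng.normal(size=500)
mu_hat = saem_toy(y)
print(f"SAEM estimate: {mu_hat:.3f}, sample mean of y: {y.mean():.3f}")
```

In this toy model the maximum likelihood estimate of mu is the sample mean of `y`, which gives a direct check on the output.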
The second exercise explores Data Augmentation, using Markov chain Monte Carlo (MCMC). We construct a bivariate Markov chain and use a Gibbs sampler to approximate a specific density.
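The mechanics of a two-component Gibbs sampler can be sketched on a stand-in target. The density used in the lab is not specified here, so this example assumes a standard bivariate normal with correlation `rho`, whose full conditionals are known in closed form. The function name `gibbs_bivariate_normal` is hypothetical.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=20000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cond_std = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    chain = np.empty((n_samples, 2))
    for t in range(n_samples):
        # Alternate draws from the full conditionals:
        # x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
        x = rng.normal(rho * y, cond_std)
        y = rng.normal(rho * x, cond_std)
        chain[t] = x, y
    return chain

chain = gibbs_bivariate_normal(rho=0.8)
empirical_rho = np.corrcoef(chain[1000:].T)[0, 1]  # discard a short burn-in
print(f"empirical correlation: {empirical_rho:.3f}")
```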
In this final practical work, we explore advanced techniques to overcome common limitations of the standard MH algorithm. First, we tackle the difficulty of tuning the proposal distribution's parameters by implementing an Adaptive MH within Gibbs sampler. This algorithm automatically adjusts the variances of the proposal distributions on the fly to target an optimal acceptance rate.
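A possible form of the adaptive scheme is sketched below. It assumes batch-wise adaptation of the log standard deviation of each coordinate's proposal, a target acceptance rate of 0.44 (a value commonly used for one-dimensional updates), and a toy anisotropic Gaussian target. The function name, batch size, and target are assumptions for illustration.

```python
import numpy as np

def adaptive_mh_within_gibbs(log_target, x0, n_iter=20000, batch=50,
                             target_rate=0.44, seed=0):
    """Coordinate-wise random-walk MH with adaptive proposal scales."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    log_sigma = np.zeros(d)  # log of the proposal std for each coordinate
    n_accept = np.zeros(d)
    chain = np.empty((n_iter, d))
    lp = log_target(x)
    for t in range(1, n_iter + 1):
        for i in range(d):
            prop = x.copy()
            prop[i] += np.exp(log_sigma[i]) * rng.normal()
            lp_prop = log_target(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
                n_accept[i] += 1
        chain[t - 1] = x
        if t % batch == 0:
            # Diminishing adaptation: the adjustment shrinks with the batch index.
            delta = min(0.01, (t // batch) ** -0.5)
            log_sigma += np.where(n_accept / batch > target_rate, delta, -delta)
            n_accept[:] = 0
    return chain, np.exp(log_sigma)

def log_target(x):
    # Toy target (assumption): independent Gaussians with stds 1 and 10.
    return -0.5 * (x[0] ** 2 + (x[1] / 10.0) ** 2)

chain, sigmas = adaptive_mh_within_gibbs(log_target, x0=[0.0, 0.0])
print("adapted proposal stds:", sigmas)
```

The coordinate with the larger spread should end up with the larger proposal scale, without any manual tuning.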
Next, we address the challenge of sampling from highly multimodal distributions, where standard MCMC often gets stuck in a single local mode. We demonstrate this failure on a toy target distribution defined as a mixture of 20 well-separated Gaussians. To solve this, we implement Parallel Tempering, which runs multiple Markov chains in parallel at varying "temperatures". The hotter chains cross between modes more easily and exchange states with the colder ones, which improves exploration and lets the sampler draw correctly from the target distribution.
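The idea can be sketched on a smaller problem than the 20-component mixture. The example below assumes a one-dimensional target with two modes at -5 and 5, a geometric temperature ladder, and one swap attempt between a random adjacent pair per iteration. All of these are illustrative choices.

```python
import numpy as np

def log_target(x):
    # Simplified bimodal target (assumption): equal mixture of
    # N(-5, 0.5^2) and N(5, 0.5^2), up to an additive constant.
    return np.logaddexp(-0.5 * ((x + 5) / 0.5) ** 2, -0.5 * ((x - 5) / 0.5) ** 2)

def parallel_tempering(log_target, temps, n_iter=20000, step=1.0, seed=0):
    """Return the samples of the chain at temps[0] (use temps[0] = 1)."""
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps, dtype=float)
    k = temps.size
    x = np.zeros(k)  # current state of each chain
    samples = np.empty(n_iter)
    for t in range(n_iter):
        # Random-walk MH within each chain, targeting pi(x)^(1/T).
        prop = x + step * np.sqrt(temps) * rng.normal(size=k)
        log_alpha = (log_target(prop) - log_target(x)) / temps
        accept = np.log(rng.uniform(size=k)) < log_alpha
        x = np.where(accept, prop, x)
        # Swap attempt between a random pair of adjacent temperatures.
        if k > 1:
            i = rng.integers(k - 1)
            log_swap = ((log_target(x[i]) - log_target(x[i + 1]))
                        * (1.0 / temps[i + 1] - 1.0 / temps[i]))
            if np.log(rng.uniform()) < log_swap:
                x[[i, i + 1]] = x[[i + 1, i]]
        samples[t] = x[0]
    return samples

pt = parallel_tempering(log_target, temps=[1, 2, 4, 8, 16, 32])
plain = parallel_tempering(log_target, temps=[1])  # a single chain is plain MH
frac_pt = np.mean(pt > 0)
frac_plain = np.mean(plain > 0)
print(f"fraction of samples in the right mode: PT {frac_pt:.2f}, plain MH {frac_plain:.2f}")
```

With a single chain the sampler settles into one mode and stays there, while the tempered ladder lets the chain at temperature 1 visit both modes in roughly equal proportion.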




