Computational Statistics

Labs from the Master MVA Computational Statistics class, taught by Prof. Stéphanie Allassonnière. Each lab assignment explores a core topic in statistical estimation and Bayesian inference, combining theoretical derivations with practical Python simulations.

TP1. Estimators and SGD Algorithm

In a first exercise, we compare two estimators for the uniform model $\mathcal{U}([0,\theta])$: the method-of-moments estimator and the maximum likelihood estimator (MLE). The MLE is shown to have a strictly lower quadratic risk for $n\ge 3$; the two risks coincide for $n\le 2$.
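The comparison can be checked numerically. The sketch below is not taken from the notebook: it assumes the standard forms of the two estimators (twice the sample mean for the method of moments, the sample maximum for the MLE), and the function name `quadratic_risks` and its default values are illustrative choices.

```python
import numpy as np

def quadratic_risks(theta=1.0, n=10, n_rep=100_000, seed=0):
    """Monte Carlo estimate of E[(estimator - theta)^2] for both estimators."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, theta, size=(n_rep, n))
    mom = 2.0 * x.mean(axis=1)   # method of moments, from E[X] = theta / 2
    mle = x.max(axis=1)          # MLE: the largest observation
    return np.mean((mom - theta) ** 2), np.mean((mle - theta) ** 2)

risk_mom, risk_mle = quadratic_risks()
# Closed forms with theta = 1 and n = 10:
# theta^2 / (3n) for the method of moments, 2 theta^2 / ((n + 1)(n + 2)) for the MLE.
print(risk_mom, 1.0 / 30)
print(risk_mle, 2.0 / (11 * 12))
```

Changing `n` in the call gives a quick view of how the gap between the two risks evolves with the sample size.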

In the second part, we implement the Stochastic Gradient Descent (SGD) algorithm from scratch to learn a linear classifier, then study how observation noise degrades estimation quality.
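A minimal sketch of this experiment follows. The loss used in the lab is not stated here, so the logistic loss, the decaying step size, the synthetic data, and the names `sgd_linear_classifier` and `w_true` are all assumptions made for illustration.

```python
import numpy as np

def sgd_linear_classifier(X, y, lr=0.1, n_epochs=20, seed=0):
    """SGD on the logistic loss log(1 + exp(-y * <w, x>)), labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    step = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):          # one pass over shuffled samples
            step += 1
            margin = y[i] * (X[i] @ w)
            # Gradient of the loss evaluated at a single sample.
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))
            w -= lr / np.sqrt(step) * grad    # decaying step size (assumed schedule)
    return w

# Synthetic data: labels come from a hidden hyperplane, then the observations are perturbed.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
X_clean = rng.normal(size=(500, 2))
y = np.sign(X_clean @ w_true)

accuracies = {}
for noise_std in (0.0, 1.0):
    X_noisy = X_clean + noise_std * rng.normal(size=X_clean.shape)
    w_hat = sgd_linear_classifier(X_noisy, y)
    accuracies[noise_std] = np.mean(np.sign(X_noisy @ w_hat) == y)
print(accuracies)
```

Each update uses a single sample, which is what distinguishes SGD from full-batch gradient descent. Adding noise to the observations while keeping the labels fixed lowers the accuracy that any linear rule can reach on the noisy inputs.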

Finally, the method is applied to the UCI Heart Disease dataset, reaching >70% accuracy.

TP2. EM Algorithm for GMMs

This second lab focuses on estimating the parameters of a Gaussian Mixture Model (GMM) with the Expectation-Maximization (EM) algorithm. We first implement a sampler for GMMs, then implement the EM algorithm for this model.
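The two steps can be sketched as follows. This is a simplified one-dimensional version written for illustration, not the notebook's code: the function names, the quantile-based initialisation, and the fixed number of iterations are assumptions.

```python
import numpy as np

def sample_gmm(n, weights, means, stds, rng):
    """Draw n points: pick a component for each point, then sample from it."""
    z = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[z], stds[z]), z

def em_gmm(x, k, n_iter=200):
    """EM for a one-dimensional Gaussian mixture with k components."""
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread the initial means
    stds = np.full(k, x.std())
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        log_p = (np.log(weights) - np.log(stds) - 0.5 * np.log(2 * np.pi)
                 - 0.5 * ((x[:, None] - means) / stds) ** 2)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
    return weights, means, stds

rng = np.random.default_rng(0)
true_w = np.array([0.3, 0.7])
true_mu = np.array([-2.0, 3.0])
true_sd = np.array([0.5, 1.0])
x, _ = sample_gmm(2000, true_w, true_mu, true_sd, rng)
w_hat, mu_hat, sd_hat = em_gmm(x, k=2)
order = np.argsort(mu_hat)   # component labels are arbitrary, so sort by mean
print(w_hat[order], mu_hat[order], sd_hat[order])
```

Sampling from a known mixture first gives ground-truth parameters against which the EM estimates can be compared.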

TP3. Metropolis-Hastings and Gibbs Samplers

In the first exercise, we study a hierarchical population model for longitudinal data, such as disease progression measurements, and we try to estimate the model's parameters. Because direct sampling from the posterior is not possible, we implement the Stochastic Approximation EM (SAEM) algorithm using a Metropolis-Hastings (MH) sampler for the latent variables.
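In SAEM, the MH sampler plays the role of the simulation step: it draws the latent variables given the current parameter values. The hierarchical model is not specified here, so the sketch below only shows a generic random-walk MH sampler on a stand-in target, a single latent variable with a standard normal prior and one noisy observation. The names `random_walk_mh`, `log_posterior` and `y_obs` are hypothetical.

```python
import numpy as np

def random_walk_mh(log_target, z0, n_steps, proposal_std, rng):
    """Symmetric random-walk Metropolis-Hastings for an unnormalised log density."""
    z = np.asarray(z0, dtype=float)
    log_p = log_target(z)
    chain = np.empty((n_steps,) + z.shape)
    n_accept = 0
    for t in range(n_steps):
        proposal = z + proposal_std * rng.normal(size=z.shape)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(z)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            z, log_p = proposal, log_p_new
            n_accept += 1
        chain[t] = z
    return chain, n_accept / n_steps

# Stand-in target: latent z with prior N(0, 1) and one observation y ~ N(z, 0.5^2).
y_obs = 1.0

def log_posterior(z):
    return -0.5 * np.sum(z ** 2) - 0.5 * np.sum((y_obs - z) ** 2) / 0.25

rng = np.random.default_rng(0)
chain, acc_rate = random_walk_mh(log_posterior, np.zeros(1), 20_000, 1.0, rng)
# For this conjugate toy model the exact posterior is N(0.8, 0.2).
print(chain[2000:].mean(), chain[2000:].var(), acc_rate)
```

Only the unnormalised log density is needed, which is why this kind of sampler is usable when the posterior cannot be sampled directly.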

The second exercise explores Data Augmentation, using Markov chain Monte Carlo (MCMC). We construct a bivariate Markov chain and use a Gibbs sampler to approximate a specific density.
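The density studied in the lab is not given here, so the following sketch illustrates the mechanics of a Gibbs sampler on a stand-in target: a bivariate normal with unit variances and correlation `rho`, whose two conditionals are normal and easy to sample. The function name and parameter values are assumptions.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_steps, rng, x0=0.0, y0=0.0):
    """Gibbs sampler alternating draws from x | y and y | x."""
    x, y = x0, y0
    cond_std = np.sqrt(1.0 - rho ** 2)
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        x = rng.normal(rho * y, cond_std)   # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.normal(rho * x, cond_std)   # y | x ~ N(rho * x, 1 - rho^2)
        chain[t] = x, y
    return chain

rng = np.random.default_rng(0)
gibbs_chain = gibbs_bivariate_normal(0.8, 20_000, rng)
burned = gibbs_chain[1000:]                 # discard the start of the chain
gibbs_corr = np.corrcoef(burned.T)[0, 1]
print(burned.mean(axis=0), gibbs_corr)
```

Every draw is accepted, since each coordinate is sampled exactly from its conditional distribution. This is the main practical difference from an MH step.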

TP4. Improving the Metropolis-Hastings Algorithm

In this final practical work, we explore advanced techniques to overcome common limitations of the standard MH algorithm. First, we tackle the difficulty of tuning the proposal distribution's parameters by implementing an Adaptive MH within Gibbs sampler. This algorithm automatically adjusts the variances of the proposal distributions on the fly to target an optimal acceptance rate.
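One common way to implement this adaptation is sketched below, under stated assumptions: coordinates are updated one at a time, the log of each proposal standard deviation is nudged up or down after every batch, and the step of that nudge is capped and non-increasing. The batch size of 50, the target rate of 0.44, and the function name are illustrative choices, not values taken from the lab.

```python
import numpy as np

def adaptive_mh_within_gibbs(log_target, x0, n_batches, rng,
                             batch_size=50, target_rate=0.44):
    """Coordinate-wise random-walk MH with batch adaptation of proposal scales."""
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    log_std = np.zeros(d)             # log of each coordinate's proposal std
    log_p = log_target(x)
    chain = np.empty((n_batches * batch_size, d))
    for j in range(1, n_batches + 1):
        accepted = np.zeros(d)
        for b in range(batch_size):
            for i in range(d):        # update one coordinate at a time
                proposal = x.copy()
                proposal[i] += np.exp(log_std[i]) * rng.normal()
                log_p_new = log_target(proposal)
                if np.log(rng.uniform()) < log_p_new - log_p:
                    x, log_p = proposal, log_p_new
                    accepted[i] += 1
            chain[(j - 1) * batch_size + b] = x
        # Widen proposals that accept too often, shrink the others.
        delta = min(0.01, j ** -0.5)
        log_std += np.where(accepted / batch_size > target_rate, delta, -delta)
    return chain, np.exp(log_std)

# Toy target with very different scales along the two coordinates.
scales = np.array([0.2, 5.0])

def log_gauss(x):
    return -0.5 * np.sum((x / scales) ** 2)

rng = np.random.default_rng(0)
amh_chain, proposal_stds = adaptive_mh_within_gibbs(log_gauss, np.zeros(2), 600, rng)
print(proposal_stds)
```

On this target the adapted standard deviations end up far apart, which a single hand-tuned proposal scale could not achieve for both coordinates at once.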

Next, we address the challenge of sampling from highly multimodal distributions, where standard MCMC often gets stuck in a single local mode. We demonstrate this failure on a toy target distribution defined as a mixture of 20 well-separated Gaussians. To solve this, we implement Parallel Tempering, which runs multiple Markov chains in parallel at different "temperatures". This improves exploration, so that the chain at the base temperature samples correctly from the target distribution.
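A minimal sketch of Parallel Tempering is given below. To keep it short it uses a mixture of 4 Gaussians rather than the 20 of the lab, and the mode locations, temperature ladder, proposal scale, and function names are all assumptions. The chain at temperature T targets the density raised to the power 1/T, and neighbouring chains regularly attempt to swap their states.

```python
import numpy as np

MODES = np.array([[-5.0, -5.0], [-5.0, 5.0], [5.0, -5.0], [5.0, 5.0]])

def log_target(x, std=0.5):
    """Log density, up to a constant, of an equal-weight isotropic Gaussian mixture."""
    sq_dist = np.sum((x - MODES) ** 2, axis=1)
    return np.logaddexp.reduce(-0.5 * sq_dist / std ** 2)

def parallel_tempering(n_steps, temperatures, rng, proposal_std=0.5):
    n_chains = len(temperatures)
    x = np.zeros((n_chains, 2))
    log_p = np.array([log_target(xi) for xi in x])
    samples = np.empty((n_steps, 2))
    for t in range(n_steps):
        # Within-chain move: random-walk MH on the tempered density pi^(1/T).
        for k, temp in enumerate(temperatures):
            proposal = x[k] + proposal_std * np.sqrt(temp) * rng.normal(size=2)
            log_p_new = log_target(proposal)
            if np.log(rng.uniform()) < (log_p_new - log_p[k]) / temp:
                x[k], log_p[k] = proposal, log_p_new
        # Swap move between a randomly chosen pair of adjacent temperatures.
        k = rng.integers(n_chains - 1)
        log_ratio = ((1.0 / temperatures[k] - 1.0 / temperatures[k + 1])
                     * (log_p[k + 1] - log_p[k]))
        if np.log(rng.uniform()) < log_ratio:
            x[[k, k + 1]] = x[[k + 1, k]]
            log_p[[k, k + 1]] = log_p[[k + 1, k]]
        samples[t] = x[0]             # the chain at temperature 1 targets pi itself
    return samples

rng = np.random.default_rng(0)
samples = parallel_tempering(20_000, [1.0, 3.0, 9.0, 27.0, 81.0], rng)
# Assign each sample to its nearest mode and report the share of time spent in each.
nearest = np.argmin(((samples[:, None, :] - MODES) ** 2).sum(axis=2), axis=1)
print(np.bincount(nearest, minlength=len(MODES)) / len(samples))
```

The hot chains see a flattened landscape and cross between modes easily. Accepted swaps carry those positions down the ladder, so the base chain can change mode without having to cross a low-density region itself.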
