Lectures

Description of course 1 - Statistical classification

Statistical classification is the task of assigning a set of objects to groups in such a way that objects in the same group (called a class or a cluster) are more similar (in some sense) to each other than to those in other groups. In this lecture, we introduce the notion of classification and study several machine learning methods proposed in both supervised and unsupervised settings.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Introduction to classification
  2. Supervised classification: k-nearest neighbors (KNN)
  3. Supervised classification: discriminant analysis (linear and quadratic)
  4. Other procedures
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Unsupervised clustering: k-means
  2. Unsupervised clustering: agglomerative (ascending) hierarchical clustering
  3. Other procedures
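
As a pointer for the practical sessions, the sketch below illustrates the two settings covered in this course on a small synthetic dataset: a supervised k-nearest neighbors classifier and an unsupervised k-means clustering. The use of Python with scikit-learn and the synthetic data are illustrative assumptions; the actual sessions may rely on different software or datasets.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    # Synthetic data: three groups of points in the plane
    X, y = make_blobs(n_samples=300, centers=3, random_state=0)

    # Supervised classification: k-nearest neighbors trained on labeled data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print("KNN test accuracy:", knn.score(X_test, y_test))

    # Unsupervised clustering: k-means ignores the labels entirely
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("k-means cluster sizes:", np.bincount(kmeans.labels_))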

Description of course 2 - Computational statistics

This lecture is dedicated to computational statistics, the field at the intersection of statistics and computer science that studies statistical methods made possible by computational techniques.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session): Simulation of random variables
  1. Markov chain Monte Carlo (MCMC)
  2. Metropolis-Hastings algorithm
  3. Other procedures (MALA...)
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session): Introduction to stochastic optimization
  1. Stochastic gradient
  2. Expectation-Maximization (EM) algorithm
  3. Other procedures
  4. Application to mixture models (links with classification via the Gaussian mixture models)
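
To give a flavour of the Day 1 material, here is a minimal random-walk Metropolis-Hastings sampler written in plain NumPy. The target density (a two-component Gaussian mixture, echoing the mixture models of Day 2) and the proposal step size are illustrative choices, not part of the course material.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # Unnormalized log-density of a two-component Gaussian mixture
        return np.logaddexp(np.log(0.3) - 0.5 * (x + 2.0) ** 2,
                            np.log(0.7) - 0.5 * (x - 1.5) ** 2)

    def metropolis_hastings(n_iter=50_000, step=1.0, x0=0.0):
        x = x0
        samples = np.empty(n_iter)
        for t in range(n_iter):
            proposal = x + step * rng.normal()        # symmetric random-walk proposal
            log_alpha = log_target(proposal) - log_target(x)
            if np.log(rng.uniform()) < log_alpha:     # accept with probability min(1, ratio)
                x = proposal
            samples[t] = x
        return samples

    chain = metropolis_hastings()
    print("estimated mean of the target:", chain.mean())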

Description of course 3 - Machine Learning theory, methods and applications

Machine learning methods are today widely used in many applications, such as web page ranking, email spam detection, and energy modelling and forecasting. Over the past two decades, machine learning has become a key player in smart data analysis. The purpose of this course is to provide an overview of the principal machine learning methods used to build predictive models for a wide range of applications. The successive lessons present the theoretical setting of machine learning in the regression and classification frameworks, together with the implementation of these methods on real applications using Python or R software.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Linear regression models
  2. Penalized regression models (lasso, ridge)
  3. Predictive models
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Regression trees
  2. Bagging
  3. Random forest
  4. Extremely randomized trees (extra-trees)
  • Day 3 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Transfer learning
  2. Physics-informed neural networks
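
The snippet below sketches, on simulated data, the kind of predictive pipeline built in the practical sessions of Days 1 and 2: an ordinary linear model, its ridge and lasso penalized variants, and a random forest. The package choice (scikit-learn) and the hyperparameter values are assumptions made for this illustration only.

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression, Ridge, Lasso
    from sklearn.ensemble import RandomForestRegressor

    # Simulated regression problem with a few informative features
    X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                           noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=1.0),
        "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(f"{name:>13}: R^2 on test set = {model.score(X_test, y_test):.3f}")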

Description of course 4 - Random matrix tools for Machine Learning

One important task in machine learning (ML) is to estimate covariance matrices given a dataset drawn from a known distribution. In many applications, the dimension of the observations is as large as the number of samples, and standard ML algorithms tend to misbehave in this high-dimensional regime. Recently, random matrix theory (RMT) has emerged as a tool for understanding the statistical properties of such high-dimensional datasets.

RMT originated with Wishart’s work in the late 1920s in the context of statistics to analyze large datasets. Nowadays, it holds significant relevance in fields like statistical learning and data science. This lecture will provide an introduction to basic results in RMT, including Wigner’s theorem and the Marchenko-Pastur theorem. We will cover essential techniques used in RMT, such as the resolvent method, the Stieltjes transform, and the moment method, with a focus on their applications to high-dimensional data analysis.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, lecture course)
  1. Introduction: large dimensional data analysis
  2. Wigner’s theorem via the method of moments and the Stieltjes transform
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Marchenko-Pastur’s theorem
  2. Application to machine learning, estimation of large covariance matrices
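
As a small numerical illustration of the Day 2 material, the sketch below draws an n x p data matrix with i.i.d. standard Gaussian entries, forms the sample covariance matrix, and compares its extreme eigenvalues with the Marchenko-Pastur support edges (1 ± sqrt(p/n))^2 for an identity population covariance. The dimensions are arbitrary choices made for this example.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 2000, 500                       # sample size and dimension, ratio c = p/n = 0.25
    c = p / n

    X = rng.standard_normal((n, p))        # i.i.d. entries, true covariance = identity
    S = X.T @ X / n                        # sample covariance matrix (p x p)
    eigvals = np.linalg.eigvalsh(S)

    # Marchenko-Pastur support for identity covariance: [(1 - sqrt(c))^2, (1 + sqrt(c))^2]
    lower, upper = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    print("empirical eigenvalue range:", eigvals.min(), eigvals.max())
    print("Marchenko-Pastur edges:    ", lower, upper)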