Lectures

Description of course 1 - Statistical classification

Statistical classification is the task of assigning a set of objects to groups in such a way that objects in the same group (called a class or a cluster) are more similar (in some sense) to each other than to those in other groups. In this lecture, we introduce the notion of classification and study several machine learning methods proposed in both supervised and unsupervised settings.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Introduction to classification
  2. Supervised classification: k-nearest neighbors (KNN)
  3. Supervised classification: discriminant analysis (linear and quadratic)
  4. Other procedures
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Unsupervised clustering: k-means
  2. Unsupervised clustering: agglomerative (ascending) hierarchical clustering
  3. Other procedures
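
As a pointer for the practical sessions, the sketch below illustrates the two settings covered in this course on a small synthetic dataset: a supervised k-nearest neighbors classifier and an unsupervised k-means clustering. The use of Python with scikit-learn and the synthetic data are illustrative assumptions; the actual sessions may rely on different software or datasets.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    # Synthetic data: three groups of points in the plane
    X, y = make_blobs(n_samples=300, centers=3, random_state=0)

    # Supervised classification: k-nearest neighbors trained on labeled data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print("KNN test accuracy:", knn.score(X_test, y_test))

    # Unsupervised clustering: k-means ignores the labels entirely
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("k-means cluster sizes:", np.bincount(kmeans.labels_))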

Description of course 2 - Computational statistics

This lecture is dedicated to computational statistics, the field at the intersection of statistics and computer science that studies statistical methods made possible by computational techniques.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session): Simulation of random variables
  1. Markov chain Monte Carlo (MCMC)
  2. Metropolis-Hastings algorithm
  3. Other procedures (MALA...)
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session): Introduction to stochastic optimization
  1. Stochastic gradient
  2. Expectation-Maximization (EM) algorithm
  3. Other procedures
  4. Application to mixture models (links with classification via the Gaussian mixture models)
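
To give a flavour of the Day 1 material, here is a minimal random-walk Metropolis-Hastings sampler written in plain NumPy. The target density (a two-component Gaussian mixture, echoing the mixture models of Day 2) and the proposal step size are illustrative choices, not part of the course material.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # Unnormalized log-density of a two-component Gaussian mixture
        return np.logaddexp(np.log(0.3) - 0.5 * (x + 2.0) ** 2,
                            np.log(0.7) - 0.5 * (x - 1.5) ** 2)

    def metropolis_hastings(n_iter=50_000, step=1.0, x0=0.0):
        x = x0
        samples = np.empty(n_iter)
        for t in range(n_iter):
            proposal = x + step * rng.normal()        # symmetric random-walk proposal
            log_alpha = log_target(proposal) - log_target(x)
            if np.log(rng.uniform()) < log_alpha:     # accept with probability min(1, ratio)
                x = proposal
            samples[t] = x
        return samples

    chain = metropolis_hastings()
    print("estimated mean of the target:", chain.mean())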

Description of course 3 - Machine Learning theory, methods and applications

Machine learning methods are today widely used in many applications, such as web page ranking, email spam detection, and energy modelling and forecasting. Over the past two decades, machine learning has become a key player in smart data analysis. The purpose of this course is to provide an overview of the principal machine learning methods used to build predictive models for a wide range of applications. The successive lessons present the theoretical setting of machine learning in the regression and classification frameworks, together with the implementation of these methods on real applications using Python or R software.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Linear regression models
  2. Penalized regression models (lasso, ridge)
  3. Predictive models
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Regression trees
  2. Bagging
  3. Random forest
  4. Extremely randomized trees (extra-trees)
  • Day 3 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Transfer learning
  2. Physics-informed neural networks
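
The snippet below sketches, on simulated data, the kind of predictive pipeline built in the practical sessions of Days 1 and 2: an ordinary linear model, its ridge and lasso penalized variants, and a random forest. The package choice (scikit-learn) and the hyperparameter values are assumptions made for this illustration only.

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression, Ridge, Lasso
    from sklearn.ensemble import RandomForestRegressor

    # Simulated regression problem with a few informative features
    X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                           noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=1.0),
        "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(f"{name:>13}: R^2 on test set = {model.score(X_test, y_test):.3f}")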

Description of course 4 - Random matrix tools for Machine Learning

One important task in machine learning (ML) is to estimate covariance matrices given a dataset drawn from a known distribution. In many applications, the dimension of the observations is as large as the number of samples, and standard ML algorithms tend to misbehave in this high-dimensional regime. Recently, random matrix theory (RMT) has emerged as a tool for understanding the statistical properties of such high-dimensional datasets.

RMT originated with Wishart’s work in the late 1920s in the context of statistics to analyze large datasets. Nowadays, it holds significant relevance in fields like statistical learning and data science. This lecture will provide an introduction to basic results in RMT, including Wigner’s theorem and the Marchenko-Pastur theorem. We will cover essential techniques used in RMT, such as the resolvent method, the Stieltjes transform, and the moment method, with a focus on their applications to high-dimensional data analysis.

  • Day 1 - Morning (3 hours, lecture course) + Afternoon (3 hours, lecture course)
  1. Introduction: large dimensional data analysis
  2. Wigner’s theorem via the method of moments and the Stieltjes transform
  • Day 2 - Morning (3 hours, lecture course) + Afternoon (3 hours, practical session)
  1. Marchenko-Pastur’s theorem
  2. Application to machine learning, estimation of large covariance matrices
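
As a small numerical illustration of the Day 2 material, the sketch below draws an n x p data matrix with i.i.d. standard Gaussian entries, forms the sample covariance matrix, and compares its extreme eigenvalues with the Marchenko-Pastur support edges (1 ± sqrt(p/n))^2 for an identity population covariance. The dimensions are arbitrary choices made for this example.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 2000, 500                       # sample size and dimension, ratio c = p/n = 0.25
    c = p / n

    X = rng.standard_normal((n, p))        # i.i.d. entries, true covariance = identity
    S = X.T @ X / n                        # sample covariance matrix (p x p)
    eigvals = np.linalg.eigvalsh(S)

    # Marchenko-Pastur support for identity covariance: [(1 - sqrt(c))^2, (1 + sqrt(c))^2]
    lower, upper = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    print("empirical eigenvalue range:", eigvals.min(), eigvals.max())
    print("Marchenko-Pastur edges:    ", lower, upper)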