Mini-course: Foundation of Mixture of Experts in Statistical Machine Learning

Time:

Venue/Location: VIASM, 161 Huynh Thuc Khang Street, Lang Ward, Hanoi

Objective:

The overarching goal is to characterize the currently unresolved statistical behavior of parameters and experts in MoE models by developing novel theoretical tools and applying them to the analysis of state-of-the-art MoE architectures.

Part 1: Developing New Theoretical Frameworks and Estimation Tools.

Part 2: Theoretical Analysis of DeepSeekMoE Architecture.

Lecturer:

Prof. Ho Pham Minh Nhat, University of Texas at Austin.

Abstract: 

Mixtures of experts (MoEs) are a class of statistical machine learning models that combine multiple submodels, known as experts, to form richer and more accurate models. They have been integrated into deep learning architectures both to capture the heterogeneity of the data and to scale these architectures up without increasing the computational cost, and they now serve as the backbone of important large-scale AI models, including GPT-4 (OpenAI), DeepSeek-V3 (DeepSeek), and Mixtral (Mistral). In a mixture of experts, each expert specializes in a different aspect of the data, and the experts' outputs are combined through a gating function to produce the final prediction. Parameter and expert estimates therefore play a crucial role, enabling statisticians and data scientists to articulate and make sense of the diverse patterns present in the data. However, the statistical behavior of parameters and experts in a mixture of experts remains unresolved, owing to the complex interaction between the gating function and the expert parameters.
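For concreteness, a standard softmax-gated mixture of experts (the notation below is a generic illustration and is not taken from the course materials) models the conditional density of a response y given an input x as

\[
p(y \mid x) \;=\; \sum_{k=1}^{K} \frac{\exp\!\big(a_k^{\top} x + b_k\big)}{\sum_{j=1}^{K} \exp\!\big(a_j^{\top} x + b_j\big)} \, f\!\big(y \mid \mu_k(x), \sigma_k\big),
\]

where the softmax weights form the gating function, each expert density \( f(\cdot \mid \mu_k(x), \sigma_k) \) specializes in one region of the input space, and the interplay between the gating parameters \( (a_k, b_k) \) and the expert parameters \( (\mu_k, \sigma_k) \) is what makes the estimation theory delicate.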

Format: The event will be held in a hybrid format, with virtual participation available for participants residing outside Hanoi.

Language: English and Vietnamese

Registration: please click here 

Deadline for registration: December 15, 2025

Contact: Mr. Nguyen Quang Huy, nqhuy@viasm.edu.vn