The 4th Annual Workshop of the Vietnam Applied Statistics Network - Vietnam Institute for Advanced Study in Mathematics

Invited talks

Assoc. Prof. Dr. Đào Thị Thanh Bình, Hanoi University

Title: An empirical examination of financial performance and distress profiles during Covid-19: the case of fishery and food production firms in Vietnam

Abstract: Purpose - Financial ratios are often employed to categorize firms into different clusters of financial performance. Accordingly, this study aims to classify firms using financial ratios with advanced techniques and to identify the transition matrix of firms moving between clusters during the Covid-19 period. Design/methodology/approach - This article uses compositional data (CoDa) analysis based on existing clustering methods, with the data transformed using weighted logarithms of financial ratios. The data comprise 66 listed firms in two Vietnamese sectors, food and beverage and fishery, over the three years from 2019 to 2021, including the Covid-19 period. Findings - These firms can be classified into three clusters with distinctive characteristics, which set benchmarks for solvency and profitability. The results also show migration from one cluster to another during the Covid-19 pandemic, which can be used to calculate the transition probabilities or the transition matrix. Practical implications - The authors' findings indicate three distinct clusters (good, average, and below-average firm performance) that can help financial analysts, accountants, investors, and other strategic decision-makers make informed choices. Originality - Clustering on financial ratios is often associated with a number of serious shortcomings, including ratio choices, skewed distributions, outliers, and redundancy. This study is motivated by a weighted CoDa approach that overcomes these issues. This method can be generalized to classify firms in multiple sectors or in other emerging markets.
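
As an illustration of how a transition matrix can be obtained from cluster migrations, here is a minimal sketch. It assumes each firm already has a cluster label in two consecutive periods; the function name, the labels, and the six-firm data are hypothetical and are not taken from the study.

```python
from collections import Counter


def transition_matrix(labels_before, labels_after, n_clusters):
    """Row-normalised counts of moves from cluster i (before) to cluster j (after)."""
    counts = Counter(zip(labels_before, labels_after))
    matrix = []
    for i in range(n_clusters):
        row_total = sum(counts[(i, j)] for j in range(n_clusters))
        # A cluster with no firms in the first period gets a row of zeros.
        matrix.append([counts[(i, j)] / row_total if row_total else 0.0
                       for j in range(n_clusters)])
    return matrix


# Hypothetical labels for six firms (0 = good, 1 = average, 2 = below average).
labels_2019 = [0, 0, 1, 1, 2, 2]
labels_2020 = [0, 1, 1, 2, 2, 2]
P = transition_matrix(labels_2019, labels_2020, 3)
```

Each row of `P` estimates the probability that a firm starting in one cluster ends the next period in each of the clusters.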

PhD candidate Đỗ Trọng Đạt, University of Michigan, USA

Title: Dendrogram of mixing measures: Learning latent hierarchy and model selection for finite mixture models

Abstract: We present a new way to summarize and select mixture models via the hierarchical clustering tree (dendrogram) of an overfitted latent mixing measure. Our proposed method bridges agglomerative hierarchical clustering and mixture modelling. The dendrogram’s construction is derived from the theory of convergence of the mixing measures so that we can (1) consistently select the true number of mixing components and (2) recover a fast convergence rate for parameter estimation from the tree, even when the model parameters are only weakly identifiable. In theory, it explicates the choice of the optimal number of clusters in hierarchical clustering. In practice, the dendrogram reveals more information on the hierarchy of subpopulations compared to traditional ways of summarizing mixture models. Several simulation studies are carried out to support our theory. We also illustrate the methodology with an application to single-cell RNA sequence analysis.

Dr. Nguyễn Tiến Đạt, University of Science, Vietnam National University Ho Chi Minh City

Title: Adaptive warped kernel estimation for nonparametric regression with circular responses

Abstract: In this work, we consider a nonparametric regression problem for circular data, meaning that observations are represented by points lying on the unit circle. We propose a warped kernel estimation procedure with data-driven selection of the bandwidth parameter following a Goldenshluger-Lepski method. The convergence rate of our statistical estimators is examined on anisotropic Hölder classes of functions for pointwise estimation. Furthermore, the optimality of our methodology will be presented if time permits. Finally, some numerical studies are presented to illustrate the good performance of our approach.
This is a joint work with Thanh Mai Pham Ngoc (LAGA, Université Sorbonne Paris Nord) and Vincent Rivoirard (CEREMADE, Université Paris-Dauphine - PSL).

Assoc. Prof. Phạm Văn Hải, Hanoi University of Science and Technology
Email: haipv@soict.hust.edu.vn

Title: Knowledge graph with its potential applications

Abstract: A knowledge graph is a graph database enriched with more detailed, comprehensive data that meaningfully captures the real world. The advantage of a network of knowledge graphs is that it gives a comprehensive view of any entity under consideration as well as its dependent entities. Answers to questions can be derived easily from the information and links of the entities, and the structure of the graph explains why and how those answers are returned.
This research presents a novel approach that applies a knowledge graph / fuzzy knowledge graph to a variety of domains and AI systems dealing with large data. In the experimental applications, digital human profiles and data sets are collected in real time from conventional databases combined with social networks, health systems, and medical sources, and a knowledge graph is created to represent the complex relational attributes of users and objects in large datasets. Experiments with the knowledge graph in real-world application domains illustrate the proposed approaches.
State-of-the-art real-world applications of knowledge graphs are shown in the figure below.

Dr. Tô Đức Khánh, University of Science, Vietnam National University Ho Chi Minh City

Title: Empirical likelihood confidence regions for sensitivity and specificity at optimal Youden index-based threshold

Abstract: Sensitivity and specificity are two well-known indices of the accuracy of a diagnostic test. For a continuous diagnostic test, a specific diagnostic threshold is needed to obtain sensitivity and specificity. Choosing a high threshold produces high specificity and low sensitivity; choosing a low threshold gives the opposite result. Therefore, selecting an "optimal" diagnostic threshold is required for clinical application. A variety of approaches exist; among them, the one based on the Youden index is certainly the most popular. The optimal Youden index-based threshold is the threshold that maximizes the sum of sensitivity and specificity. When an optimal threshold is estimated from data, both diseased and healthy samples are involved, and hence the corresponding estimated sensitivity and specificity are correlated; joint inference is therefore necessary to take this correlation into account. Several parametric and non-parametric methods are proposed to construct the joint confidence region for sensitivity and specificity at the optimal Youden index-based threshold. The proposed methods are based on empirical likelihood pivots, giving rise to likelihood-type regions that have no predetermined constraints on their shape and are automatically range-respecting. Together with theoretical results, illustrative examples involving real datasets are also presented.
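
To make the notion of a Youden index-based threshold concrete, the sketch below computes the plain empirical (plug-in) estimate on a small hypothetical sample. It assumes that higher test values indicate disease and that a subject tests positive when the value exceeds the threshold. It only illustrates point estimation and does not implement the empirical likelihood confidence regions discussed in the talk.

```python
def youden_threshold(healthy, diseased):
    """Return (threshold, sensitivity, specificity) maximising J = sens + spec - 1.

    Assumes higher values indicate disease; a subject is positive if value > threshold.
    Ties in J are resolved in favour of the smallest candidate threshold.
    """
    best = None
    for c in sorted(set(healthy) | set(diseased)):
        sens = sum(x > c for x in diseased) / len(diseased)
        spec = sum(x <= c for x in healthy) / len(healthy)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical test results for four healthy and four diseased subjects.
healthy = [1.0, 2.0, 3.0, 4.0]
diseased = [3.5, 5.0, 6.0, 7.0]
threshold, sens, spec = youden_threshold(healthy, diseased)
```

Because the same data determine both the threshold and the two accuracy estimates, the estimated sensitivity and specificity are dependent, which is the motivation for joint inference.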

MSc. Nguyễn Bảo Ngọc, University of Science, Vietnam National University, Hanoi

Title: The prognostic factors of mortality in septic patients: a case study in Vietnam

Abstract: Sepsis is defined as life-threatening organ dysfunction caused by a dysregulated host response to infection. Sepsis is also the dominant cause of death in patients in intensive care units (ICUs). Recently, most studies of sepsis have focused on assessing the impact of the Sequential Organ Failure Assessment (SOFA) score and other factors on the prognosis of patients in ICUs, based on the logistic regression model or the Cox regression model. This report presents, for the first time, a cohort study of septic patients collected from 108 Military Central Hospital to investigate and identify the prognostic factors of mortality.

This is a joint work with Pham Dinh Tung, Nguyen Trong Hieu, Nguyen Van Tuan, and Truong Nhat My.

Dr. Nguyễn Trang Thảo, Institute of Artificial Intelligence and Computation, Van Lang University

Title: Clustering for probability density functions with fuzzy and automatic approach

Abstract: This presentation introduces the problem of clustering probability density functions and discusses some recent advances in fuzzy clustering and automatic clustering. With fuzzy clustering, a probability density function can belong to multiple clusters simultaneously, with varying degrees of membership. Automatic clustering, meanwhile, can determine a suitable number of clusters. Some results on convergence behavior and applications in data analysis and image recognition are also introduced in the presentation.

Dr. Lê Thị Thanh Tịnh, The University of Danang

Title: Exploring Relationships within Categorical Data: A Discussion on Applied Statistics with Examples in Educational Studies

Abstract: In educational research, valuable insights often lie hidden within categorical data, such as participants' demographics, beliefs, teaching methods, and learning outcomes. This presentation underscores the crucial role of employing appropriate statistical methods to unearth meaningful relationships in educational research, particularly when dealing with qualitative data. The discussion will centre around the application of key statistical methodologies, including chi-square tests, the marginal homogeneity test, and multinomial logistic regression in educational studies. Real-world examples extracted from educational research will be utilized to illustrate the practical application of these techniques, providing tangible insights into navigating research inquiries related to categorical variables. The presentation also delves into necessary considerations and arguments involved in applying and interpreting statistical analyses within the context of educational studies. This presentation is designed for educators and researchers interested in leveraging the power of categorical data analysis to gain a deeper understanding of educational phenomena.
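
For readers less familiar with the chi-square test mentioned above, here is a minimal sketch of the Pearson chi-square statistic for independence in a contingency table. The table (teaching method by pass/fail outcome) is hypothetical and is not drawn from the studies in the talk; the p-value step is omitted to keep the example free of external libraries.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: rows = teaching method A / B, columns = pass / fail.
table = [[10, 20],
         [20, 10]]
stat, dof = chi_square_statistic(table)
```

The statistic is compared with a chi-square distribution with `dof` degrees of freedom; the approximation assumes all expected counts are positive and reasonably large.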