Yuqi Gu, Columbia University
New Statistical Frontiers in Cognitive Diagnostic Models: Deep Generative Structures, General Responses, and Tensor Unfolding
Cognitive diagnostic models (CDMs) are a popular family of discrete latent variable models used in educational measurement to provide fine-grained diagnostic feedback about examinees’ latent skill profiles. In this talk, I present three recent lines of work that substantially expand the statistical foundations and methodological scope of CDMs.
First, I introduce exploratory deep cognitive diagnostic models (DeepCDMs), which adapt deep generative models for diagnostic purposes by employing multiple layers of discrete latent variables to capture hierarchical cognitive structures at varying granularities. A novel layer-wise expectation–maximization algorithm with spectral initialization is proposed for estimation when all Q-matrices across layers are unknown. This modular estimation strategy is grounded in a layer-by-layer identifiability argument that exploits the directed graphical model structure and the discreteness of latent attributes, and it mitigates the initialization sensitivity and bias accumulation that plague classical EM in deep latent variable models.
Second, I present a new paradigm of identifiable general-response CDMs that moves beyond traditional categorical item responses. By establishing identifiability for CDMs with arbitrary response types (including continuous responses such as response times and count-valued responses) under Q-matrix conditions similar to those for classical binary-response CDMs, this work opens the door to principled diagnostic modeling of the rich data types emerging in modern assessments. Efficient EM algorithms are developed for a broad class of exponential family-based CDMs.
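To make the idea of a non-categorical response concrete, the sketch below computes the posterior over attribute profiles for one examinee with count-valued responses. It is only an illustration under stated assumptions, not the algorithm from the talk: it assumes a hypothetical DINA-type Poisson model in which item j has rate rate_high[j] when the examinee masters every attribute the Q-matrix requires for that item and rate_low[j] otherwise, with a known Q-matrix and known parameters. All names (posterior_over_patterns, rate_low, rate_high) are invented for this example. This calculation corresponds to the E-step of an EM algorithm for a single examinee.

```python
import itertools
import numpy as np

def posterior_over_patterns(x, Q, rate_low, rate_high, pi):
    """Posterior over binary attribute patterns given count responses x.

    Assumed (hypothetical) model: X_j | a ~ Poisson(rate_high[j]) if pattern a
    masters all attributes required by row j of Q, else Poisson(rate_low[j]).
    """
    J, K = Q.shape
    patterns = np.array(list(itertools.product([0, 1], repeat=K)))  # (2^K, K)
    # eta[c, j] is True when pattern c has every attribute item j requires.
    eta = (patterns @ Q.T) == Q.sum(axis=1)
    rates = np.where(eta, rate_high, rate_low)                      # (2^K, J)
    # Poisson log-likelihood, dropping the log(x!) term (constant in a).
    loglik = (x * np.log(rates) - rates).sum(axis=1)
    logpost = np.log(pi) + loglik
    logpost -= logpost.max()            # stabilise before exponentiating
    post = np.exp(logpost)
    return patterns, post / post.sum()

# Toy setup: items 0-1 require attribute 1, items 2-3 require attribute 2.
Q = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
rate_low = np.array([1.0, 1.0, 1.0, 1.0])
rate_high = np.array([5.0, 5.0, 5.0, 5.0])
pi = np.full(4, 0.25)                   # uniform prior over the 4 patterns
x = np.array([6, 5, 0, 1])              # high counts only on items 0-1

patterns, post = posterior_over_patterns(x, Q, rate_low, rate_high, pi)
best = patterns[np.argmax(post)]
print("most probable pattern:", best)
```

Swapping the Poisson log-likelihood line for another exponential-family log-density (for example, a Gaussian for log response times) changes the response type while leaving the rest of the calculation untouched, which is the sense in which the Q-matrix structure is separable from the response distribution in this toy version.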
Third, I introduce a fundamentally new proof strategy for Q-matrix identifiability based on tensor unfolding. By representing the population distribution of observed responses as a high-order tensor and strategically unfolding it into matrices, the rank properties of these matrices serve as certificates to constructively recover the unknown Q-matrix and the number of latent attributes. This approach departs from prior identifiability analyses, delivers strictly weaker conditions than existing results for general main-effect and all-saturated-effect CDMs, and extends naturally to polytomous responses.
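A small numerical sketch can show why ranks of unfoldings carry information about the Q-matrix. It is a toy illustration under assumptions of my own, not the proof strategy or the recovery procedure from the talk: it assumes a binary-response DINA model with two attributes and six single-attribute items, builds the exact joint probability tensor, and compares the ranks of two unfoldings. The function names joint_pmf_tensor and unfold, and all parameter values, are hypothetical.

```python
import itertools
import numpy as np

def joint_pmf_tensor(Q, slip, guess, pi):
    """Exact joint pmf of J binary items under an assumed DINA model."""
    J, K = Q.shape
    patterns = np.array(list(itertools.product([0, 1], repeat=K)))  # (2^K, K)
    # eta[c, j] = 1 when pattern c masters all attributes item j requires.
    eta = ((patterns @ Q.T) == Q.sum(axis=1)).astype(float)
    theta = eta * (1 - slip) + (1 - eta) * guess    # P(X_j = 1 | pattern c)
    tensor = np.zeros((2,) * J)
    for c in range(len(patterns)):
        term = np.array(pi[c])                      # 0-d array, grows by one axis per item
        for j in range(J):
            term = np.multiply.outer(term, np.array([1 - theta[c, j], theta[c, j]]))
        tensor += term
    return tensor

def unfold(tensor, row_items):
    """Matricize: responses to row_items index rows, all other items index columns."""
    col_items = [j for j in range(tensor.ndim) if j not in row_items]
    moved = np.transpose(tensor, list(row_items) + col_items)
    return moved.reshape(2 ** len(row_items), -1)

# Items 0-2 measure attribute 1 only; items 3-5 measure attribute 2 only.
Q = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
slip = np.array([0.10, 0.20, 0.15, 0.10, 0.20, 0.15])
guess = np.array([0.20, 0.10, 0.25, 0.20, 0.10, 0.25])
pi = np.array([0.1, 0.2, 0.3, 0.4])     # proportions of patterns 00, 01, 10, 11

P = joint_pmf_tensor(Q, slip, guess, pi)
# Row items share one attribute: the unfolding "sees" only 2 latent groups.
rank_same = np.linalg.matrix_rank(unfold(P, [0, 1]), tol=1e-10)
# Row items span both attributes: the unfolding "sees" all 4 patterns.
rank_mixed = np.linalg.matrix_rank(unfold(P, [0, 3]), tol=1e-10)
print(rank_same, rank_mixed)
```

In this toy setting, the unfolding whose row items depend on a single attribute has a lower rank than the one whose row items span both attributes, so the rank reflects how many distinct latent groups a subset of items can distinguish. With sample data the tensor would have to be estimated, and the rank decision would need a noise-aware threshold rather than the fixed tolerance used here.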
Together, these contributions chart new statistical frontiers for CDMs, bridging deep generative modeling, general response types, and spectral/algebraic methods, while maintaining the identifiability, interpretability, and diagnostic utility that are essential for real-world assessment applications.
About the Speaker
Yuqi Gu is an Assistant Professor of Statistics at Columbia University and also a member of the Columbia Data Science Institute. She received her Ph.D. in Statistics from the University of Michigan in 2020 and completed a one-year postdoc at Duke University before joining Columbia in 2021. Her research focuses on identifiable and interpretable latent variable models, spanning psychometrics, deep generative modeling, and high-dimensional statistics. In psychometrics, she has developed foundational identifiability theory and scalable estimation methods for cognitive diagnostic models (including their deep-generative and general-response modern variants), as well as efficient, provable spectral methods for latent class models and grade of membership models. Her work is supported by the NSF, and she has published extensively in leading journals across statistics, psychometrics, and machine learning, including Journal of the American Statistical Association, Journal of the Royal Statistical Society Series B: Statistical Methodology, Annals of Statistics, Psychometrika, and Journal of Machine Learning Research.
