Statistical Learning Methods for Process Data

Qiwei He, Jingchen Liu, Xueying Tang, & Susu Zhang

Posted February 26, 2021

Full-day short course (Monday, July 19; 10:00 AM-5:00 PM US Eastern)

This short course introduces several recent advances in the analysis of process data collected from computer-based interactive items. It presents methods that extract information both from the observed sequences of actions (e.g., n-grams, sequence similarity computation) and from latent variables. Topics include (1) automatic feature extraction from action sequences and their timestamps via n-grams, multidimensional scaling, and sequence-to-sequence autoencoders; (2) sequence segmentation and subtask analysis with neural language modelling; (3) an introduction to ProcData, an R package for process data analysis; and (4) applications of process features to practical testing and learning problems, including scoring, differential item functioning correction, computerized adaptive testing, and adaptive learning.

During the full-day course, participants will receive an overview of process data collected from computer-based large-scale assessments, learn about various approaches to analyzing and using log data, and gain hands-on experience working with log data through examples and exercises. The intended audience is researchers and practitioners interested in data-driven methods for analyzing process data from assessments and learning environments.

To fully engage in the hands-on activities, participants should be familiar with R and RStudio. Running the ProcData package requires installing R, Rcpp, and Python; installation instructions and support will be provided. Participants should bring their own laptop running Windows or macOS. By the end of the course, participants should have a comprehensive picture of process data analysis and know how to conduct various analyses with the ProcData package.
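A small example helps show what "feature extraction via n-grams" means. Each respondent's action sequence is converted into counts of contiguous runs of n actions, so every sequence becomes a fixed-length feature vector over a shared n-gram vocabulary. The Python code below is a minimal sketch of that idea under this simple counting assumption. It is not the ProcData implementation, since ProcData is an R package with its own functions. The action names and the helpers `ngram_counts` and `ngram_feature_matrix` are hypothetical, and timestamps are ignored.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngram_counts(actions: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams of actions in a single sequence (hypothetical helper)."""
    # Sliding window of width n; yields nothing if the sequence is shorter than n.
    return Counter(tuple(actions[i:i + n]) for i in range(len(actions) - n + 1))


def ngram_feature_matrix(
    sequences: List[Sequence[str]], n: int
) -> Tuple[List[Tuple[str, ...]], List[List[int]]]:
    """Build a respondents-by-n-grams count matrix over a shared, sorted vocabulary."""
    counts = [ngram_counts(seq, n) for seq in sequences]
    # Union of all n-grams observed in any sequence (Counter iterates over its keys).
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for n-grams a respondent never produced.
    matrix = [[c[g] for g in vocab] for c in counts]
    return vocab, matrix


# Two hypothetical action sequences from one interactive item.
seqs = [
    ["start", "click_A", "click_B", "end"],
    ["start", "click_B", "click_A", "click_B", "end"],
]
vocab, X = ngram_feature_matrix(seqs, n=2)
for gram, col in zip(vocab, zip(*X)):
    print(gram, col)
```

The resulting count matrix can then be used like any other set of item-level features, for example as predictors in a scoring model. In practice, rare n-grams are usually filtered or down-weighted before use.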

About the Instructors

Qiwei He

Qiwei (Britt) He is a Research Scientist in the Center for Next Generation Psychometrics and Data Science at Educational Testing Service (ETS). Her research interests include educational and psychological measurement and data/text mining, with specific attention to methodological advances in large-scale assessments (e.g., PISA, PIAAC) and to complex new data sources in computer-based tests (e.g., process data, textual data). She is the recipient of the 2019 Jason Millman Promising Measurement Scholar Award from the National Council on Measurement in Education (NCME), the 2017 Alicia Cascallar NCME Award for an Outstanding Paper by an Early Career Scholar, and the OECD Thomas J. Alexander Fellowship. She is leading an NCES-commissioned project on leveraging process data to analyze adults’ problem-solving skills and co-leading an NSF-funded project to develop latent and graphical models for complex dependent data in education.

Jingchen Liu

Jingchen Liu is an Associate Professor in the Department of Statistics at Columbia University. He holds a Ph.D. in Statistics from Harvard University. He is the recipient of the 2018 Early Career Award from the Psychometric Society, the 2013 Tweedie New Researcher Award from the Institute of Mathematical Statistics, and the 2009 Best Publication in Applied Probability Award from the INFORMS Applied Probability Society. His research interests include statistics, psychometrics, applied probability, and Monte Carlo methods. He is currently an associate editor of Psychometrika, British Journal of Mathematical and Statistical Psychology, Journal of Applied Probability/Advances in Applied Probability, Extremes, Operations Research Letters, and STAT.

Xueying Tang

Xueying Tang is an Assistant Professor in Statistics in the Department of Mathematics at the University of Arizona. Before joining the University of Arizona, she was a postdoctoral research scientist in the Department of Statistics at Columbia University. Her research interests include high-dimensional Bayesian statistics, latent variable models, and their applications in education and psychology. She has worked extensively on data-driven methods for analyzing process data from educational assessments and is one of the developers of ProcData, an R package for exploratory analysis of log data.

Susu Zhang

Susu Zhang is an Assistant Professor of Psychology and Statistics at the University of Illinois at Urbana-Champaign (UIUC). She was previously a postdoctoral research scientist at Columbia University in the Department of Statistics. Her research interests include latent variable modeling, the analysis of complex data (e.g., log data) in computer-based educational and psychological assessments, and longitudinal models for learning and interventions.