Statistical Learning Methods for Process Data

Qiwei He, Jingchen Liu, Xueying Tang, & Susu Zhang


Full-day short course (Monday, July 19; 9:00 AM-5:00 PM US Eastern)

This short course introduces several recent advances in the analysis of process data collected from computer-based interactive items. Methods that extract information from both observed sequential actions (e.g., n-grams, sequence similarity computation) and latent variables will be presented. Covered topics include (1) automatic feature extraction from action sequences and their timestamps via n-grams, multidimensional scaling, and sequence-to-sequence autoencoders; (2) sequence segmentation and subtask analysis with neural language modelling; (3) an introduction to ProcData, an R package for process data analysis; and (4) applications of process features to practical testing and learning problems, including scoring, differential item functioning correction, computerized adaptive testing, and adaptive learning. During the full-day short course, participants will receive an overview of process data collected from computer-based large-scale assessments, learn about various approaches to analyzing and using log data, and gain hands-on experience working with log data through examples and exercises. The intended audience is researchers and practitioners interested in data-driven methods for analyzing process data from assessments and learning environments. To fully engage in the hands-on activities, participants should be familiar with R and RStudio. Running the ProcData package requires installation of R, Rcpp, and Python; installation instructions and support will be provided. Participants are expected to have access to their own laptop running a Windows or Mac operating system. By the end of the workshop, participants should have a composite picture of process data analysis and know how to conduct various analyses using the ProcData package.

