Computerized Adaptive Testing and Multistage Testing with R
Duanli Yan & Alina A. von Davier
Full-day short course
The goal of this workshop is to provide a practical (and brief) overview of the theory of computerized adaptive testing (CAT) and multistage testing (MST), and to illustrate the methodologies and applications using the open-source R language and several data examples. The implementations rely on the R packages catR and mstR, which have been or are currently being developed and include some of the newest research algorithms developed by the authors.
This workshop will cover several topics: the basics of R, a theoretical overview of CAT and MST, CAT and MST designs, assembly methodologies, the catR and mstR packages, simulations, and applications.
The intended audience for the workshop is undergraduate and graduate students, faculty, researchers, practitioners at testing institutions, and anyone in psychometrics, measurement, education, psychology, or other fields who is interested in computerized adaptive and multistage testing, especially in practical implementations of simulations using R.
Summary
Computerized adaptive testing (CAT) has become a very popular method for administering questionnaires, collecting data, and scoring on the fly (van der Linden & Glas, 2010; Wainer, 2015). It has been used in many large-scale assessments over the last decades and is currently an important field of research in psychometrics. Multistage testing (MST), on the other hand, has gained popularity in recent years (Yan, von Davier, & Lewis, 2014).
Both approaches rely on the notion of adaptive testing: items are administered sequentially and selected optimally according to the responses to the items administered so far. In other words, the selection of the next item to administer depends on an interim estimate of ability based on the previously administered items. The conceptual difference between CAT and MST is that in CAT, items are selected one after another (from a large pool of available items) and the ability of the test taker is re-estimated after the administration of each item. In MST, by contrast, items are grouped into predefined modules, and the selection of subsequent modules is based on performance on the previously administered modules, not on single items (Magis, Yan, & von Davier, 2017).
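To make this adaptive loop concrete, the following minimal sketch (an illustration, not part of the workshop materials) performs one adaptive step with catR: thetaEst() computes the interim ability estimate and nextItem() selects the next item. The item bank is simulated, and the administered items, responses, and option values are assumptions chosen for brevity.

    library(catR)

    ## Simulate a small 2PL item bank (columns a, b, c, d).
    bank <- genDichoMatrix(items = 100, model = "2PL")

    ## Suppose items 4 and 12 have already been administered and answered 1 and 0.
    given <- c(4, 12)
    resp  <- c(1, 0)

    ## Interim ability estimate from the responses so far (Bayes modal estimator).
    th <- thetaEst(bank[given, ], resp, method = "BM")

    ## Select the next item by maximum Fisher information at the interim estimate,
    ## excluding the items already administered.
    nextItem(bank, theta = th, out = given, criterion = "MFI")$item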
Both methods have advantages and drawbacks with respect to each other and to linear testing. However, their practical usefulness relies mostly on accurate implementation of the algorithms that perform test assembly, optimal item or module selection, (IRT) scoring, stopping rules, and reporting.
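As a rough illustration of how these components fit together in catR, the sketch below (all settings are illustrative assumptions, not workshop defaults) runs one simulated CAT with randomCAT(): the start, test, stop, and final lists control the starting item, interim scoring and item selection, the stopping rule, and the final scoring, respectively.

    library(catR)

    ## Simulated 2PL item bank.
    bank <- genDichoMatrix(items = 200, model = "2PL")

    res <- randomCAT(trueTheta = 0.5, itemBank = bank,
                     start = list(nrItems = 1, theta = 0),             # first item chosen at theta = 0
                     test  = list(method = "BM", itemSelect = "MFI"),  # interim scoring and item selection
                     stop  = list(rule = "length", thr = 20),          # stop after 20 items
                     final = list(method = "ML"))                      # final scoring
    res$thFinal   # final estimate of the (true) ability 0.5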
For CAT, several commercial software packages exist (CATSim, Adaptest, …), and some open-source solutions for simulation studies are available, most of them implemented in R, among them the packages catR (Magis & Barrada, in press; Magis & Raîche, 2012) and mirtCAT (Chalmers, 2015), and the R-based software Firestar (Choi, 2009). For MST, MSTGen (Han, 2013) exists. Very recently, the R package mstR was developed to provide a tool for simulations in the MST context, similar to what catR provides for the CAT framework.
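For MST, a comparable sketch with mstR's randomMST() is shown below. The simple 1-2 design (one routing module followed by one of two second-stage modules), the simulated item bank, and all option values are assumptions made for illustration; argument names follow the module-membership and transition-matrix interface described in Magis, Yan, and von Davier (2017) and may differ slightly across package versions.

    library(mstR)

    ## 30 simulated 2PL items: items 1-10 form the routing module, items 11-20 and
    ## 21-30 the two second-stage modules (a real design would target their difficulty).
    it <- genDichoMatrix(items = 30, model = "2PL")

    ## Module membership: rows = items, columns = modules (1 if the item belongs to the module).
    modules <- matrix(0, 30, 3)
    modules[1:10, 1]  <- 1
    modules[11:20, 2] <- 1
    modules[21:30, 3] <- 1

    ## Allowed transitions: entry (i, j) = 1 if module j can follow module i.
    trans <- matrix(0, 3, 3)
    trans[1, 2:3] <- 1

    res <- randomMST(trueTheta = 0, itemBank = it, modules = modules,
                     transMatrix = trans,
                     test  = list(method = "BM", moduleSelect = "MFI"),  # interim scoring and module selection
                     final = list(method = "BM"))                        # final scoring
    res$thFinal   # final ability estimate after the two administered modules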
The purpose of this workshop is threefold: a) to provide a brief overview of the CAT and MST approaches and outline their specificities, advantages, and drawbacks with respect to linear testing, as well as their technical challenges; b) to present the R packages catR and mstR, their options and performance, from a simulation-study perspective; c) to run several examples of CAT and MST with both packages as illustrations.
The workshop will be a mix of theoretical and practical content. Demonstrations of catR and mstR will be used to illustrate the theoretical framework. Participants are encouraged to bring their laptops with R pre-installed (and possibly also the R packages catR and mstR, though this can be done at the beginning of the workshop). Although R is available for Windows, Linux/UNIX, and macOS, the demos will be run under Windows 7. Handouts and R scripts will be made available to the participants.
References
- Chalmers, P. (2015). mirtCAT: Computerized adaptive testing with multidimensional item response theory. R package version 0.6.1. http://CRAN.R-project.org/package=mirtCAT
- Choi, S. W. (2009). Firestar: Computerized adaptive testing simulation program for polytomous item response theory models. Applied Psychological Measurement, 33, 644-645.
- Han, K. T. (2013). MSTGen: Simulated data generator for multistage testing. Applied Psychological Measurement, 37, 666-668.
- Magis, D., & Barrada, J. R. (in press). Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software.
- Magis, D., & Raîche, G. (2012). Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software, 48, 1-31.
- Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized Adaptive and Multistage Testing with R. New York: Springer.
- van der Linden, W. J., & Glas, C. A. W. (2010). Elements of Computerized Adaptive Testing. New York: Springer.
- Wainer, H. (2015). Computerized Adaptive Testing: A Primer (2nd ed.). Routledge.
- Yan, D., von Davier, A. A., & Lewis, C. (2014). Computerized Multistage Testing: Theory and Applications. London: Chapman and Hall.
 
About the instructors
Duanli Yan
  
Dr. Duanli Yan is Director of Data Analysis and Computational Research for the Automated Scoring group in the Research and Development division at Educational Testing Service (ETS). She is also an Adjunct Professor at Fordham University. She was a psychometrician for several operational programs, led the EXADEP™ test and the TOEIC® Institutional programs, and was a Development Scientist for innovative research applications. She is the recipient of the 2011 ETS Presidential Award, the 2013 NCME Brenda Loyd Award, the 2015 IACAT Early Career Award, and the 2016 AERA Division D Significant Contribution to Educational Measurement and Research Methodology Award. She is a co-editor of the volumes Computerized Multistage Testing: Theory and Applications and Handbook of Automated Scoring: Theory into Practice, and a co-author of the books Bayesian Networks in Educational Assessment and Computerized Adaptive and Multistage Testing with R. She has presented training sessions and workshops at the National Council on Measurement in Education (NCME), the International Association for Computerized Adaptive Testing (IACAT), and the International Meeting of the Psychometric Society (IMPS).
Alina A. von Davier
  
Alina A. von Davier is a psychometrician and researcher in computational psychometrics, machine learning, and education. She is a researcher, innovator, and executive leader with over 20 years of experience in EdTech and the assessment industry. She is the Chief of Assessment at Duolingo, where she leads the research and development area of the Duolingo English Test. She is also the Founder and CEO of EdAstra Tech, a service-oriented EdTech company. In 2022, she joined the University of Oxford as an Honorary Research Fellow, and Carnegie Mellon University as a Senior Research Fellow.