Computerized Adaptive Testing and Multistage Testing with R

Duanli Yan & Alina A. von Davier

July 15, 2024, 9:30am – 12:30pm

Half-day short course (9:30am – 12:30pm)

Short Course #4

This short course provides a brief, practical overview of the theory of computerized adaptive testing (CAT) and multistage testing (MST). It illustrates the methodologies and applications using the open-source R language and several data examples. The implementations rely on the R packages catR and mstR, which are released and still under active development, and which include some of the newest research algorithms developed by the authors.

The short course will cover several topics: the basics of R, a theoretical overview of CAT and MST, CAT and MST designs, assembly methodologies, the catR and mstR packages, simulations, and the most recent real-world applications with AI.

Intended Audience

The intended audience for the workshop includes undergraduate and graduate students, faculty, researchers, practitioners at testing institutions, and anyone in psychometrics, measurement, education, psychology, and other fields who is interested in computerized adaptive and multistage testing, especially in practical implementations of simulations using R.

Summary

Computerized adaptive testing (CAT) has become a very popular method of administering questionnaires, collecting data, and scoring on the fly (van der Linden & Glas, 2010; Wainer, 2015). It has been used in many large-scale assessments over the last decades and is currently an important field of research in psychometrics. Multistage testing (MST), on the other hand, has gained popularity in recent years (Yan, von Davier, & Lewis, 2014). Recent advances in technology and AI have further enriched CAT and MST digital assessments.

CAT and MST are both forms of adaptive testing administered in multiple stages: items are administered sequentially and selected optimally according to the responses to the items administered so far. In other words, the selection of the next items depends on a current, interim estimate of ability that is based on the previously administered items. The two approaches are converging conceptually. In CAT, items are selected one after another and the ability of the test taker is re-estimated after each item, i.e., one item per stage. In MST, items are grouped in predefined modules and the next module is selected based on performance on the previously administered modules, i.e., one module per stage. Both are therefore administered in multiple stages (Magis, Yan, & von Davier, 2017).
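The item-by-item logic described above can be sketched in a few lines of base R. This is only an illustrative sketch, not the catR or mstR implementation: it assumes a 2PL model, maximum-information item selection, EAP scoring on a grid with a standard normal prior, and a fixed test length as the stopping rule. The item bank and all function names (p2pl, info2pl, eap, simulate_cat) are hypothetical and introduced here for illustration.

```r
# Minimal CAT simulation sketch in base R (no catR/mstR calls).
# All names and parameter values below are illustrative assumptions.
set.seed(1)

# Hypothetical 2PL item bank: discrimination a, difficulty b
n_items <- 50
bank <- data.frame(a = runif(n_items, 0.8, 2.0), b = rnorm(n_items))

# 2PL response probability and Fisher information at ability theta
p2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
info2pl <- function(theta, a, b) {
  p <- p2pl(theta, a, b)
  a^2 * p * (1 - p)
}

# EAP ability estimate on a grid, with a standard normal prior
eap <- function(items, resp, bank, grid = seq(-4, 4, by = 0.05)) {
  post <- dnorm(grid)
  for (k in seq_along(items)) {
    p <- p2pl(grid, bank$a[items[k]], bank$b[items[k]])
    post <- post * p^resp[k] * (1 - p)^(1 - resp[k])
  }
  post <- post / sum(post)
  est <- sum(grid * post)
  c(theta = est, se = sqrt(sum((grid - est)^2 * post)))
}

simulate_cat <- function(true_theta, bank, test_length = 15) {
  administered <- integer(0)
  responses <- integer(0)
  theta_hat <- 0  # starting value before any response is observed
  for (stage in seq_len(test_length)) {
    # Select the unused item with maximum information at the interim estimate
    info <- info2pl(theta_hat, bank$a, bank$b)
    info[administered] <- -Inf
    nxt <- which.max(info)
    # Simulate the response of a test taker with ability true_theta
    y <- rbinom(1, 1, p2pl(true_theta, bank$a[nxt], bank$b[nxt]))
    administered <- c(administered, nxt)
    responses <- c(responses, y)
    # Re-estimate ability after each item (one item per stage)
    est <- eap(administered, responses, bank)
    theta_hat <- est[["theta"]]
  }
  list(items = administered, responses = responses,
       theta = est[["theta"]], se = est[["se"]])
}

result <- simulate_cat(true_theta = 1, bank = bank)
```

Here result$theta and result$se hold the final ability estimate and its posterior standard deviation, and result$items lists the administered items in order. An MST version of this sketch would replace the single-item selection step with routing to one of several predefined modules and would update the ability estimate only after each module.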

CAT and MST each have advantages and drawbacks, both relative to each other and relative to linear testing. Their practical usefulness, however, relies mostly on accurate implementations of the algorithms that perform test assembly, optimal item or module selection, (IRT) scoring, stopping rules, and reporting.

Many CAT and MST software packages exist, including open-source R software for simulation studies, such as catR (Magis & Barrada, in press; Magis & Raîche, 2012), mstR (Magis, Yan, & von Davier, 2023), CATSIM and MSTGen (Han, 2010, 2013), catIrt (Nydick, 2022), mirtCAT (Chalmers & Nordmo, 2022), xxIRT (Luo, 2022), Rmst (Luo, 2022), dexterMST (Bechger et al., 2022), and Firestar (Choi, 2009).

The purpose of this short course is threefold: (a) to provide a brief overview of CAT and MST approaches and outline their specificities, advantages, and drawbacks with respect to linear testing, their technical challenges, and their real-world applications; (b) to present the R packages catR and mstR, their options, and their performance from a simulation-study perspective; (c) to run several examples of CAT and MST with both packages as illustrations.

The short course will be a mix of theoretical and practical content. Demonstrations of catR and mstR will be used to illustrate the theoretical framework. Participants are encouraged to bring their laptops with R pre-installed (and possibly also the R packages catR and mstR, though these can be installed at the beginning of the workshop). Handouts and R scripts will be available for the participants.

References

Chalmers, P. (2015). mirtCAT: Computerized adaptive testing with multidimensional item response theory. R package version 0.6.1. http://CRAN.R-project.org/package=mirtCAT
Choi, S. W. (2009). Firestar: Computerized adaptive testing simulation program for polytomous item response theory models. Applied Psychological Measurement, 33, 644-645.
Han, K. T. (2013). MSTGen: Simulated data generator for multistage testing. Applied Psychological Measurement, 37, 666-668.
Magis, D., & Barrada, J. R. (2020). Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software.
Magis, D., & Raîche, G. (2012). Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software, 48, 1-31.
Magis, D., Yan, D., & von Davier, A.A. (2017). Computerized Adaptive and Multistage Testing with R. New York: Springer.
van der Linden, W. J., & Glas, C. A. W. (2010). Elements of Computerized Adaptive Testing. New York: Springer.
Wainer, H. (2015). Computerized Adaptive Testing: A Primer (2nd Ed). Routledge.
Yan, D., von Davier, A.A., & Lewis, C. (2014). Computerized Multistage Testing: Theory and Applications. London: Chapman and Hall.

About the instructors

Duanli Yan

Duanli Yan is a Director of Data Analysis and Computational Research for the Automated Scoring group in the Research & Development division at ETS. She is also an Adjunct Professor at Rutgers University. She holds a Ph.D. in Psychometrics from Fordham University. Dr. Yan was the statistical coordinator for the EXADEP™ test and the TOEIC® Institutional programs, a Development Scientist for innovative research applications, and a Psychometrician for several operational programs. She is the recipient of the 2016 AERA Division D Significant Contribution to Educational Measurement and Research Methodology award and the 2022 and 2023 NCME Bradley Hanson awards. She is a co-editor of the volumes Handbook of Automated Scoring: Theory into Practice and Computerized Multistage Testing: Theory and Applications. She is also a co-author of the books Bayesian Networks in Educational Assessment and Computerized Adaptive and Multistage Testing with R. Dr. Yan has been an invited workshop and symposium organizer and presenter at many conferences, such as those of the National Council on Measurement in Education (NCME), the International Association for Computerized Adaptive Testing (IACAT), and the International Meeting of the Psychometric Society (IMPS).

Alina A. von Davier

Alina A. von Davier is Chief of Assessment at Duolingo, where she is responsible for developing a computational psychometrics framework for assessment and learning. Computational psychometrics, which includes machine learning and data mining techniques, Bayesian inference methods, stochastic processes, and psychometric models, is the main set of tools employed in her current work. She has published several books and numerous papers in peer-reviewed journals. Previously, she worked at ACT, where she led ACTNext, an R&D-based innovation unit, and before that she worked at Educational Testing Service (ETS). During her tenure at ETS she led the Computational Psychometrics Center and the operational psychometric work for the international large-scale English assessments, such as TOEFL® and TOEIC®. She edited a volume on test equating, Statistical Models for Test Equating, Scaling, and Linking, which won the 2013 AERA Division D Significant Contribution to Educational Measurement and Research Methodology award. She is a co-editor of the volume Computerized Multistage Testing: Theory and Applications, which won the 2016 AERA Division D Significant Contribution to Educational Measurement and Research Methodology award. She is a co-author of the book Computerized Adaptive and Multistage Testing with R.