Introduction to AI-based Automated Item Generation and Automated Scoring

Alina A. von Davier and Duanli Yan


Full day short course (9:00am – 5:00pm)

Short course #1

In the era of artificial intelligence (AI), the field of educational testing faces significant challenges, particularly in test development and scoring. Two key innovations address these challenges: automated item generation (AIG) and automated scoring (AS). Only recently has generative AI facilitated the development of complex test items on a large scale.

We introduce “the item factory” for managing large-scale test development, including the automation of item generation, quality review, quality assurance, and crowdsourcing techniques in adaptive testing. We present an overview of the latest natural language processing (NLP) techniques and large language models for AIG, alongside psychometric principles and practices for test development. We discuss the application of engineering principles in designing efficient item production processes (Luecht, 2008; Dede et al., 2018; von Davier, 2017). AS has become an integral part of the assessment landscape due to its advantages in reporting time, cost, objectivity, consistency, transparency, and feedback. We aim to demystify AS and provide a comprehensive understanding of its workings. We offer an overview of the design, development, evaluation, and quality control of automated scoring systems, along with practical advice and considerations for practitioners on integrating these systems into formative and summative assessments (Yan, Rupp, & Foltz, 2020).

Intended audience 

This course is designed for individuals interested in the field of large-scale assessments and in AIG and AS. It covers various aspects, including AI for AIG and AS, the development and implementation of AIG and AS systems, standardization and validation techniques, and best practices for quality control and operations.

Summary of Objectives 

This training offers a comprehensive overview of the many facets of automated item generation (AIG) and automated scoring (AS) in adaptive testing, drawing from real-world operational practices, published papers (Attali et al., 2022; von Davier et al., 2024), and the edited volume Handbook of Automated Scoring: Theory into Practice (Yan, Rupp, & Foltz, 2020). Participants will be guided through all operational aspects of AIG and AS, from design to implementation. 
 

  1. Demystifying the Black Box: Participants will gain an in-depth understanding of the various methods used for generating items and for constructing automated scoring systems. We will discuss a human-in-the-loop approach to automation and the value of preserving human values in highly automated systems.
  2. Implementing Systems in Operational Practice: Designing and implementing AIG and AS is crucial for educational assessments today. However, transitioning them into operational systems and deploying them requires a more complex process, involving different implementation models and associated procedures. Participants will learn about preprocessing textual data, filtering unscorable essays and diverting them to hand-scoring, model building, score assignment, and reporting.
  3. Evaluating and Maintaining Systems Over Time: The participants will learn various approaches to assessing the performance of AIG and AS systems, including comparisons to human test development and scoring using evaluation metrics, as well as managing system changes in operational practices. See Analytics for Quality Assurance in Assessment (von Davier, Liao, et al., 2022) for an example of such a system. 
  4. Exploring Open Issues, Future Directions, and Engaging in General Discussion: Participants will learn about applications of AIG and AS, discuss potential challenges in implementation, and consider the requirements for advancing the field with AI, NLP, psychometrics, ethics, and human values to strengthen the operational use of AIG and AS.
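The operational workflow outlined in objective 2 (preprocessing responses, filtering unscorable essays to hand-scoring, score assignment, and reporting) can be sketched in a few lines. This is purely an illustrative skeleton: all function names are hypothetical, and the trivial length-based `score` function stands in for a trained automated scoring engine.

```python
def preprocess(response: str) -> str:
    """Normalize whitespace and casing before feature extraction."""
    return " ".join(response.split()).lower()

def is_scorable(response: str, min_words: int = 10) -> bool:
    """Flag very short or empty responses for routing to hand-scoring."""
    return len(response.split()) >= min_words

def score(response: str) -> int:
    """Placeholder model: maps essay length to a 1-6 scale.
    A real AS engine would use trained NLP/psychometric models."""
    words = len(response.split())
    return min(6, 1 + words // 50)

def process_batch(responses):
    """Route each response to automated scoring or hand-scoring."""
    report = []
    for r in responses:
        text = preprocess(r)
        if is_scorable(text):
            report.append(("auto", score(text)))
        else:
            report.append(("hand-scoring", None))
    return report

batch = ["An essay " * 60, "too short"]
print(process_batch(batch))  # [('auto', 3), ('hand-scoring', None)]
```

In practice, each stage (especially filtering and model building) involves validated statistical models and human review, which the course covers in depth.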

About the Instructors

Alina A. von Davier (Duolingo)

Dr. Alina von Davier is the Chief of Assessment at Duolingo, where she leads the Duolingo English Test research and development area. She is also the Founder and CEO of EdAstra Tech, a service-oriented EdTech company. She is a researcher, innovator, and executive leader in the fields of computational psychometrics, machine learning, and education.

Duanli Yan (Measurement Incorporated)

Dr. Duanli Yan is a research scientist at Measurement Incorporated working on AI-based automated scoring. She served as the Director of Data Analysis and Computational Research in the research and development division at ETS, responsible for automated scoring engine upgrade evaluations. She has extensive experience in adaptive testing, psychometric research, test security, and innovative technology research and implementation. Dr. Yan is also an adjunct professor at Fordham University and at Rutgers, The State University of New Jersey.

The instructors have published extensively, including many books and peer-reviewed journal articles. They are co-editors of several volumes:

  • Artificial Intelligence Applications in Educational Learning and Assessment (2026)
  • Computerized Multistage Testing: Theory and Applications (2014), which won the 2016 AERA Division D Significant Contribution to Educational Measurement and Research Methodology Award
  • Research for Practical Issues and Solutions in Computerized Multistage Testing (2024)

They also co-authored Computerized Adaptive and Multistage Testing with R (2017). Dr. von Davier is a co-editor of Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Dr. Duanli Yan is a co-editor of Handbook of Automated Scoring: Theory into Practice and of Handbook of Research on Science Learning Progressions.
