Date & Time: Thursday, July 22 at 9:30 AM EST
The objective of item difficulty modeling (IDM) is to predict the statistical parameters of an item (e.g., difficulty) from features extracted directly from the item (e.g., number of words). IDM may resolve several practical issues, including item preknowledge caused by pretesting new items and low item pool usability caused by developing new items without controlling their statistical parameters (which may unbalance the pool and limit how many test forms can be assembled). We propose predicting discrete item characteristic curves (ICCs) using softmax classification. This approach exploits a one-to-one mapping from a monotonically non-decreasing ICC to a probability mass function (PMF). A neural network (NN) was trained using soft labels for each item (obtained by mapping ICCs to PMFs), with a softmax prediction layer representing a PMF defined on a set of ability levels and the Kullback-Leibler divergence as the loss function. The NN had four layers: an input layer with the features; a dense layer of 16 nodes with ReLU activation functions; a dropout regularization layer with a rate of 0.1; and the softmax prediction layer. A set of 1973 items from a high-stakes testing program was used, with 21 features extracted directly from each item (most of the features were NLP-based, e.g., semantic similarity between passage and key). Preliminary results indicated the following: using only features correlated with item difficulty improved predictions; adding two further features (the test developers' prediction of item difficulty as high or low, and the item pretest position) improved predictions; and predicted ICCs were closer to the true ICCs than ICCs computed from predicted item parameters. Other advantages of predicting ICCs will be demonstrated.
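As an illustration of the soft-label idea in the abstract, the following minimal sketch shows one way a discrete, non-decreasing ICC could be mapped to a PMF and compared with a softmax output using the Kullback-Leibler divergence. The specific mapping (successive differences plus one extra bin for the leftover mass), the example ICC, and all function names are assumptions made for illustration; the talk's actual mapping and network code are not given in the abstract.

```python
import numpy as np

def icc_to_pmf(icc):
    """Map a non-decreasing discrete ICC (values in [0, 1]) to a PMF.

    Assumed mapping: treat the ICC like a CDF, take successive
    differences, and append one bin holding the leftover mass 1 - icc[-1].
    """
    icc = np.asarray(icc, dtype=float)
    if np.any(np.diff(icc) < 0) or icc[0] < 0 or icc[-1] > 1:
        raise ValueError("ICC must be non-decreasing with values in [0, 1]")
    return np.diff(np.concatenate(([0.0], icc, [1.0])))

def pmf_to_icc(pmf):
    """Inverse of icc_to_pmf: cumulative sum, dropping the final bin."""
    return np.cumsum(pmf)[:-1]

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two PMFs, clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical example: an ICC evaluated at 7 ability levels.
theta = np.linspace(-3.0, 3.0, 7)
true_icc = 0.2 + 0.8 / (1.0 + np.exp(-1.2 * (theta - 0.5)))

soft_label = icc_to_pmf(true_icc)                 # training target for the item
predicted_pmf = softmax(np.zeros(soft_label.size))  # untrained (uniform) output
loss = kl_divergence(soft_label, predicted_pmf)   # loss to be minimized
recovered_icc = pmf_to_icc(soft_label)            # mapping is invertible
```

Under this assumed mapping, a PMF predicted by the softmax layer converts back to an ICC with a cumulative sum, so the predicted curve is non-decreasing by construction.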
About the Speaker
Dmitry Belov has a Ph.D. in computer science from the Institute of Engineering Cybernetics of the Academy of Sciences of Belarus. He is a senior computer scientist in the Assessment Sciences Department at the Law School Admission Council. His current research interests include quantitative methods for detecting cheating on tests and item difficulty modeling.