IMPS 2019 Spotlight talks

Spotlight talks are a new category of talks at IMPS. Based on the evaluations of a panel of reviewers, the Program Committee selected the talks below to be put "in the spotlight" because they appeal to a broader audience.

Marjolein Fokkema

Marjolein Fokkema, Leiden University, The Netherlands

Marjolein Fokkema obtained her PhD at the Vrije Universiteit Amsterdam and currently works as an assistant professor at Leiden University. Her main research interest is in decision-tree methods, because these provide intuitive tools for evidence-based decision making. She has developed decision-tree methods for adaptive testing and analyzing multilevel data, and (co-)authored several R packages. Her current research focuses on balancing accuracy and interpretability in statistical prediction, and decision-tree methods for growth curve models.

Prediction rule ensembles: Balancing interpretability and accuracy in statistical prediction

Prediction Rule Ensembles (PREs) are a relatively new statistical learning method. PREs aim to strike a balance between the high predictive accuracy of decision tree ensembles (e.g., random forests, boosted tree ensembles), and the ease of interpretability of sparse regression methods and single decision trees. While PREs have been shown to provide predictive accuracy competitive with random forests, PREs generally consist of a small number of rules and thus may be easier to interpret for human decision makers. As such, PREs may contribute to bridging the gap between clinical research and practice in psychology. This presentation introduces PRE methodology, starting from the algorithm originally proposed by Friedman and Popescu (2008). The potential of PRE methodology will be illustrated through several applications to psychological research data, for example on the prediction of academic achievement and chronic depression. The applications illustrate the functionality of the R package ‘pre’, which implements the algorithm originally proposed by Friedman and Popescu. The package ‘pre’ provides several potential advantages over the original implementation, such as the use of an unbiased rule induction algorithm, and support for a wider range of response variables (i.e., continuous, categorical, count, multivariate and survival). Several topics that may be particularly relevant for psychological research will be discussed, such as including rules that are known a priori to be relevant for predicting the response, the application of (non-)negativity constraints, and the analysis of multilevel data.


Thorsten Meiser

Thorsten Meiser, University of Mannheim, Germany

Thorsten Meiser is Professor of Psychology and Chair of Research Methods and Psychological Assessment at the University of Mannheim. His research interests include psychometrics, statistical models in cognitive psychology and decision making. His recent work focuses on IRT models accommodating response styles, attention in prospective memory, and pseudo-contingencies in choice behavior. He holds a doctoral degree from the University of Heidelberg and has been Research Fellow at Cardiff University, Associate Researcher at Jena University and Associate Professor at Marburg University. Currently he is Head of a Research Training Group on “Statistical Modeling in Psychology” funded by the German Research Foundation.

IRTree Mixture Models for Decomposing Trait-Based Responses and Response Styles

This talk gives an overview of IRT models for decomposing trait-based response processes and response styles with a particular focus on recent advances in IRTree modeling. Applications of IRTree models are mostly based on the inherent assumptions (a) that decision nodes reflect distinct response processes, so that the nodes are parameterized in terms of unidimensional IRT models with separate random effects, and (b) that the model parameters are homogeneous across respondents, so that only one vector of discrimination and threshold parameters is estimated. These restrictive assumptions have recently been relaxed in IRTree models with multidimensional node parameterizations and finite mixture components, respectively. IRTree models with multidimensional node parameterizations allow researchers to specify and test the hypothesis that decisions are affected by multiple judgment processes which may have similar or opposite effects on the observed responses depending on the tree structure. Mixture-distribution IRTree models allow for parameter heterogeneity across subpopulations and can disentangle subpopulations that are susceptible to response style effects to varying degrees. In a new study, we combine multidimensional node parameterizations and finite-mixture components to separate subpopulations with different response behavior. Using mixture IRTree models with uni- and multidimensional nodes, we disentangle subpopulations that engage in trait-based response processes for fine-grained ordinal ratings from subpopulations that employ a simplified response process in which the trait only affects directional (dis-)agreement judgments. The new approach is illustrated with rating data from PISA 2015 and validated with extraneous covariates. (This is joint work with Lale Khorramdel from NBME.)
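
As a rough illustration of the idea that decision nodes reflect distinct response processes, the sketch below recodes a 5-point rating into binary pseudo-items for three nodes (midpoint, direction, extremity). The three-node tree and the function name `to_pseudo_items` are assumptions made for illustration, following a commonly used decomposition; the abstract does not specify the tree structure used in the talk.

```python
from typing import Optional, Tuple

# (midpoint, direction, extreme); None marks a node that is not reached.
PseudoItems = Tuple[int, Optional[int], Optional[int]]


def to_pseudo_items(response: int) -> PseudoItems:
    """Recode a rating in 1..5 into binary outcomes for three decision nodes.

    Assumed tree (illustrative only):
      node 1: midpoint chosen (1) or not (0)
      node 2: agree (1) or disagree (0), reached only if node 1 is 0
      node 3: extreme (1) or moderate (0), reached only if node 1 is 0
    """
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("response must be an integer from 1 to 5")
    if response == 3:
        # Midpoint response: direction and extremity nodes are never reached.
        return (1, None, None)
    direction = 1 if response > 3 else 0
    extreme = 1 if response in (1, 5) else 0
    return (0, direction, extreme)


# Recode every category of a hypothetical 5-point rating scale.
recoded = {r: to_pseudo_items(r) for r in range(1, 6)}
```

Each pseudo-item could then be modeled with its own IRT node model; in this assumed tree, a simplified response process would correspond to the trait affecting only the direction node.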


Adrian Quintero

Adrian Quintero, ICFES - Colombian Institute for Educational Evaluation, Colombia

Adrian Quintero works as a researcher in the Department of Statistics at ICFES, the Colombian Institute for Educational Evaluation. He obtained his PhD in Biomedical Sciences at KU Leuven, where he developed extensions of Bayesian hierarchical models with applications in medical research. His research interests include model selection techniques, factor analysis, multilevel models and Bayesian methods in general. Currently, he focuses on computerized adaptive testing (CAT), assessing dimensionality in factor analysis and verifying assumptions in standardized tests using Three Parameter Logistic (3PL) models.

Selecting the number of factors in Bayesian factor analysis

When implementing factor analysis, the selection of the number of factors is challenging in both frequentist and Bayesian approaches. The validity of the likelihood ratio test (LRT) in the frequentist setting strongly depends on the assumption that the factor loadings matrix is of full rank. However, such is not the case when fitting models with more latent components than the true (unknown) number of underlying factors. This invalidates the regularity conditions necessary for the LRT, and the method retains too many factors in practice. Information criteria such as AIC and BIC may also be affected by the regularity conditions. On the other hand, conventional Bayesian methods present two serious drawbacks. Firstly, implementation of the procedures is highly computationally demanding, and secondly, the ordering of the outcomes influences the results since a lower triangular structure is generally assumed for the factor loadings matrix. Therefore, we propose a Bayesian method without imposing the lower triangular structure to overcome ordering dependence. Our approach considers a relatively large number of factors and includes auxiliary multiplicative parameters which may render null the unnecessary columns in the factor loadings matrix. The underlying dimensionality is then inferred based on the number of non-null columns in the factor loadings matrix. We show that implementation of our approach is simple via an efficient Gibbs algorithm. The advantages of the method in selecting the correct dimensionality are illustrated via simulations and using standardized tests from ICFES, the Colombian Institute for Educational Evaluation.

Annie Kang

Hyeon-Ah (Annie) Kang, Department of Educational Psychology, The University of Texas at Austin, TX, US

Kang is an assistant professor in the Quantitative Methods program in the Department of Educational Psychology. She also serves as an associate director of the Center for Applied Psychometric Research at the College of Education. Her research is centered on theoretical and applied statistics in educational and psychological measurement.

Detecting Item Parameter Drift Online Using Response and Response Times

When tests are administered continuously or at frequent time intervals, some items may become known to prospective examinees or may undergo changes in their statistical properties. The purpose of this study is to present a sequential monitoring procedure that regularly checks on the quality of items across the span of time the items are in operation. The procedure is based on a sequential generalized likelihood ratio test, which evaluates the likelihood of the currently estimated item parameters against the likelihood of the pre-calibrated item parameter values. The test is designed to integrate information from the response and response time data, and detect a change-point as soon as an item exhibits parameter drift within the hierarchical framework (van der Linden, 2007). For estimating the item parameters, we perform continuous online calibration based on moving samples. The suggested procedure provides a powerful automated tool for maintaining the quality of an item pool by conducting a series of hypothesis tests on the individual items under the parametric model that capitalizes on two sources of information. The effectiveness of the proposed method is evaluated through extensive simulation studies and an application to a large-scale high-stakes computerized adaptive test. All evaluations are made in comparison with existing statistical quality control procedures (e.g., Veerkamp & Glas, 2000).
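
To make the idea of comparing currently estimated against pre-calibrated values over moving samples more concrete, here is a heavily simplified sketch. It assumes a single item with a fixed pre-calibrated probability of a correct response `p0`, ignores examinee ability, response times, and the hierarchical framework, and uses a plain Bernoulli likelihood. The function names, window size, and threshold are hypothetical and are not taken from the talk.

```python
import math
from typing import List, Optional


def bernoulli_loglik(successes: int, n: int, p: float) -> float:
    """Log-likelihood of `successes` correct responses out of `n` under probability p."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def glr_statistic(window: List[int], p0: float) -> float:
    """Generalized likelihood ratio: window MLE versus the pre-calibrated value p0."""
    n = len(window)
    s = sum(window)
    p_hat = s / n  # maximum likelihood estimate from the moving sample
    return 2.0 * (bernoulli_loglik(s, n, p_hat) - bernoulli_loglik(s, n, p0))


def first_alarm(responses: List[int], p0: float,
                window_size: int, threshold: float) -> Optional[int]:
    """Return the index of the response at which the statistic first exceeds
    the threshold, or None if no drift is flagged."""
    for end in range(window_size, len(responses) + 1):
        window = responses[end - window_size:end]
        if glr_statistic(window, p0) > threshold:
            return end - 1
    return None


# Hypothetical data: an item behaves as calibrated, then becomes much easier.
stable = [0, 1] * 25
drifted = stable + [1] * 50
alarm_stable = first_alarm(stable, p0=0.5, window_size=20, threshold=10.0)
alarm_drifted = first_alarm(drifted, p0=0.5, window_size=20, threshold=10.0)
```

In this toy setting the stable sequence raises no alarm, while the drifted sequence is flagged some responses after the change occurs. A realistic procedure would condition on ability and model response times jointly, as the abstract describes.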

Leah Feuerstahler

Leah Feuerstahler, Department of Psychology, Fordham University, NY, US

Feuerstahler is an assistant professor of psychometrics and quantitative psychology at Fordham University in New York City. Previously, she received her Ph.D. in quantitative psychology from the University of Minnesota and completed a postdoc at the University of California, Berkeley’s Graduate School of Education. Her research interests include applied psychometrics in education and psychology, and she currently focuses on issues surrounding the specification, fit, and interpretation of item response models.

Characterizing uncertainty in item response model metrics

In item response theory, item parameter standard errors are used to characterize the uncertainty associated with individual parameter estimates. These standard errors can also be used to construct confidence bands (Thissen & Wainer, 1990) around estimated item response functions. Whereas early approaches to constructing confidence bands were based on Fisher information, Yang, Hansen, and Cai (2012) recently suggested a multiple imputation (MI) approach that can be used with any approximation to the item parameter covariance matrix. In both the analytic and MI approaches, confidence bands are constructed by treating the latent variable theta as fixed and plotting the variability of response probabilities conditional on theta. However, theta can also be understood as an artifact of the fitted model such that the theta metric itself is determined with error. Specifically, the latent trait metric can be defined as a multidimensional random vector of conditional response probabilities (Ramsay, 1996). Because these multidimensional random vectors will lead to somewhat different predictions across calibrations, item parameter estimation error implies uncertainty about the location of the metric. In this talk, I describe how MI and fully Bayesian approaches can be used to visualize and quantify metric stability, that is, the variability of the theta metric implied by item parameter standard errors. I also clarify how metric stability is related to other item response model evaluation outcomes (e.g., test information, model fit). Overall, I argue that metric stability measures provide unique information that aids in a holistic approach to model evaluation.
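
The conventional band construction described above, with theta treated as fixed, can be sketched roughly as follows for a single item. The sketch assumes a two-parameter logistic item response function, hypothetical parameter estimates and covariance matrix, and pointwise quantile bands over draws from a normal approximation. It illustrates the general simulation idea only and is not the exact procedure of any cited paper; all names and values are illustrative.

```python
import numpy as np


def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def mi_confidence_band(est, cov, theta_grid, n_draws=2000, level=0.95, seed=0):
    """Pointwise band for the response function, conditional on theta.

    Item parameters (a, b) are drawn from a multivariate normal centered at
    the estimates with the given covariance matrix, and the band is taken
    from quantiles of the implied response probabilities at each theta.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(est, cov, size=n_draws)  # shape (n_draws, 2)
    a = draws[:, 0][:, None]
    b = draws[:, 1][:, None]
    curves = irf_2pl(theta_grid[None, :], a, b)  # shape (n_draws, n_theta)
    alpha = 1.0 - level
    lower = np.quantile(curves, alpha / 2, axis=0)
    upper = np.quantile(curves, 1 - alpha / 2, axis=0)
    return lower, upper


# Hypothetical estimates for one item: discrimination a and difficulty b.
theta_grid = np.linspace(-4, 4, 81)
est = np.array([1.2, 0.3])
cov = np.array([[0.02, 0.005],
                [0.005, 0.03]])
lower, upper = mi_confidence_band(est, cov, theta_grid)
```

The width `upper - lower` at each grid point reflects uncertainty in the response probability at a fixed theta; it does not capture the uncertainty in the theta metric itself, which is the focus of the talk.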