Marie Wiberg, Dept. of Statistics, Umeå School of Business and Economics, Umeå University, Sweden
Marie Wiberg is professor of Statistics with a specialty in psychometrics at Umeå University, Sweden. In the past she has been a visiting scholar in Canada, Chile, the USA, and the Netherlands. Her research interests include psychometrics in general, test equating, and functional data in psychometrics. She is currently one of the co-editors of the annual Proceedings of the International Meeting of the Psychometric Society (IMPS), and from 2017 she will be first editor, responsible for the annual Proceedings of the IMPS. She has published a large number of papers and has served as a reviewer for a number of journals, including Psychometrika, Applied Psychological Measurement, and the Journal of Educational Measurement. She has been an associate editor of the Journal of Educational Measurement and is currently on the editorial board of the International Journal of Testing. She has served on the board of the Swedish Statistical Society for many years and is currently a member of the Young Academy of Sweden (until 2018). In Sweden, Wiberg has received both a prestigious career award and the Royal Skyttean prize for young researchers.
Optimal Scoring as an Alternative to IRT and Sum Scoring
Test constructors often use item response theory (IRT) in the design and evaluation of items and tests. When delivering results from an academic test or an industrial constructive test, however, it is still common for a test taker to receive a sum score, i.e., the number of correct answers on the test. Optimal scoring is an alternative to both IRT scoring and sum scoring. It builds on ideas from functional data analysis and, in particular, uses parameter cascading in the estimation step. Applying maximum likelihood estimation yields a weighted score, where the weights are specific to the test taker's performance level. Simulations show that optimal scoring performs better than sum scoring in terms of lower root mean squared error, especially for test takers at the performance extremes. A real data example illustrates how optimal scoring can be used in practice. The advantages of using optimal scoring in academic tests and industrial constructive tests as an alternative to IRT and sum scoring are discussed.
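To make the contrast with sum scoring concrete, here is a minimal sketch of how a likelihood-based score weights items by their properties, assuming a standard 2PL response function and a simple grid search. It is an illustration only, not the optimal-scoring estimator described above (which instead builds on functional data analysis and parameter cascading); all item parameters below are invented.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def sum_score(responses):
    """Classical sum score: the number of correct answers."""
    return int(np.sum(responses))

def ml_score(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Grid-search maximum likelihood ability estimate.

    Unlike the sum score, items with higher discrimination a
    effectively receive more weight."""
    loglik = [
        float(np.sum(responses * np.log(p_correct(t, a, b))
                     + (1 - responses) * np.log(1 - p_correct(t, a, b))))
        for t in grid
    ]
    return float(grid[int(np.argmax(loglik))])

a = np.array([0.5, 1.0, 2.0])   # invented discriminations
b = np.array([-1.0, 0.0, 1.0])  # invented difficulties
x = np.array([1, 1, 0])         # one response pattern
```

Two response patterns with the same sum score can receive different likelihood-based scores, which is the sense in which the weighting depends on which items were answered correctly.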
Ying (Alison) Cheng, Department of Psychology, Notre Dame, USA
Ying (“Alison”) Cheng is Associate Professor of Psychology and Fellow of the Institute for Educational Initiatives at the University of Notre Dame. She received her M.S. in Statistics and Ph.D. in Quantitative Psychology from the University of Illinois at Urbana-Champaign. Her research focuses on methodological issues in psychological and educational measurement, in particular the theoretical development and applications of item response theory. She has published over 40 articles on topics including computerized adaptive testing (CAT), differential item functioning (DIF), classification accuracy and consistency with licensure/certification exams, and formative assessment using cognitive diagnostic modeling. Her work has appeared in premier journals such as Psychometrika, Applied Psychological Measurement, and the British Journal of Mathematical and Statistical Psychology (BJMSP). She currently serves as Associate Editor of BJMSP and of the American Educational Research Journal. In 2009 she received the Bradley Hanson Award for Contributions to Educational Measurement, and in 2012 the Jason Millman Promising Measurement Scholar Award from the National Council on Measurement in Education. In 2014 she received a CAREER award from the National Science Foundation.
Statistical Quality Control in Psychometrics
Statistical process or quality control (SQC) in its early days meant the application of statistical methods to monitor and control an individual industrial process. In modern psychometric research, SQC has found abundant application, particularly in the area of educational measurement. For example, since 1998 researchers have developed various CUSUM (cumulative sum) control chart procedures to detect outliers and/or person misfit in adaptive and non-adaptive testing. Very recently, a handful of studies have investigated the utility of change point analysis in psychometrics for detecting speeded responses, person misfit, or unusual fluctuations in the mean score of an assessment. In this talk I will first briefly review the studies using CUSUM procedures, then discuss how change point analysis procedures differ from CUSUM procedures, and finally focus on using change point analysis to detect aberrant responses with item response data, response time data, or both.
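As an illustration of the kind of monitoring the CUSUM literature builds on, here is a minimal one-sided CUSUM sketch over a sequence of, for example, person-fit residuals. The target, reference value k, and decision threshold h are invented, and this is a generic control-chart construction, not any specific published procedure.

```python
def cusum_flags(values, target, k=0.5, h=4.0):
    """One-sided upper CUSUM chart.

    Accumulates deviations exceeding target + k; a flag is raised
    once the cumulative sum passes the decision threshold h."""
    s = 0.0
    flags = []
    for x in values:
        s = max(0.0, s + (x - target - k))
        flags.append(s > h)
    return flags

# A stable stretch followed by a sustained upward shift:
series = [0.0] * 10 + [2.0] * 5
flags = cusum_flags(series, target=0.0)
```

Because the statistic accumulates evidence over time, a sustained small shift eventually triggers a flag even when no single observation is extreme, which is what makes CUSUM-type procedures attractive for detecting gradual aberrance.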
Norman Verhelst, Eurometrics, The Netherlands
Norman Verhelst studied experimental and mathematical psychology at the Catholic University of Leuven (Belgium) from 1964 to 1968 and wrote a Ph.D. thesis on mathematical learning theories (1975). He worked for some years at the University of Nijmegen and the University of Utrecht as an assistant professor teaching statistics and methodology, while at the same time studying psychometrics, in particular IRT. From 1985 until his retirement in 2010, he worked as a senior researcher at the Dutch National Institute for Educational Measurement (CITO), where the program package OPLM was developed; this program allows conditional maximum likelihood estimation in an IRT model that nevertheless permits the items to have different discriminations. After retiring he started Eurometrics, a small business in psychometric and statistical consultation. His favorite research topics are DIF and profile analysis.
Profile Analysis: A Generalization of DIF Analysis
To investigate whether some groups of testees are (dis)advantaged by a certain category of items, Differential Item Functioning (DIF) analysis is sometimes applied to all items of that category. It was hypothesized, for example, that younger testees might be disadvantaged by items belonging to the occupational domain in the Michigan English Test (MET). DIF analysis, however, showed unclear and unconvincing results. It will be argued that DIF analysis in this context is not a very good method: it lacks clarity in interpretation and statistical power in application. A new method, called profile analysis, will be presented. In this method the focus is not on single items but on categories of items, and the focus shifts from exploratory to confirmatory analysis. For each category, the sequence of observed scores from a testee (the observed profile) is compared to its expected value under the measurement model used (the expected profile). The difference between the observed and expected profiles is called the deviation profile. These deviation profiles are aggregated within each of two or more groups and give rise to statistical tests that show whether different groups react differently to categories of items. Profile analysis turns out to be statistically powerful and very flexible in its use: it is not restricted to two categories, and the number of groups to be compared is unlimited. It is also shown that DIF analysis is a special case of profile analysis. It is argued that profile analysis is a more constructive approach to deviations from the measurement model than the usual DIF approach, which is aimed mainly at detecting ‘bad’ items. An example using data from TIMSS 2011 will sustain this point of view.
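The core bookkeeping of the method can be sketched as follows: compute each testee's deviation profile (observed minus expected category scores) and aggregate within groups. This is a schematic illustration with invented scores, not the statistical tests themselves.

```python
import numpy as np

def deviation_profiles(observed, expected):
    """Observed minus expected category scores.

    observed, expected: arrays of shape (n_testees, n_categories)."""
    return observed - expected

def group_aggregate(deviations, groups):
    """Mean deviation profile within each group of testees."""
    return {g: deviations[groups == g].mean(axis=0)
            for g in np.unique(groups)}

# Invented data: two item categories, two groups of testees.
obs = np.array([[3, 1], [2, 2], [1, 3], [0, 4]])
exp = np.array([[2, 2], [2, 2], [2, 2], [2, 2]])
grp = np.array(["young", "young", "old", "old"])
profiles = group_aggregate(deviation_profiles(obs, exp), grp)
```

Opposite-signed mean deviation profiles across groups, as in this toy example, are the pattern that the subsequent statistical tests are designed to detect.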
Han van der Maas, Department of Psychology, University of Amsterdam, The Netherlands
Han van der Maas received his Ph.D. in 1993 for research on methods for the analysis of phase transitions in cognitive development. In 2005 he became professor and chair of the Psychological Methods department at the University of Amsterdam. He is the founder and CTO of Oefenweb.nl, a spin-off company that develops innovative adaptive practice and monitoring systems for education, which are used by 2000 primary schools in the Netherlands. The general theme of his research is the formalization of psychological theories. His current research includes network models as alternatives to latent variable models, psychometric models for accuracy and response time (RT), and adaptive learning systems for education.
Psychometrics for complex systems
Humans, like ecosystems, the weather, and the stock market, are non-linear complex systems. With this in mind, it is possible to develop suitable formal measurement models of human psychological functioning. Complex systems can be understood as networks that, depending on connection strength, behave linearly, nonlinearly, or even discretely. There are many interesting technical links and equivalences between network models and the dominant latent variable approach in psychometrics, but conceptually they are very different. This will be discussed in the context of the measurement of cognitive functions and the modeling of general intelligence. First, I will present a network model that combines mutualism, central processes, and sampling within one integrated (non-g) model of general intelligence. Second, I will present a new approach to educational measurement, motivated by the high-frequency measurement requirements of complex systems research. As an example I will present results from a web-based computerized adaptive training and monitoring system used by thousands of schools in the Netherlands, which has yielded over 1 billion item responses.
State of the Art Speakers
Dorret Boomsma, VU Amsterdam, The Netherlands (Psychometrics and genetics)
Dorret Boomsma trained in psychology and behavior genetics. Her research focuses on the analysis of individual differences as a function of genetic and environmental causes. She established the Netherlands Twin Register (NTR), which over the past 30 years has recruited over 75,000 twins and over 100,000 of their family members into longitudinal projects, forming the basis for genetic studies of complex traits. Twins and their families have completed surveys, undergone periodic testing, and participated in large biobank projects that collected DNA, blood, and urine samples. Professor Boomsma’s research has primarily focused on better understanding the influence of the genome on physical and mental traits through genetic structural equation modeling, characterizing the genes involved through genetic association studies, and exploring the value of combined twin and ‘omics’ studies. Her work has led to over 900 published papers and several awards, including the Dutch Spinoza Prize and the Dobzhansky Award for lifetime achievement in behavior genetics. She is a member of the Royal Netherlands Academy of Arts and Sciences.
Psychometrics and Genetics
Quantitative genetic methodology, in particular genetic structural equation modeling, can assist in assessing and understanding the dimensionality of psychometric instruments as often used in psychology and psychiatry. The covariance structures that are observed among sets of items in such instruments are a function of the underlying genetic and environmental covariance structures that may be estimated from studies that include genetically informative designs. The relationship between the observed covariance structure and the underlying genetic and environmental covariance structures may be such that it hampers obtaining a clear estimate of dimensionality using standard tools for dimensionality assessment alone. One situation in which dimensionality assessment may be impeded is when genetic and environmental influences differ from each other in structure and dimensionality. In such situations settling dimensionality issues may be problematic and employing quantitative genetic modeling to uncover the (possibly different) dimensionalities of the underlying genetic and environmental structures may resolve some problems. The approach is illustrated in empirical data on childhood problems and personality in adults, where the use of twin data ensures the identification of the genetic and environmental covariance structures. A second illustration involves estimating the genetic (co)variance between personality items based on genotyped markers in samples of unrelated subjects.
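As a point of reference, the simplest version of the genetic and environmental decomposition underlying such models is the classical ACE decomposition from twin correlations. The sketch below uses Falconer-style formulas with invented correlations; it is far simpler than the genetic structural equation models described above, which work at the level of item covariance structures.

```python
def falconer_ace(r_mz, r_dz):
    """Classical ACE decomposition from MZ and DZ twin correlations.

    Returns (a2, c2, e2): additive genetic, shared environmental,
    and unique environmental variance proportions."""
    a2 = 2.0 * (r_mz - r_dz)  # additive genetic variance
    c2 = r_mz - a2            # shared environment = 2*r_dz - r_mz
    e2 = 1.0 - r_mz           # unique environment (plus error)
    return a2, c2, e2

# Invented correlations: identical twins 0.8, fraternal twins 0.5.
a2, c2, e2 = falconer_ace(0.8, 0.5)
```

Genetic SEM generalizes this logic by estimating full genetic and environmental covariance matrices across items, which is what allows their dimensionalities to differ.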
Victoria Savalei, University of British Columbia, Canada (SEM, robust SE's and missing data)
Victoria Savalei received her doctoral degree in Quantitative Psychology from the University of California, Los Angeles in 2007, under the supervision of Dr. Peter Bentler. She is presently Associate Professor in the Department of Psychology at the University of British Columbia, where she has been a faculty member since 2007. She has published over 30 peer-reviewed articles, over half of them as first author. She is Associate Editor of Structural Equation Modeling: An Interdisciplinary Journal, and the most recent winner of the Cattell Award for Outstanding Early Career Contributions to Multivariate Experimental Psychology from the Society of Multivariate Experimental Psychology (SMEP). Her research interests lie primarily in the field of structural equation modeling (SEM), in particular model evaluation and testing with difficult kinds of data, such as incomplete data, nonnormal data, and categorical data. She is also interested in the performance of approximate fit indices in SEM and in using SEM to model response biases in personality data.
SEM, Robust Corrections, and Missing Data
In this talk I will review the logic of robust standard errors as used in SEM by first showing how these standard errors are derived in the case of linear regression under the violation of assumptions, and then viewing SEM as a special type of a nonlinear regression model. I will then review the most common applications of robust standard errors in SEM. While these are often known as “corrections for nonnormality”, I will illustrate that they should be more accurately described as “corrections for inefficiency”, and have many applications that have nothing to do with the specification of the distribution of the data. Lastly, I will zero in on what is arguably the most popular application of these corrections, and review the different options for robust standard errors, test statistics, and fit indices available to accompany the normal-theory ML estimator for nonnormal and/or incomplete data.
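The linear-regression case the talk starts from can be sketched as follows: an HC0 "sandwich" covariance estimator for OLS, applied to simulated heteroskedastic data. This is a generic textbook construction for illustration, not the SEM-specific corrections covered in the talk.

```python
import numpy as np

def ols_sandwich(X, y):
    """OLS coefficients with HC0 heteroskedasticity-robust SEs.

    cov(beta) = bread @ meat @ bread, where bread = (X'X)^{-1}
    and meat = X' diag(e_i^2) X uses the squared residuals."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Error variance grows with |x|: the point estimates stay consistent,
# but naive normal-theory SEs would be wrong here.
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))
beta, robust_se = ols_sandwich(X, y)
```

The same bread-meat-bread logic carries over once SEM is viewed as a nonlinear regression model, which is the bridge the talk describes.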
Tom Snijders, Groningen & Oxford, The Netherlands & UK (Social networks)
Tom Snijders is professor of Statistics and Methodology in the Social Sciences at the University of Groningen and emeritus fellow of Nuffield College, University of Oxford. From 2006 to 2014 he was professor of Statistics in the Social Sciences at the University of Oxford. Together with Patrick Doreian he was co-editor of Social Networks from 2006 to 2011. In 2005 he received an honorary doctorate in the social sciences from the University of Stockholm, in 2010 he was the recipient of the Georg Simmel Award of INSNA, the International Network for Social Network Analysis, and in 2011 he received an honorary doctorate from the Université Paris-Dauphine. He has written about a variety of topics in statistical inference, including multilevel analysis, social network analysis, and item response theory. His work on developing statistical methodology for network dynamics is implemented in the software SIENA (Simulation Investigation for Empirical Network Analysis), which is available as the package RSiena in the statistical system R. His book written jointly with Roel Bosker, Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, is a well-known and widely used textbook on multilevel analysis.
Network Analysis: Goodbye to Independence Assumptions
Network modeling has developed in the new millennium from a niche topic into the scientific mainstream, but in a large diversity of approaches. This presentation will give an overview of the current state of statistical modelling of data sets representing social networks, as distinct from other network approaches such as probabilistic modelling, algorithmic developments, and the use of networks to represent dependencies in ‘regular’ multivariate data. The basic mathematical structure for representing network data is a graph or digraph (directed graph): a set of nodes, some of which are tied by edges or directed edges. For a social network, the nodes usually represent social actors and the ties some kind of social relation. Data sets often also contain nodal variables, representing behaviour and other characteristics of the actors. Some conceptual issues are fundamental in social network analysis, and these imply a basic contrast between network data and traditional ‘rectangular’ multivariate data. The first is the dependence between ties, mirroring social phenomena such as reciprocity, transitivity (‘friends of my friends are my friends’), and differential popularity of actors. The second issue is that ties between actors will imply social influence or other kinds of dependence between the tied actors. The third issue is the importance of indirect connections: my well-being and the information available to me depend not only on those to whom I am connected, but also on their further connections. All this means that for statistical modelling we are in a mess, because we cannot base models on independence assumptions any more. There is some room for permutation-based procedures, but their use is limited. Instead, parametric statistical models have been proposed that are based on conditional independence assumptions.
Examples are Stochastic Blockmodels and Latent Space Models, which assume conditional independence given an assumed latent structure, and Exponential Random Graph Models (‘ERGMs’), which make certain conditional independence assumptions directly about the ties. For social science research, questions about the importance of networks for changing actor variables (i.e., characteristics of the nodes), such as individual performance, attitudes, and behavioural tendencies, are especially important. Given the mutual dependence between networks and actor variables, it is clear that such questions are best studied using longitudinal data. Here a crucial conditional independence assumption is the regular Markov assumption for stochastic processes, but this has to be supplemented by further parametric assumptions to obtain workable models. The Stochastic Actor-oriented Model is widely used for modelling the dynamics of networks as well as the interdependent dynamics of networks and actor variables. The presentation will explain some things about this bestiary of models, although only the surface can be scratched. In addition, some attention will be paid to the question of how much it matters whether or not the usual independence assumptions are made when analysing actor variables, and to current developments, including multilevel network analysis.
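As a minimal illustration of one of the tie-level dependencies mentioned above, the reciprocity of a digraph stored as an adjacency matrix can be quantified as the fraction of directed ties that are reciprocated. The 3-node network is invented, and this descriptive index is not one of the statistical models discussed in the talk.

```python
import numpy as np

def reciprocity(adj):
    """Fraction of directed ties i->j for which j->i also exists."""
    ties = adj.sum()
    mutual = (adj * adj.T).sum()  # 1 only where both directions exist
    return mutual / ties if ties else 0.0

# Invented digraph: the tie 0<->1 is mutual, 0->2 is not reciprocated.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 0, 0]])
```

A reciprocity value well above what an independent-ties model predicts is exactly the kind of dependence that motivates conditional-independence models such as ERGMs.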
John Lockwood, Principal Research Scientist, Educational Testing Service, USA (Value added modeling)
J.R. Lockwood is a Principal Research Scientist at Educational Testing Service. He specializes in longitudinal modeling of student achievement, the measurement of teaching quality, and experimental and quasi-experimental methods in educational evaluation. His methodological areas of expertise include measurement error modeling, causal inference with observational data, and Bayesian statistics. He has led U.S. Department of Education Institute of Education Sciences projects on enhanced value-added models for estimating teacher effects, and currently leads a project developing novel methods for addressing measurement error in standardized test scores used in secondary data analysis. Prior to joining ETS, he was a Senior Statistician at the RAND Corporation. He received his Ph.D. in Statistics from Carnegie Mellon University in 2001, and won the Leonard J. Savage award for outstanding doctoral dissertation in Bayesian Application Methodology.
Value-Added Modeling
The use of standardized test scores to measure the performance of individual teachers has been one of the most controversial topics in U.S. educational research and policy over the past two decades. There have been substantial disagreements among policy makers, educators, researchers, and other stakeholders about whether such teacher "value-added" (VA) measures can provide valid, fair, and reliable inferences about the effects of individual teachers on student achievement progress. The essential challenge in estimating teacher VA is that groups of students taught by different teachers may have vastly different background characteristics, including different levels of prior achievement. VA modeling generally refers to the use of statistical models to adjust for these differences using longitudinal data on individual students that are now routinely archived by schools, districts, and U.S. states. The pros and cons of different model specifications have been the subject of intense debate by econometricians, statisticians, and psychometricians, with some of the essential issues lying at the nexus of these fields. This presentation will summarize the basic evaluation problem of estimating teacher VA and will discuss recent empirical evidence for and against the validity of inferences about teachers made from VA models. It will discuss key areas of future research, including ways in which the psychometric community may be able to contribute to enhanced model specifications.
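The basic adjustment idea can be sketched as a toy model: regress current scores on prior scores, then average the residuals within each teacher's group of students. Real VA models are far richer (more covariates, multiple years, shrinkage), and all data below are invented.

```python
import numpy as np

def simple_va(prior, current, teacher):
    """Toy value-added sketch.

    Fits current ~ intercept + prior by least squares, then averages
    the residuals within each teacher's group of students."""
    X = np.column_stack([np.ones_like(prior), prior])
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    resid = current - X @ beta
    return {t: float(resid[teacher == t].mean())
            for t in np.unique(teacher)}

# Invented data: teacher A's students beat the prediction from prior
# achievement, teacher B's students fall short of it.
prior = np.array([1.0, 2.0, 3.0, 4.0])
current = np.array([1.5, 2.5, 2.5, 3.5])
teacher = np.array(["A", "A", "B", "B"])
va = simple_va(prior, current, teacher)
```

The debates the abstract describes concern what belongs on the right-hand side of this regression and how measurement error in the prior scores biases the teacher-level averages.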