Analyzing NAEP and TIMSS Data with Direct Estimation Using the R Packages EdSurvey and Dire

Emmanuel Sikali, Paul Bailey, & Ting Zhang


Full day short course (Monday, July 11; 10:00AM-5:30PM)

Large-scale assessments, such as the National Assessment of Educational Progress (NAEP) and the Trends in International Mathematics and Science Study (TIMSS), are valuable for researchers seeking to understand what students know and can do in various subject areas at the jurisdiction or national level. These assessments employ special sampling and assessment designs, including the use of matrix booklets, to provide comprehensive coverage of each subject domain while keeping the burden of test taking low. As a downside, these sampling designs increase the complexity of estimating student performance. Using traditional psychometric techniques, such as item response theory models, for generating student proficiency scores from large-scale assessments leads to biased variance estimates of population parameters. With advances in theory and computation, two modern approaches have been deemed appropriate for the analysis of large-scale assessment data: (1) analysis with plausible values (Mislevy et al., 1992); and (2) using maximum likelihood estimates of reporting group difference parameters with direct estimation (Cohen & Jiang, 1999).

This full-day course will provide participants with theoretical knowledge and hands-on training in analyzing large-scale assessment data through both the plausible values approach and the direct estimation approach using the R packages EdSurvey and Dire. In the theoretical section of the course, instructors will introduce large-scale assessment matrix sampling design, complex sampling methods, and data analysis strategies, including the plausible value and direct estimation approaches to computing scale scores with appropriate weighting and variance estimation procedures. In the hands-on section that follows, participants will learn how to use the EdSurvey and Dire packages for

  • data processing, merging, and manipulation;
  • descriptive statistics;
  • plausible values generation;
  • regression analysis with the plausible values approach; and
  • regression analysis with the direct estimation approach.

Participants will learn how these two analytic approaches differ and when each approach might be preferable.
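As a rough illustration of what the plausible values approach involves, the sketch below combines results from several plausible values using the standard multiple-imputation combining rules: average the per-plausible-value estimates, then add the average sampling variance to an inflated between-plausible-value variance. This is a generic sketch, not the EdSurvey implementation. The function name and all numbers are hypothetical, and operational NAEP and TIMSS procedures may compute the sampling component differently.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results with the usual
    multiple-imputation rules (hypothetical helper, not an EdSurvey function).

    estimates: one point estimate per plausible value
    sampling_variances: the sampling variance of each of those estimates
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need at least two plausible values, with matching lengths")
    point = mean(estimates)            # average over plausible values
    within = mean(sampling_variances)  # average sampling variance
    between = variance(estimates)      # sample variance, denominator m - 1
    total = within + (1 + 1 / m) * between
    return point, total


# Hypothetical results from five plausible values (not real assessment data)
estimates = [280.0, 282.0, 281.0, 279.0, 283.0]
sampling_variances = [4.0, 4.2, 3.8, 4.1, 3.9]

point, total_variance = combine_plausible_values(estimates, sampling_variances)
print(f"estimate = {point:.2f}, standard error = {total_variance ** 0.5:.2f}")
```

Direct estimation avoids this combining step by fitting the latent regression to the item responses directly, which is one of the contrasts the course draws between the two approaches.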

Participants will be provided with a TIMSS data file and a mini-sample public-use NAEP data file to use in their analyses. For researchers who are interested in using the direct estimation approach with other assessment data, the instructors will showcase its potential with an external data source.


Cohen, J.D., and Jiang, T. (1999). Comparison of partially measured latent traits across nominal subgroups. Journal of the American Statistical Association, 94(448), 1035–1044.

Mislevy, R.J., Beaton, A., Kaplan, B.A., and Sheehan, K. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29(2), 133–161.

About the Instructors

Emmanuel Sikali

Dr. Emmanuel Sikali is the acting chief of the Reporting and Dissemination Branch of the National Center for Education Statistics (NCES) within the Institute of Education Sciences (IES) at the U.S. Department of Education, where he is responsible for research and development. Among his responsibilities, Dr. Sikali is the project officer for the development of EdSurvey, a suite of R packages, including Dire, that analyzes data from NCES assessments, such as the National Assessment of Educational Progress (NAEP), and international assessments, such as the Trends in International Mathematics and Science Study (TIMSS). Dr. Sikali has been with NCES since 2005 and earned a Ph.D. in information technology and engineering from George Mason University in Fairfax, Virginia.

Paul Bailey

Dr. Paul Bailey is a senior researcher at the American Institutes for Research (AIR), headquartered in the United States. Dr. Bailey is the lead developer of EdSurvey and Dire and has developed several R packages that, collectively, are downloaded more than a thousand times per month. He has also worked in the areas of labor economics and the return to the Post-9/11 GI Bill, econometrics for errors-in-variables models, and value-added modeling. Dr. Bailey has been at AIR since 2012 and has a master’s degree in statistics from the University of Chicago and a Ph.D. in economics from the University of Maryland.

Ting Zhang

Dr. Ting Zhang is a senior researcher at AIR. She serves as project director for the development of R packages and tools, including EdSurvey and Dire, and has provided 9 years of technical support, in the form of research, reporting, and technical review, to NAEP. She has led more than 10 trainings on the use of these R tools to analyze NAEP and TIMSS data at various national and international conferences. Dr. Zhang has been at AIR since 2013; she earned a Ph.D. in human development and quantitative methods from the University of Maryland. Her research interests focus on complex survey design and measurement validity.
