Mokken scale analysis is a scaling method that consists of two parts: an automated item selection procedure for constructing scales, and a set of methods for assessing the goodness of fit of nonparametric item response theory models. In this talk, I will discuss two recent developments in Mokken scale analysis. First, Mokken's scalability coefficients and their standard errors have been generalized to multilevel test data. Second, the automated item selection procedure now takes the sampling variability of the scalability coefficients into account. As a result, the algorithm is more conservative for small samples. These advances, implemented in the R package mokken (version 3.0), have led to a best-practice strategy for scaling multilevel test data. I will illustrate the strategy with a real-data example and explain its rationale.