Marjolein Fokkema, Leiden University
Prediction rule ensembles: Balancing interpretability and accuracy in statistical prediction
Prediction Rule Ensembles (PREs) are a relatively new statistical learning method. PREs aim to strike a balance between the high predictive accuracy of decision tree ensembles (e.g., random forests, boosted tree ensembles) and the interpretability of sparse regression methods and single decision trees. PREs have been shown to provide predictive accuracy competitive with that of random forests. Yet they generally consist of only a small number of rules, so they may be easier for human decision makers to interpret. As such, PREs may help bridge the gap between clinical research and practice in psychology. This presentation introduces PRE methodology, starting from the algorithm originally proposed by Friedman and Popescu (2008). The potential of PRE methodology will be illustrated through several applications to psychological research data, for example on the prediction of academic achievement and chronic depression. These applications illustrate the functionality of the R package ‘pre’, which implements the Friedman and Popescu algorithm. Compared with the original implementation, package ‘pre’ offers several potential advantages. These include the use of an unbiased rule induction algorithm and support for a wider range of response variables (i.e., continuous, categorical, count, multivariate and survival). Several topics that may be particularly relevant for psychological research will also be discussed. These include adding rules that are known a priori to be relevant for predicting the response, applying (non-)negativity constraints, and analyzing multilevel data.
ABOUT THE SPEAKER
Marjolein Fokkema obtained her PhD at the Vrije Universiteit Amsterdam and currently works as an assistant professor at Leiden University. Her main research interest is decision-tree methods, because these provide intuitive tools for evidence-based decision making. She has developed decision-tree methods for adaptive testing and for analyzing multilevel data, and has (co-)authored several R packages. Her current research focuses on balancing accuracy and interpretability in statistical prediction, and on decision-tree methods for growth curve models.