Machine Learning and Interpretable Machine Learning with R
Carolin Strobl, Mirka Henninger, & Yannick Rothacher
Full-day short course (Monday, July 11; 10:00 AM-5:30 PM)
The aim of this course is to provide the audience with a general introduction to machine learning (ML) techniques and principles, including means for enhancing the interpretability of ML results.
We will start with a “grand tour” of supervised and unsupervised ML. It touches upon a variety of ML methods, such as k-means and hierarchical clustering, k-NN, and support vector machines. It also covers important concepts and terminology, such as “black box” models, cross-validation, and parameter tuning.
In the next part of the course, we will introduce two of the most widely used families of ML methods in more detail. The first is the ensemble methods bagging, random forests, and boosting, including their construction principles and properties. The second is neural networks, focusing on single-hidden-layer, feed-forward networks, the role of activation functions, parameter tuning, and the related dangers of over- and underfitting.
Using these methods for illustration, we will further present several graphical and numerical techniques from the field of Interpretable Machine Learning (IML) that allow us to assess the importance of the predictor variables and the shape of their effects. Besides presenting the techniques, we will also discuss potential caveats and risks of misinterpretation.
The course lectures will be interspersed with practical exercises, in which participants learn how to apply the presented techniques in the free, open-source software R. Before the course, participants will receive detailed instructions on how to install the software. Previous experience with R is a plus, but the course materials and presenters are prepared so that even R novices can follow.
At the end of the course, participants will understand key principles of ML and IML, be able to apply several widely used machine learning methods in R, know where to be careful not to misinterpret or overinterpret results, and be able to judge whether and how machine learning could contribute to their own research.
About the Instructors
Carolin Strobl is professor of Psychological Methods at the University of Zurich (UZH), Switzerland, where her group hosted the 2017 IMPS conference. She has degrees in psychology and statistics and received her PhD and Habilitation in Statistics from the Ludwig-Maximilians-University of Munich (LMU), Germany. She has been actively developing reliable and interpretable machine learning methods and promoting their application in psychology for over 15 years. Carolin and her group have contributed to several software packages related to machine learning and psychometrics in the free, open-source software R. They have broad experience teaching statistics and machine learning with R in BA, MA, and PhD study programs, as well as in their postgraduate and professional training program, the Zurich R courses.
Mirka Henninger is a postdoctoral researcher at the chair for Psychological Methods at the University of Zurich (UZH), Switzerland. She has a background in psychological methods; her dissertation at the University of Mannheim, Germany, developed and examined psychometric modeling techniques for response biases in polytomous rating data. As one strand of her broad research, she works on tree-based methods and interpretable machine learning. She currently leads research projects on global and local interpretation techniques, with a focus on their behavior under predictor correlation and their capacity to detect interaction effects. Mirka has extensive experience teaching novice and advanced R users at all levels and is part of the statistical consulting unit at the UZH Department of Psychology.
Yannick Rothacher is a postdoctoral researcher at the chair for Psychological Methods at the University of Zurich (UZH), Switzerland. He has a background in neuroscience and statistics. His dissertation was a collaboration between the Neuropsychology Unit of the University Hospital of Zurich and the Innovation Center Virtual Reality of ETH Zurich, and in parallel he completed a further training program in applied statistics. His current research interests include automated variable selection with random forests and the correct interpretation of machine learning results, with applications in psychology and linguistics. Yannick has extensive experience teaching machine learning to R users at all levels and is part of the statistical consulting unit at the UZH Department of Psychology.