Machine Learning and Interpretable Machine Learning with R
Carolin Strobl, Mirka Henninger, & Yannick Rothacher
Full-day short course (Monday, July 11; 10:00 AM-5:30 PM)
The aim of this course is to provide the audience with a general introduction to machine learning (ML) techniques and principles, including means for enhancing the interpretability of ML results.
We will start with a “grand tour” of supervised and unsupervised ML, touching upon a variety of ML methods, such as k-means and hierarchical clustering, k-NN, and support vector machines, as well as important concepts and terminology, such as “black box” models, cross-validation, and parameter tuning.
In the next part of the course, we will introduce two of the most widely used families of ML methods in more detail. The first is the family of ensemble methods (bagging, random forests, and boosting), including their construction principles and properties. The second is neural networks, where we focus on single-hidden-layer feed-forward networks, the role of activation functions, parameter tuning, and the related dangers of over- and underfitting.
Using these methods for illustration, we will further present several graphical and numeric techniques from the field of Interpretable Machine Learning (IML) that allow us to assess the importance and shape of the effect of the predictor variables. Besides presenting the techniques, we will also discuss potential caveats and risks of misinterpretation.
The course lectures will be interspersed with practical exercises in which participants learn how to apply the presented techniques in the free, open-source software R. Participants will receive detailed installation instructions before the course. Previous experience with R is a plus, but the course materials and presenters are prepared so that even R novices can follow along.
At the end of the course, participants will understand key principles of ML and IML, be able to apply several widely used machine learning methods in R, know where to be careful not to mis- or overinterpret results, and be able to judge if and how machine learning could contribute to their own research.
About the Instructors
Carolin Strobl
Carolin Strobl is Professor of Psychological Methods at the University of Zurich (UZH), Switzerland, where her group hosted the 2017 IMPS conference. She has degrees in psychology and statistics and graduated from the Ludwig-Maximilians-University of Munich (LMU), Germany, with a PhD and Habilitation in Statistics. She has been actively developing reliable and interpretable machine learning methods and promoting their application in psychology for over 15 years. Carolin and her group have contributed to several software packages related to machine learning and psychometrics in the free, open-source software R, and have broad experience teaching statistics and machine learning with R in BA, MA, and PhD study programs as well as in their postgraduate and professional training program, the Zurich R courses.
Mirka Henninger
Mirka Henninger is a postdoctoral researcher at the Chair of Psychological Methods at the University of Zurich (UZH), Switzerland. She has a background in psychological methods; her dissertation at the University of Mannheim, Germany, developed and examined psychometric modeling techniques for response biases in polytomous rating data. As one strand of her broad research, she works on tree-based methods and interpretable machine learning and is currently leading research projects on global and local interpretation techniques, with a focus on their behavior under predictor correlation and their capacity to detect interaction effects. Mirka has extensive teaching experience with novice and advanced R users at all levels and is part of the statistical consulting unit at the UZH Department of Psychology.
Yannick Rothacher
Yannick Rothacher is a postdoctoral researcher at the Chair of Psychological Methods at the University of Zurich (UZH), Switzerland. He has a background in neuroscience and statistics. In parallel with his dissertation, a collaboration between the Neuropsychology unit of the University Hospital of Zurich and the Innovation Center Virtual Reality of ETH Zurich, he completed a further training program in applied statistics. His current research interests include automated variable selection with random forests and the correct interpretation of machine learning results, with applications in psychology and linguistics. Yannick has extensive experience in teaching machine learning to R users at all levels and is part of the statistical consulting unit at the UZH Department of Psychology.