# Statistical Learning

In non-randomized and/or exploratory studies, researchers often have to deal with large numbers of variables. When the aim is to build a statistical model from these variables (e.g., for diagnosis or prognosis), it is usually not possible to include all of them in the model. For this reason, variable selection is a key issue in statistical model building (Sauerbrei, Bommert…).

In recent years, statistical learning techniques (also termed machine learning techniques) have been successful in dealing with large numbers of variables, yielding improved predictions even when sample sizes are comparatively small. On the other hand, this improved prediction accuracy often comes at the price of limited interpretability: with these so-called black-box models, it is hard to infer the characteristics of the predictor-response relationships.

Because explaining the predictor-response relationships is usually important to our clinical collaborators, we are interested in developing statistical learning methods that bridge the gap between prediction accuracy and interpretability. For example, we work on improving gradient boosting algorithms, which can be modified to produce model fits with the same structure as standard fits obtained from linear or logistic regression (…). Furthermore, we are interested in the development and application of explainable AI methods, which can increase the interpretability of black-box methods (such as random forests or tree boosting) by post-processing their predictions (…).
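To make the boosting idea concrete, here is a minimal sketch of component-wise L2 boosting with simple linear base-learners. This is one textbook way of obtaining a boosting fit that is an ordinary linear predictor. It is an illustrative variant in plain Python and not necessarily the algorithm used in our work. The function name `componentwise_l2_boost` and its parameters `n_steps` (number of boosting iterations) and `nu` (step length) are hypothetical choices made for this example.

```python
def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting with linear base-learners (illustrative sketch).

    X: list of rows (each a list of floats); y: list of floats.
    Returns (intercept, coefficients), i.e. a fit of the same form as a
    linear regression model: intercept + sum_j coefficients[j] * x_j.
    """
    n, p = len(X), len(X[0])
    # Start from the mean response as offset; all coefficients start at zero.
    y_mean = sum(y) / n
    coefs = [0.0] * p
    # Centre each predictor so every base-learner is a simple slope through 0.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ss = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    resid = [yi - y_mean for yi in y]

    for _ in range(n_steps):
        # Fit every single-variable least-squares base-learner to the current
        # residuals (the negative gradient of the squared-error loss).
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j in range(p):
            if ss[j] == 0:
                continue  # constant column carries no information
            beta = sum(Xc[i][j] * resid[i] for i in range(n)) / ss[j]
            rss = sum((resid[i] - beta * Xc[i][j]) ** 2 for i in range(n))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # Update only the best-fitting variable, shrunken by the step length nu.
        coefs[best_j] += nu * best_beta
        for i in range(n):
            resid[i] -= nu * best_beta * Xc[i][best_j]

    # Convert the intercept back to the original (uncentred) predictor scale.
    intercept = y_mean - sum(coefs[j] * means[j] for j in range(p))
    return intercept, coefs
```

Because each iteration updates only the single coefficient whose base-learner reduces the residual sum of squares the most, the final fit is a plain linear predictor that can be read like a regression model. If the algorithm is stopped early (small `n_steps`), variables that were never selected keep a coefficient of exactly zero, which is how this style of boosting performs variable selection.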