Title: Regression Tree Analysis of Messy Data
Instructor: Wei-Yin Loh, University of Wisconsin, Madison
Moderator: Din Chen
Abstract:
Since the first regression tree algorithm was published in JASA more than 60 years ago (Morgan & Sonquist, 1963), many newer algorithms have appeared, with significant improvements in capability, power, and speed. Theoretical foundations, such as asymptotic consistency, have also been established. As a result, modern tree algorithms are often the preferred “base learners” in ensemble methods such as forests and gradient boosting machines.
This course is a broad overview of the current state of the art for both newcomers and experienced users. No prior knowledge of the subject is required. Real datasets are used to motivate, illustrate, and compare algorithms (including CART, rpart, ctree, randomForest, ranger, cforest, and GUIDE) on their strengths (prediction accuracy, variable selection, and importance ranking), weaknesses (over- and underfitting, selection biases), and features (handling of missing data; longitudinal, multivariate, and censored responses; subgroup identification; propensity score estimation; circular or periodic predictor variables, such as angles and time of day; regression trees with linear splits and linear fits; and applications to explainable AI). Time permitting, free software will be demonstrated.
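To give a flavor of what these algorithms do, here is a minimal sketch (not any instructor-provided code) of the greedy split search that CART-style recursive partitioning performs on a single numeric predictor: it evaluates every candidate threshold and keeps the one that most reduces the sum of squared errors. Function names and data are illustrative only.

```python
def sse(ys):
    """Sum of squared errors of a list of responses around their mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Greedy CART-style split of one numeric predictor.

    Returns (threshold, sse_reduction): the midpoint threshold whose
    left/right partition most reduces the node's total SSE.
    """
    pairs = sorted(zip(xs, ys))
    total = sse([y for _, y in pairs])
    best_thr, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best_gain:
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_thr, best_gain

# Toy data with an obvious change point between x = 3 and x = 10:
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
thr, gain = best_split(xs, ys)
print(thr, gain)  # -> 6.5 24.0
```

Repeating this search over every predictor at every node is what makes exhaustive-search trees prone to the selection biases mentioned above: predictors with more candidate split points get more chances to look good by luck, a bias that algorithms such as ctree and GUIDE are designed to avoid.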
Instructor’s Biography
Wei-Yin Loh is Professor of Statistics at the University of Wisconsin, Madison. His research interests are in bootstrap theory and methodology and in algorithms for classification and regression trees. Loh is a fellow of the American Statistical Association and the Institute of Mathematical Statistics, and a consultant to government and industry. He is a recipient of the Reynolds Award for teaching, the U.S. Army Wilks Award for statistics research and application, an Outstanding Science Alumni Award from the National University of Singapore, and visiting fellowships from AbbVie, IBM, and the Bureau of Labor Statistics.