SHORT COURSE 2

TITLE: Multiple Imputation in Practice
INSTRUCTORS: Stef van Buuren and Gerko Vink, University of Utrecht, the Netherlands
MODERATOR: Alfred H. Balch

Abstract: Most data analytic procedures are designed for complete data and simply ignore any incomplete rows in the data, or use ad-hoc procedures, such as replacing the missing data by the “best value” or by last-observation-carried-forward (LOCF). However, such procedures for fixing the missing data may introduce serious biases in the ensuing statistical analysis. Multiple imputation (MI) is a principled solution for this problem. With MI, each missing datum is imputed m ≥ 2 times, resulting in m completed datasets. At least 2 imputations are needed to reflect the uncertainty about the imputations, although performing more imputations is often advisable. The m datasets are then analysed by standard procedures and the analyses are combined into a single inference. MI has become popular and is easily one of the most utilised methods for dealing with nonresponse in many domains of statistics. A possible explanation for MI’s popularity has to do with separating the missing data problem from the analysis stage. With MI, the complexities of incomplete data are mostly confined to the imputation stage. As a result, inference using MI is relatively straightforward to obtain and easy to comprehend, properties that may be particularly appealing to applied researchers. In other words, once satisfactory imputations are obtained, it is not too difficult to fit the substantive model of interest.
The course enhances the participants’ knowledge of imputation methodology using R. It explains the principles of missing data theory, outlines a step-by-step approach toward creating high quality imputations, helps participants avoid the pitfalls that plague inexperienced imputers, and provides guidelines on how to report the results. The course builds upon the author’s popular R package MICE.
Familiarity with basic statistical concepts and techniques (such as regression) and with the concept of statistical inference is required. This course will emphasise computational techniques, but no prior programming experience with R is needed.
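
As a rough illustration of the combining step described in the abstract, the sketch below pools m completed-data analyses with Rubin's rules: the pooled estimate is the mean of the m estimates, and its total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). This is not course material. The course itself uses R and MICE, whereas this is plain Python, and the function name pool_rubin and the example numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


if __name__ == "__main__":
    # Hypothetical regression coefficients from m = 5 imputed datasets.
    est = [0.52, 0.47, 0.55, 0.50, 0.49]
    var = [0.010, 0.011, 0.009, 0.010, 0.012]
    q, t = pool_rubin(est, var)
    print(f"pooled estimate = {q:.3f}, standard error = {t ** 0.5:.3f}")
```

The between-imputation term is what carries the uncertainty about the imputations themselves, which is why a single imputation (m = 1) cannot provide it.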

Course schedule:
Day 1
8.00 am Lecture 1: Introduction to incomplete data theory and univariate imputation
9.50 am Practical MICE in R: Convergence, imputation methods, predictor selection and ignoring records
12.40 pm Lecture 2: Multivariate imputation and pooling
2.30 pm Practical MICE in R: Statistical analyses and pooling
4.20 pm Lecture 3: Bring your own problem

Day 2
8.00 am Lecture 1: Sensitivity analyses and longitudinal data imputation
9.50 am Practical MICE in R: Statistical learning and diagnostic checking
11.40 am Lecture 2: Capita Selecta and Causal Inference

Instructors’ Biographies:

Stef van Buuren is Professor of Statistical Analysis of Incomplete Data at the University of Utrecht and Principal Scientist at the Netherlands Organisation for Applied Scientific Research TNO in Leiden. His interests include the analysis of incomplete data, child growth and development, computational statistics, measurement and individual causal effects. Van Buuren is the inventor of the MICE algorithm for multiple imputation of missing data. He created the growth charts used in the Dutch child health care system, and designed the D-score, a new system for expressing child development on a quantitative scale. He consults for the World Health Organization and the Bill & Melinda Gates Foundation.

Gerko Vink is a statistician masquerading as a data scientist with a passion for educating people. He aims to be at the cutting edge of both teaching and research and has an interest in new developments concerning the presentation of data, results and knowledge. Gerko is Associate Professor of Applied Data Science at Utrecht University (Utrecht, Netherlands), where his research and teaching focus on incomplete data problems, computational evaluation and programming.
