TITLE: Prediction In Event-Based Trials
SPEAKERS: Professor Daniel Heitjan, SMU
Kalyan Ghosh


The widespread adoption of electronic health records (EHR) as a means of documenting medical care has created a vast resource for the study of health conditions, interventions, and outcomes in routine clinical practice. Using EHR data for research facilitates the efficient creation of large research databases, execution of pragmatic clinical trials, and study of rare diseases. Two major uses of EHR data for medical research are: 1) identifying disease risk factors for early detection or prevention of negative health outcomes; and 2) comparative effectiveness research (CER), or “comparing different interventions and strategies to prevent, diagnose, treat, and monitor health conditions.” However, because EHRs were not designed for research purposes, missing, inconsistent and error-prone data are ubiquitous. Data are only observed when a patient interacts with the healthcare system, resulting in complex data generating mechanisms. Moreover, medical records coding was not designed with research purposes in mind and much of the detail about a patient’s health is contained in unstructured text notes. As a result, even when a health condition or risk factor is documented, extracting this information from the EHR is challenging and results in imperfect information on outcomes, exposures, and confounders. These special features of EHRs present great challenges that must be accounted for when analyzing EHR-derived data.

In this short course, we will discuss topics related to the design and analysis of research studies using EHR data. We will first cover issues related to the structure and quality of EHR data, including data types and methods for extracting variables of interest; sources of missing data; error in covariates and outcomes extracted from EHR data; and data capture considerations such as informative visit processes and medical records coding procedures. Next, we will discuss statistical methods that mitigate some of these issues, including missing data and error in EHR-derived covariates and outcomes. We will also discuss cutting-edge methods developed to address unique challenges in the EHR context such as privacy-preserving computation in the context of distributed research networks. Finally, after an introduction to CER and causal inference, we will discuss challenges arising when conducting CER with EHR data, including potential model misspecification, time-varying treatment and post-treatment confounding. We conclude with a discussion of methods to mitigate these challenges.

In this tutorial, the presenters will share their hands-on experience with conducting two EHR-based research studies: identifying risk factors for paediatric type II diabetes and comparing the effectiveness of two alternative approaches for treating children with juvenile idiopathic arthritis. The learning objectives of this tutorial are to: 1) identify challenges inherent in conducting medical research using EHRs; 2) present methods and tools that can be used to address these challenges; and 3) briefly introduce some recent methodological developments.

Speaker Bio

Daniel F. Heitjan (PhD in Statistics from The University of Chicago) is a Professor of Statistical Science at Southern Methodist University and Professor of Population & Data Sciences at UT Southwestern Medical Center, both in Dallas, TX. His research interests include clinical trial design and analysis, incomplete data, modelling cancer survivorship, and statistical methods in health economics. Dr. Heitjan is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the Society for Clinical Trials.
