TITLE: Risk Factor Identification & Comparative Effectiveness Research Using Electronic Health Records: Challenges, Analytical Strategies & Recent Developments

SPEAKERS: Dr. Rebecca Hubbard, University of Pennsylvania
Dr. Yong Chen, University of Pennsylvania
Dr. Bin Huang, Cincinnati Children’s Hospital Medical Center


The widespread adoption of electronic health records (EHRs) as a means of documenting medical care has created a vast resource for the study of health conditions, interventions, and outcomes in routine clinical practice. Using EHR data for research facilitates the efficient creation of large research databases, execution of pragmatic clinical trials, and study of rare diseases. Two major uses of EHR data for medical research are: 1) identifying disease risk factors for early detection or prevention of negative health outcomes; and 2) comparative effectiveness research (CER), or “comparing different interventions and strategies to prevent, diagnose, treat, and monitor health conditions.” However, because EHRs were not designed for research purposes, missing, inconsistent, and error-prone data are ubiquitous. Data are only observed when a patient interacts with the healthcare system, resulting in complex data-generating mechanisms. Moreover, medical records coding was not designed with research in mind, and much of the detail about a patient’s health is contained in unstructured text notes. As a result, even when a health condition or risk factor is documented, extracting this information from the EHR is challenging and yields imperfect information on outcomes, exposures, and confounders. These special features of EHRs pose substantial challenges that must be accounted for when analyzing EHR-derived data.

In this short course, we will discuss topics related to the design and analysis of research studies using EHR data. We will first cover issues related to the structure and quality of EHR data, including data types and methods for extracting variables of interest; sources of missing data; error in covariates and outcomes extracted from EHR data; and data capture considerations such as informative visit processes and medical records coding procedures. Next, we will discuss statistical methods that mitigate some of these issues, including missing data and error in EHR-derived covariates and outcomes. We will also discuss cutting-edge methods developed to address unique challenges in the EHR context such as privacy-preserving computation in the context of distributed research networks. Finally, after an introduction to CER and causal inference, we will discuss challenges arising when conducting CER with EHR data, including potential model misspecification, time-varying treatment and post-treatment confounding. We conclude with a discussion of methods to mitigate these challenges.

In this tutorial, the presenters will share their hands-on experience with conducting two EHR-based research studies: identifying risk factors for pediatric type II diabetes and comparing the effectiveness of two alternative approaches for treating children with juvenile idiopathic arthritis. The learning objectives of this tutorial are to: 1) identify challenges inherent in conducting medical research using EHRs; 2) present methods and tools that can be used to address these challenges; and 3) briefly introduce some recent methodological developments.

Speaker Bio

Dr. Chen is an Assistant Professor of Biostatistics in the Department of Biostatistics, Epidemiology and Informatics at the University of Pennsylvania. He is also a Senior Fellow at the Institute of Biomedical Informatics at Penn School of Medicine, a Senior Scholar at the Center for Evidence-based Practice at Penn School of Medicine, and a faculty member in the Applied Mathematics & Computational Science Program, Penn Arts & Sciences. He works on applied statistics, biomedical informatics, bioinformatics, and evidence-based medicine. His main research interests are bias reduction methods in electronic medical records, dynamic risk prediction, pharmacovigilance, personalized health management strategies using data-driven approaches, integration of heterogeneous data sources, and evidence synthesis. He is an Elected Member of the Society for Research Synthesis Methodology. He has served as a principal investigator or co-investigator on more than 20 projects funded by NIH, PCORI, and AHRQ. He is currently the PI of two NIH-funded projects totaling US$4.7 million, and has published more than 70 papers in statistical and biomedical journals. He has taught short courses on comparative effectiveness research, network meta-analysis, and statistical methods for electronic health records data at the ASA Biopharmaceutical Section, ICSA, and ENAR annual conferences.
