TITLE: Causal Inference and Data Fusion
INSTRUCTORS: Elias Bareinboim and Adèle Ribeiro, Columbia University; Dr. Mohammad Adibuzzaman, Purdue University


Causal inference is usually divided into two categories, experimental (Fisher, Cox, Cochran) and observational (Neyman, Rubin, Robins, Dawid, Pearl), which, by and large, are studied separately. Experimental and observational studies are but two extremes of a rich spectrum of research designs that generate the bulk of the data available in practical, large-scale situations. In typical medical explorations, for example, data from multiple observations and experiments are collected, coming from distinct experimental setups, different sampling conditions, and heterogeneous populations.

In this short course, we will discuss the data-fusion problem, which is concerned with piecing together multiple datasets collected under heterogeneous conditions so as to obtain valid answers to causal queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to causal analysts since the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We will present our general non-parametric framework for handling these biases and, ultimately, a theoretical solution to the problem of fusion in causal inference tasks.


  1. Bareinboim, E., & Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27), 7345-7352.
  2. Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. Basic Books.
  3. Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. John Wiley & Sons.

Instructors’ Biographies:


Elias Bareinboim is an Associate Professor in the Department of Computer Science and Director of the Causal Artificial Intelligence Lab at Columbia University, New York. Before joining Columbia, he was an Assistant Professor in the Department of Computer Science at Purdue University. He obtained his Ph.D. from the University of California, Los Angeles (UCLA) under the supervision of Professor Judea Pearl, where he also completed his postdoctoral fellowship. He is a recipient of the NSF CAREER Award. His research is in artificial intelligence, more specifically in causal inference. Building on the modern language of causation that has emerged in recent decades, his work develops a theoretical framework for understanding, representing, and algorithmizing causal generalizations from a heterogeneous mixture of observational and experimental studies.


Adèle H. Ribeiro is a Postdoctoral Researcher in the Causal Artificial Intelligence Lab at Columbia University. Her research focuses on developing the emergent field of Causal Health Sciences. She holds a Bachelor’s degree in Computational and Applied Mathematics (2012) and Master’s and Ph.D. degrees in Computer Science (2014 and 2018, respectively), all from the Institute of Mathematics and Statistics of the University of São Paulo (USP), Brazil. She also undertook a doctoral research internship in the Developmental Neuromechanics and Communication Lab at Princeton University. Previously, she worked as a Postdoctoral Fellow in the Laboratory of Genetics and Molecular Cardiology at the Heart Institute in USP.



Mohammad Adibuzzaman holds a Ph.D. in Computational Sciences from Marquette University, Milwaukee, Wisconsin. He is the Assistant Director of Data and Computing at the Regenstrief Center for Healthcare Engineering (RCHE) at Purdue University, Indiana. At Regenstrief, Dr. Adibuzzaman leads the research infrastructure for data analysis and has established numerous partnerships, including an institutional partnership with the Laboratory for Computational Physiology at MIT, led by Roger Mark; an industry partnership for state-of-the-art distributed database technology with Paradigm4, founded by Turing Award recipient and MIT computer scientist Mike Stonebraker; and many intra-university computing partnerships at the intersection of data science, computer science, and life sciences. He is also leading a new line of research at RCHE on explainable artificial intelligence (AI) in health sciences, focusing on causal inference methods, with the goal of introducing new methods of explainable AI in clinical research. He maintains all of RCHE's data and computing assets, with a vision for RCHE to become a collaborative hub at the intersection of technology and health science to improve health outcomes.
