TITLE: Causal Inference and Data Fusion
INSTRUCTORS: Elias Bareinboim and Adèle H. Ribeiro, Columbia University


Causal inference is usually divided into two categories, experimental (Fisher, Cox, Cochran) and observational (Neyman, Rubin, Robins, Dawid, Pearl), which, by and large, are studied separately. Experimental and observational studies are but two extremes of a rich spectrum of research designs that generate the bulk of the data available in practical, large-scale situations. In typical medical explorations, for example, data from multiple observations and experiments are collected, coming from distinct experimental setups, different sampling conditions, and heterogeneous populations.

In this short course, we will discuss the data-fusion problem, which is concerned with piecing together multiple datasets collected under heterogeneous conditions so as to obtain valid answers to causal queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to causal analysts since the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We will present our general non-parametric framework for handling these biases and, ultimately, a theoretical solution to the problem of fusion in causal inference tasks.


  1. Bareinboim, E., & Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27), 7345-7352.
  2. Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. Basic Books.
  3. Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. John Wiley & Sons.

Instructors’ Biographies:

Elias Bareinboim is an Associate Professor in the Department of Computer Science and Director of the Causal Artificial Intelligence Lab at Columbia University, New York. Before joining Columbia, he was an Assistant Professor in the Department of Computer Science at Purdue University. He obtained his Ph.D. from the University of California, Los Angeles (UCLA) under the supervision of Professor Judea Pearl, where he also completed his post-doctoral fellowship. He is a recipient of the NSF CAREER Award. His research is in artificial intelligence, more specifically in causal inference. Building on the modern language of causation that has emerged in recent decades, his work develops a theoretical framework for understanding, representing, and algorithmizing causal generalizations from a heterogeneous mixture of observational and experimental studies.


Adèle H. Ribeiro is a Postdoctoral Researcher in the Causal Artificial Intelligence Lab at Columbia University. Her research focuses on developing the emergent field of Causal Health Sciences. She holds a Bachelor’s degree in Computational and Applied Mathematics (2012) and Master’s and Ph.D. degrees in Computer Science (2014 and 2018, respectively), all from the Institute of Mathematics and Statistics of the University of São Paulo (USP), Brazil. She also undertook a doctoral research internship in the Developmental Neuromechanics and Communication Lab at Princeton University. Previously, she worked as a Postdoctoral Fellow in the Laboratory of Genetics and Molecular Cardiology at the Heart Institute at USP.
