This is a brief essay on how we build up ideas about causal relationships and
how we use these ideas to carry out experiments and perform statistical and
model-based analyses of the results. It largely reflects my understanding of
The Book of Why (Pearl and Mackenzie, 2018).
During our life we all build up knowledge, ideas and beliefs about how the world around us functions.
We obtain that knowledge by observing what happens around us, what happens when
we do something, and by listening to others. With a view to how this knowledge and these
ideas and beliefs are used in what follows, I refer to them as
models of reality. We do not have just a single model; we each have a whole
zoo of them, also referred to as a causal salad by McElreath (2020). And they are not only our own
models: we share many of them across society. As beautifully explained by Harari (2015), these shared models are a central
aspect shaping societies.
The zoo of
models evolves over time. It evolves as each of us grows up and acquires skills
and knowledge and as we gain experience in life. It evolves also as societies
and humankind accumulate knowledge and as habits and fashions change.
When doing
experiments or analyzing data and, in particular, if we have to make decisions,
we inevitably match what we find with our zoo of models, i.e., we do some causal
inference. Based on the findings, we may be able to improve our models, we may
have to question some of them, or perhaps come up with new ones, or protect our
zoo by adjusting our perception of reality and our memories. This process may
be done with varying rigor at various levels including, in particular, at the
level of the planning and execution of experiments, the analysis of the data,
and the matching of the data with our models. Some examples follow.
The child: By dropping things and
throwing them, a child will learn that things fall down. Here no formal
planning, execution or analysis is involved, and the causal interpretation is
taken care of by our intuition. This and the subsequent examples involve causal
questions, since they ask what happens if we do something.
High school experiment: Knowing
that things fall down, we would like to know how long stones of different sizes
take to fall down. We devise an experiment to find this out. We measure
falling times, graph them, and draw our conclusions. Here, the causal
question comes first, the art is then to set up the experiment adequately, and an
informal analysis of the data suffices to answer our question.
Statistical experiment: This is
similar to the previous example, except that variability becomes important.
For example, once we know how long it takes for things to fall down, and that
this seems to be similar for objects of different weights, we would like to
find out whether these times are exactly the same for a stone and a piece of
wood. We again measure the time it takes to reach the ground. But this time, we
need to measure repeatedly to be able to come to conclusions. And we probably
want to use statistical hypothesis testing to express how sure we are about our
finding that it takes the same time for both objects, or not. Here again,
the causal question comes first, and the art is to set up the experiment
adequately. In contrast to the previous example, augmenting the analysis with
statistical hypothesis tests helps to express and communicate our findings.
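One way to make the hypothesis test for the stone-versus-wood comparison concrete is a permutation test, sketched below in Python. This is an illustration under stated assumptions, not the analysis of a real experiment: the fall times are invented random numbers, and the function name `permutation_test` and its parameters are hypothetical. The test asks how often a random relabeling of the measurements produces a difference in mean fall time at least as large as the observed one.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference mean(a) - mean(b) and an estimated
    p-value: the share of random relabelings whose absolute difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Invented data: 20 repeated fall-time measurements (seconds) per object,
# drawn from the same distribution, i.e. no true difference is built in.
data_rng = random.Random(42)
stone = [data_rng.gauss(0.64, 0.02) for _ in range(20)]
wood = [data_rng.gauss(0.64, 0.02) for _ in range(20)]

diff, p = permutation_test(stone, wood)
print(f"difference in mean fall time: {diff:+.4f} s, p-value: {p:.3f}")
```

A small p-value would suggest that the two objects differ in fall time. A large one only means that the data are compatible with equal times, which is not the same as showing that the times are exactly equal.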
Randomized trial: When
studying heterogeneous populations, we need to ensure that the subjects
we are studying are representative of the population. In other words, we
need to prevent our selection process from affecting and confounding the results. One
robust possibility is to use randomization and select subjects randomly. Thus
again, to be able to come to causal conclusions, we must set up our experiment appropriately,
with a good understanding of the causal question, and can then carry out an analysis
that takes advantage of the setup to get answers.
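The effect of randomization can be illustrated with a small simulation. The sketch below interprets the selection process as the way subjects end up in the treated group, which is an assumption made for this example. The hidden `health` trait, the effect sizes and the function name `simulate` are all hypothetical. When healthier subjects are more likely to be treated, the simple comparison of group means mixes the treatment effect with the effect of health. When a coin flip decides, it does not.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def simulate(randomized, n=20_000, true_effect=1.0, seed=1):
    """Return the treated-minus-untreated difference in mean outcome.

    Each simulated subject has a hidden 'health' trait that raises the
    outcome. If randomized is False, healthier subjects are more likely
    to be treated, so health confounds the comparison.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)
        if randomized:
            gets_treatment = rng.random() < 0.5
        else:
            gets_treatment = health + rng.gauss(0.0, 1.0) > 0.0
        effect = true_effect if gets_treatment else 0.0
        outcome = 2.0 * health + effect + rng.gauss(0.0, 1.0)
        (treated if gets_treatment else untreated).append(outcome)
    return mean(treated) - mean(untreated)


print(f"true effect:             {1.0:.2f}")
print(f"randomized assignment:   {simulate(randomized=True):.2f}")
print(f"health-driven selection: {simulate(randomized=False):.2f}")
```

In this sketch the randomized estimate should land close to the true effect, while the estimate under health-driven selection should be clearly larger, even though the treatment itself is identical in both runs.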
Observational study: When it is
not possible to execute experiments that ensure causal interpretation of the results,
we need to use our zoo of models or rather a selected model from our zoo to analyze
the data that we obtain.
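As a minimal sketch of what using a selected model for the analysis can look like, the Python example below assumes a very simple causal model: a single measured binary variable `z` influences both the treatment `t` and the outcome `y`. Under that assumed model, the effect of the treatment can be estimated by comparing treated and untreated subjects within each level of `z` and averaging the results, weighted by how common each level is (the backdoor adjustment discussed by Pearl and Mackenzie, 2018). The data, the effect sizes and all function names are invented for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def make_observational_data(n=40_000, true_effect=1.0, seed=2):
    """Simulate (z, t, y) records in which z influences both t and y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Subjects with z == 1 are much more likely to be treated.
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0
        y = 3.0 * z + true_effect * t + rng.gauss(0.0, 1.0)
        data.append((z, t, y))
    return data


def naive_effect(data):
    """Difference in mean outcome between treated and untreated, ignoring z."""
    treated = [y for _, t, y in data if t == 1]
    untreated = [y for _, t, y in data if t == 0]
    return mean(treated) - mean(untreated)


def adjusted_effect(data):
    """Average the within-stratum differences, weighted by the share of each z."""
    total = 0.0
    for z_value in (0, 1):
        stratum = [(t, y) for z, t, y in data if z == z_value]
        treated = [y for t, y in stratum if t == 1]
        untreated = [y for t, y in stratum if t == 0]
        total += (mean(treated) - mean(untreated)) * len(stratum) / len(data)
    return total


data = make_observational_data()
print(f"naive estimate:    {naive_effect(data):.2f}")
print(f"adjusted estimate: {adjusted_effect(data):.2f}")
```

With these invented parameters, the naive comparison is expected to come out near 2.8, while the adjusted estimate should be close to the built-in effect of 1.0. The adjustment is only as good as the assumed model: if an unmeasured variable also influenced both treatment and outcome, stratifying on `z` alone would not remove the bias.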
The doctor: The doctor needs
to identify the best treatment for a given patient. This will be based on
results from randomized clinical trials but needs to take into account that
patients differ and that alternative treatment options exist. Here,
the doctor needs to use his or her zoo of models to evaluate the different options and
come up with a recommendation. This example shows that even if it is
possible to separate the experiment from the causal model using randomization, the results
must be integrated back into models to come up with decisions.
The black swan: The
example of “The doctor” may suggest that trying to avoid using causal
models for the analysis of data is a wasted effort, since in many situations we
will have to use the models anyway. The counterargument is our intrinsic urge to
keep our zoo of models intact, regardless of what happens. This is nicely described
by Taleb (2007) and constitutes an important
argument for carrying out careful experiments and analyses whenever possible.
Coming back to the starting point: when doing experiments and analyzing the data obtained, we inevitably match
what we find with our zoo of models. Usually, it is good practice to separate
the model from the experiment and its analysis, i.e., to use the model zoo only
to pose the question and to design the experiment, in such a way that the analysis
can be done without a model and the result can be used to feed the zoo. In some
cases, questions can only be answered by using some (causal) model as part of
the analysis. In any case, when it comes to making decisions, we
have to use our zoo of models.
References
Harari, Y.N., 2015. Sapiens: A Brief History of Humankind. Harper Collins Publishers, New York, NY.
McElreath, R., 2020. Statistical Rethinking. https://doi.org/10.1201/9780429029608
Pearl, J., Mackenzie, D., 2018. The Book of Why.
Taleb, N.N., 2007. The Black Swan: The Impact of the Highly Improbable. Random House.