Data Analysis: Causal salad or a zoo of causal models

This brief essay notes down some ideas on how we build up our understanding of causal relationships, and on how we use that understanding to carry out experiments and to perform statistical and model-based analyses of the results. It largely reflects my understanding of The Book of Why (Pearl and Mackenzie, 2018).

Throughout our lives we all build up knowledge, ideas and beliefs about how the world around us functions. We obtain them by observing what happens around us, by seeing what happens when we do something, and by listening to others. With a view to how this knowledge, these ideas and these beliefs are used below, I refer to them as models of reality. We do not have just a single model; each of us has a whole zoo of them, also referred to as a causal salad in (McElreath, 2020). Nor are these models only our own: we share many of them across society. As beautifully explained in (Harari, 2015), these shared models are a central aspect shaping societies.

The zoo of models evolves over time. It evolves as each of us grows up and acquires skills and knowledge and as we gain experience in life. It evolves also as societies and humankind accumulate knowledge and as habits and fashions change.

When doing experiments or analyzing data and, in particular, when we have to make decisions, we inevitably match what we find against our zoo of models, i.e., we do some causal inference. Based on the findings, we may be able to improve our models, we may have to question some of them or come up with new ones, or we may protect our zoo by adjusting our perception of reality and our memories. This process can be carried out with varying rigor at various levels, in particular in the planning and execution of experiments, in the analysis of the data, and in the matching of the data with our models. Some examples follow.

The child: By dropping things and throwing them, a child learns that things fall down. No formal planning, execution or analysis is involved here, and the causal interpretation is taken care of by intuition. This and the subsequent examples pose causal questions, since they ask what happens if we do something.

High school experiment: Knowing that things fall down, we would like to know how long stones of different sizes take to fall. We devise an experiment to find this out. We measure the falling times, graph them, and draw our conclusions. Here the causal question comes first, the art is then to set up the experiment adequately, and an informal analysis of the data suffices to answer our question.

Statistical experiment: This is similar to the previous example, except that variability becomes important. For example, once we know how long it takes for things to fall, and that this seems to be similar for objects of different weights, we would like to find out whether these times are exactly the same for a stone and a piece of wood. We again measure the time it takes each object to reach the ground. But this time we need to measure repeatedly to be able to come to conclusions, and we probably want to use statistical hypothesis testing to express how sure we are about whether or not both objects take the same time. Here again the causal question comes first, and the art is to set up the experiment adequately. In contrast to the previous example, augmenting the analysis with statistical hypothesis tests helps to express and communicate our findings.
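As a minimal sketch of such an analysis, the following Python snippet compares repeated fall-time measurements for the two objects with a permutation test. The choice of test, the function name permutation_test and all numbers are assumptions made for illustration only; the fall times are simulated, not measured.

```python
import random


def average(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference average(a) - average(b) and the share of
    random relabelings whose absolute difference is at least as large.
    """
    rng = random.Random(seed)
    observed = average(a) - average(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend the labels "stone"/"wood" do not matter
        diff = average(pooled[:n_a]) - average(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical fall times in seconds (simulated for illustration).
data_rng = random.Random(1)
stone = [data_rng.gauss(0.64, 0.02) for _ in range(20)]
wood = [data_rng.gauss(0.64, 0.02) for _ in range(20)]

diff, p_value = permutation_test(stone, wood)
print(f"difference in mean fall time: {diff:+.4f} s, p-value: {p_value:.3f}")
```

A large p-value only says that the data are compatible with equal fall times; it does not prove that the times are exactly the same.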

Randomized trial: When studying heterogeneous populations, we need to ensure that the subjects we are studying are representative of the population. In other words, we need to prevent our selection process from affecting and confounding the results. One robust option is randomization: choosing subjects, and deciding which of them receive which treatment, at random. Thus again, to be able to come to causal conclusions, we must set up our experiment appropriately with a good understanding of the causal question, and can then carry out an analysis that takes advantage of the setup to get answers.
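The effect of the selection process can be illustrated with a small simulation. The sketch below assumes a hypothetical hidden trait called health that improves the outcome and, in the non-randomized variant, also decides who takes the treatment. The function simulate_difference and all numbers are invented for illustration.

```python
import random


def simulate_difference(n, randomized, true_effect=1.0, seed=0):
    """Mean outcome of treated minus untreated subjects in a simulated study."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)  # hidden trait that also drives the outcome
        if randomized:
            gets_treatment = rng.random() < 0.5  # coin flip, unrelated to health
        else:
            gets_treatment = health > 0.0  # healthier subjects select themselves
        outcome = 2.0 * health + rng.gauss(0.0, 1.0)
        if gets_treatment:
            outcome += true_effect
            treated.append(outcome)
        else:
            control.append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)


self_selected = simulate_difference(20_000, randomized=False)
randomized = simulate_difference(20_000, randomized=True)
print(f"self-selected estimate: {self_selected:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

In this toy setup the simulated treatment effect is 1.0. The randomized comparison recovers it approximately, while the self-selected comparison mixes the effect of the treatment with that of the hidden trait.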

Observational study: When it is not possible to carry out experiments that permit a causal interpretation of the results, we need to use our zoo of models, or rather a model selected from it, to analyze the data that we obtain.
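To sketch what using a selected model in the analysis can look like, suppose the chosen model says that a single observed binary variable z (hypothetical, think of an age group) is the only common cause of treatment x and outcome y. Under that assumption, comparing treated and untreated subjects within each level of z and averaging over the levels gives the causal effect. The data and all names below are simulated and invented for illustration; if the assumed model is wrong, the adjusted estimate is wrong as well.

```python
import random


def generate(n, true_effect=1.0, seed=0):
    """Simulate rows (z, x, y) in which z influences both treatment x and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        x = rng.random() < (0.8 if z else 0.2)  # z makes treatment more likely
        y = (true_effect if x else 0.0) + (3.0 if z else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((z, x, y))
    return rows


def average(values):
    return sum(values) / len(values)


def naive_estimate(rows):
    """Treated minus untreated mean outcome, ignoring z."""
    return (average([y for _, x, y in rows if x])
            - average([y for _, x, y in rows if not x]))


def adjusted_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [row for row in rows if row[0] == level]
        total += len(stratum) / len(rows) * naive_estimate(stratum)
    return total


rows = generate(20_000)
naive = naive_estimate(rows)
adjusted = adjusted_estimate(rows)
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

The simulated effect is 1.0. The naive comparison overstates it because treated subjects more often have z, whereas the stratified estimate comes close to it, but only because the analysis used the same model that generated the data.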

The doctor: The doctor needs to identify the best treatment for a given patient. This will be based on results from randomized clinical trials, but it needs to take into account that patients differ and that alternative treatment options exist. Here the doctor needs to use their zoo of models to evaluate the different options and come up with a recommendation. The point of this example is that even if it was possible to separate the experiment from the causal model by using randomization, the results must be integrated back into models to arrive at decisions.

The black swan: The example of “The doctor” may suggest that trying to avoid using causal models in the analysis of data is a lost cause, since in many situations we will have to use the models anyway. The counterargument is our intrinsic urge to keep our zoo of models intact, regardless of what happens. This is nicely described by (Taleb, 2007) and constitutes an important argument for carrying out careful experiments and analyses whenever possible.

To come back to the starting point: when doing experiments and analyzing the data obtained, we inevitably match what we find against our zoo of models. Usually it is good practice to separate the model from the experiment and its analysis, i.e., to use the model zoo only to pose the question and to design the experiment, so that the analysis can be done without a model and the result can then be fed back into the zoo. In some cases, questions can only be answered by using some (causal) model as part of the analysis. In any case, when it comes to making decisions, we have to use our zoo of models.

References

Harari, Y.N., 2015. Sapiens: A Brief History of Humankind. Harper Collins Publishers, New York, NY.

McElreath, R., 2020. Statistical Rethinking. https://doi.org/10.1201/9780429029608

Pearl, J., Mackenzie, D., 2018. The Book of Why.

Taleb, N.N., 2007. The Black Swan: The Impact of the Highly Improbable. Random House, New York, NY.


Monday, August 2, 2021
