In machine learning, interpretability can be defined as the ability to explain why a model, given a certain input or set of data, produces a certain output or list of predictions. This contrasts with the approach most commonly followed in the past when building machine learning models, known as the "black box" approach.
The goal of interpretability algorithms is precisely to turn this black box into a set of explanations, mathematical rules or chains of inference that let us understand why a machine learning model makes certain decisions (its output) in response to specific stimuli (its input).
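As a minimal sketch of this idea, the snippet below treats a generic model as a black box and uses permutation importance to rank how much each input feature drives its predictions. The dataset and model are illustrative placeholders, not the algorithms described later in this article.

```python
# Minimal sketch: turning a "black box" model's behaviour into a ranked
# list of feature contributions using permutation importance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data and an opaque model standing in for any "black box".
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
black_box = RandomForestRegressor(random_state=0).fit(X, y)

# Shuffle each input feature in turn and measure how much the predictions
# degrade: the bigger the drop, the more the model relies on that feature.
result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature_{idx}: {result.importances_mean[idx]:.3f}")
```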
Causality means identifying two phenomena or events that necessarily occur together in a given order, also known as a cause-and-effect relationship. In other words, it means finding a factor Y whose presence necessarily implies the existence of another factor X that causes it.
It is very common in the world of machine learning, even among professionals in the field, to confuse the concept of causality with closely related ones such as interpretability or correlation. To illustrate the difference between these concepts, we will use a simple example.
Suppose our goal is to build a machine learning model that predicts the number of heat strokes that will occur in Madrid tomorrow. Suppose also that the input data for our model does not include a temperature variable, but does include a variable quantifying daily ice cream consumption in Madrid. If we run interpretability algorithms (see "3 reasons why interpretability is gaining importance in the world of machine learning") on this imaginary model, it is very likely that the "ice cream consumption" variable will appear as the most relevant, or among the most relevant, when explaining why the model predicts more or fewer heat strokes for a given day. In light of these results, can we conclude that ice cream consumption is the cause of heat stroke in Madrid? Clearly not.
What happened, then? What was wrong with our reasoning? The explanation is that we did not take into account certain statistical phenomena, such as spurious correlations, which are defined as:
A mathematical relationship in which two events have no logical connection, yet appear to be related because of a third, unconsidered factor, known as a confounding factor or hidden variable.
In this case, the confounding factor or hidden variable is temperature, which was not included in the model's input. Temperature is the real cause influencing the number of heat strokes, but it also drives the increase in ice cream consumption, which is what our model actually "sees".
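The short simulation below is a hedged sketch of this exact situation: temperature (the hidden confounder) is made to drive both ice cream consumption and heat strokes, so a model trained without it attributes all the predictive power to ice cream consumption, while adding temperature shifts the importance to the real cause. All numbers are made up purely for illustration.

```python
# Sketch of the ice cream / heat stroke example with a hidden confounder.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
temperature = rng.normal(28, 6, size=1000)                       # daily max temp (illustrative, in C)
ice_cream = 50 + 4 * temperature + rng.normal(0, 5, size=1000)   # units sold, driven by temperature
heat_strokes = np.maximum(0, 0.5 * (temperature - 30)) + rng.normal(0, 0.2, size=1000)

# Model WITHOUT the confounder: ice cream consumption looks highly relevant.
X_no_temp = ice_cream.reshape(-1, 1)
model = RandomForestRegressor(random_state=0).fit(X_no_temp, heat_strokes)
print("importance of ice cream alone:",
      permutation_importance(model, X_no_temp, heat_strokes, random_state=0).importances_mean)

# Model WITH temperature: the importance shifts to the real cause.
X_with_temp = np.column_stack([ice_cream, temperature])
model = RandomForestRegressor(random_state=0).fit(X_with_temp, heat_strokes)
print("importance [ice cream, temperature]:",
      permutation_importance(model, X_with_temp, heat_strokes, random_state=0).importances_mean)
```

In the second model, the interpretability output no longer points to ice cream consumption, showing that what the first model "explained" was a spurious correlation rather than a causal relationship.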
In conclusion, it is important not to confuse interpretability with causality, especially in a domain as sensitive and critical for decision-making as healthcare. Even so, the interpretation of machine learning models can provide relevant clinical or management information, helping to generate new knowledge or identify possible lines of research.
At Horus ML we have developed our own explanatory algorithms, a combination of the methods above and in-house developments, to provide interpretability for each of our ML models. For more information about our machine learning models, or about how these algorithms can help you gain interpretability over your predictions, please contact us.