
GSE Blog 

Data Science: Correlation and Causation

Data Science Blog

Why does correlation not always imply causation?

Data science is a trending buzzword for a field that combines statistics, the scientific method, and artificial intelligence with the goal of extracting explanations from data. Analysis and visualization, however, do not always tell the true story. This article discusses one of the most common fallacies in data analysis: inferring a causal relationship where the relationship is actually spurious (that is, falsely concluding that correlation implies causation).


[Figure: Rainfall Causes Umbrella Sales]

Correlation: refers to the statistical relationship between two variables. In other words, it describes how strongly two variables move together, without saying whether one affects the other.

Causation: indicates that one event is the result of the other; that is, there is a causal relationship between the two events.
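To make the definition of correlation concrete, the sketch below computes Pearson's correlation coefficient r, the most common measure of linear correlation. It is a minimal, standard-library-only illustration; the function name `pearson_r` and the rainfall and umbrella figures are hypothetical and invented for this example, not taken from the chart.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns a value in [-1, 1]: +1 means the points lie on a rising line,
    -1 on a falling line, and values near 0 mean no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do the variables sit above/below their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly figures, invented for illustration only.
rainfall_mm = [0, 5, 12, 20, 35, 50]
umbrellas_sold = [3, 8, 15, 22, 30, 41]
print(f"r = {pearson_r(rainfall_mm, umbrellas_sold):.3f}")
```

For these made-up numbers r comes out close to +1, a strong positive correlation. Note that r alone says nothing about which variable, if either, causes the other.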


In this instance, rainfall causes umbrella sales. The chart shows a positive correlation between rainfall and umbrella sales, and here the relationship is also causal: people buy umbrellas because it is raining.

Just because two things happen simultaneously, however, does not mean that one caused the other. For example:

[Figure: ice cream sales and shark attacks]
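One common reason two variables move together without either causing the other is a confounding variable that drives both. The simulation below is a hypothetical sketch: it assumes, for illustration only, that hot weather raises both ice cream sales and a made-up "shark attack risk" index, while neither depends on the other. It then computes the partial correlation, the correlation left over after linearly removing the influence of temperature.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after linearly removing the influence of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the simulation is reproducible
days = 1000
# Hypothetical daily temperatures in degrees C: the hidden common cause.
temperature = [rng.uniform(15, 35) for _ in range(days)]
# Both series depend only on temperature plus independent noise;
# neither one appears in the formula for the other.
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
shark_attack_risk = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson_r(ice_cream_sales, shark_attack_risk)
controlled = partial_r(ice_cream_sales, shark_attack_risk, temperature)
print(f"raw correlation:             {raw:.2f}")
print(f"controlling for temperature: {controlled:.2f}")
```

The raw correlation is strongly positive, yet once temperature is accounted for the remaining correlation is close to zero. Partial correlation only removes linear effects of the confounders you measure; it cannot rule out ones you did not think to include.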

Let's Talk Lemons


[Figure: Lemons Save Lives]

According to the chart above, traffic fatalities fell at the same time that lemon imports from Mexico increased. If the relationship were causal, one would conclude that the more lemons the US imports from Mexico, the fewer traffic fatalities it could expect on its roadways, making the lemon a true hero.



[Figure: Spurious Hero Lemons]

In fact, the relationship between lemon imports and traffic fatalities is what we call spurious. The two series are clearly related mathematically, but we also know that factors other than lemon imports reduced the number of traffic fatalities.
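Two series that each trend steadily over time will usually show a strong correlation even when they are unrelated. The sketch below illustrates this with two hypothetical monthly series (invented numbers, not the data behind the lemon chart): one trends up, one trends down, and their noise is independent. Comparing period-to-period changes (first differencing) instead of raw levels is one common way to strip out a shared trend.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def diff(series):
    """First difference: the change from one period to the next."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(7)  # fixed seed so the simulation is reproducible
months = range(240)
# Hypothetical series: each is a straight-line trend plus its own noise.
# Neither series is computed from the other.
lemon_imports = [1000 + 20 * t + rng.gauss(0, 50) for t in months]
traffic_fatalities = [4000 - 5 * t + rng.gauss(0, 40) for t in months]

levels_r = pearson_r(lemon_imports, traffic_fatalities)
changes_r = pearson_r(diff(lemon_imports), diff(traffic_fatalities))
print(f"correlation of levels:  {levels_r:.2f}")
print(f"correlation of changes: {changes_r:.2f}")
```

The levels are almost perfectly negatively correlated, while the month-to-month changes show essentially no correlation. Differencing does not prove or disprove causation; it only shows that the impressive-looking relationship came from the shared trend.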


Whenever data is involved, it is important to validate relationships thoroughly before accepting conclusions. This is one area where artificial intelligence and machine learning require additional work: pattern recognition, statistical significance, and correlations alone can lead even computer models to draw false causal conclusions. Once a potential causal relationship has been identified, use the scientific method to validate it completely and avoid traps like "buying ice cream increases the risk of a shark attack."

About the Author
GSE Data Science Team
The Robots

GSE’s Data Science team strives to keep GSE at the forefront of data technologies, helping to define the future of telematics products, data quality, and intelligent systems.

More Information

For general inquiries, please contact sales@gsat.us. For more technical information, please contact support@gsat.us. You may also call us at +1.954.459.4000.