# A causal relationship in statistics

### Statistical Language - Correlation and Causation

There is no absolute proof of anything in statistics, but you can often argue that an approximate relationship between two variables is very likely. A causal relation between two events exists if the occurrence of the first causes the other. The first event is called the cause and the second event is called the effect. Important: no regression technique, and no statistical analysis at all, can test a causal relationship. Causality is not a property contained in the data. The only way …

### Third factor C (the common-causal variable) causes both A and B

The third-cause fallacy (also known as ignoring a common cause [6] or questionable cause [6]) is a logical fallacy where a spurious relationship is confused for causation.

It is a variation on the post hoc ergo propter hoc fallacy and a member of the questionable cause group of fallacies. All of these examples deal with a lurking variable, which is simply a hidden third variable that affects both of the correlated variables.

Example 1: Sleeping with one's shoes on is strongly correlated with waking up with a headache. Therefore, sleeping with one's shoes on causes headaches.

The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headaches. A more plausible explanation is that both are caused by a third factor, in this case going to bed drunk, which thereby gives rise to a correlation.

So the conclusion is false.

Example 2: Young children who sleep with the light on are much more likely to develop myopia in later life.

Therefore, sleeping with the light on causes myopia. This is a scientific example that resulted from a study at the University of Pennsylvania Medical Center. Published in the May 13 issue of Nature [7], the study received much coverage at the time in the popular press. However, a later study at Ohio State University did not find that sleeping with the light on caused the development of myopia. It did find a strong link between parental myopia and the development of child myopia, also noting that myopic parents were more likely to leave a light on in their children's bedroom. In this case, parental myopia is the common cause of both observations.

Example 3: As ice cream sales increase, the rate of drowning deaths increases sharply. Therefore, ice cream consumption causes drowning. This example fails to recognize the importance of time of year and temperature to ice cream sales.

Ice cream is sold during the hot summer months at a much greater rate than during colder times, and it is during these hot summer months that people are more likely to engage in activities involving water, such as swimming. The increased drowning deaths are simply caused by more exposure to water-based activities, not ice cream. The stated conclusion is false.
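The common-cause mechanism in the ice cream example can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: the temperature range, the coefficients, and the noise levels are all hypothetical, and the helper names `pearson` and `residuals` are introduced only for this illustration. Both simulated outcomes depend on temperature and never on each other, yet they come out strongly correlated; the correlation largely disappears once the linear effect of temperature is removed from each.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """Residuals of a least-squares straight-line fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures in degrees Celsius (assumed range).
temperature = [rng.uniform(5, 35) for _ in range(2000)]

# Both outcomes are driven by temperature only; neither affects the other.
ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1.5) for t in temperature]

raw_r = pearson(ice_cream, drownings)
# Partial correlation: correlate what is left after removing temperature.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation:                        {raw_r:.2f}")
print(f"correlation after removing temperature: {partial_r:.2f}")
```

Removing the third variable works here only because the simulation makes temperature the sole common cause. With real data, an unmeasured lurking variable cannot be adjusted for in this way.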

This suggests a possible "third variable" problem. However, when three such closely related measures are found, it further suggests that each may have bidirectional tendencies (see "bidirectional variable", above), being a cluster of correlated values each influencing one another to some extent.

Therefore, the simple conclusion above may be false.

Example 5: Since the s, both the atmospheric CO2 level and obesity levels have increased sharply. Hence, atmospheric CO2 causes obesity.

Richer populations tend to eat more food and produce more CO2.

Example 6: HDL ("good") cholesterol is negatively correlated with incidence of heart attack. Therefore, taking medication to raise HDL decreases the chance of having a heart attack. Further research [14] has called this conclusion into question.

Instead, it may be that other underlying factors, like genes, diet and exercise, affect both HDL levels and the likelihood of having a heart attack; it is possible that medicines may affect the directly measurable factor, HDL levels, without affecting the chance of heart attack.

### A causes B, and B causes A

Causality is not necessarily one-way; in a predator-prey relationship, predator numbers affect prey numbers, but prey numbers also affect predator numbers. Another well-known example is that cyclists have a lower Body Mass Index (BMI) than people who do not cycle. This is often explained by assuming that cycling increases physical activity levels and therefore decreases BMI.

Because results from prospective studies on people who increase their bicycle use show a smaller effect on BMI than cross-sectional studies, there may be some reverse causality as well: people with a lower BMI may be more likely to cycle.
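A short simulation suggests why a correlation by itself cannot reveal which way the causation runs. This is a sketch under assumed, made-up parameters, and it does not use cycling or BMI data. In one simulated world X causes Y; in the other Y causes X. The `pearson` helper is defined here only for the illustration.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

# World 1: X is generated first and Y depends on it (X causes Y).
x1 = [rng.gauss(0, 1) for _ in range(n)]
y1 = [0.6 * x + rng.gauss(0, 0.8) for x in x1]

# World 2: Y is generated first and X depends on it (Y causes X).
y2 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * y + rng.gauss(0, 0.8) for y in y2]

r_forward = pearson(x1, y1)
r_reverse = pearson(x2, y2)

print(f"X causes Y: r = {r_forward:.2f}")
print(f"Y causes X: r = {r_reverse:.2f}")
```

The two worlds are built to give nearly the same coefficient, and `pearson(x, y)` equals `pearson(y, x)` by construction, so the number carries no information about direction. Telling the two cases apart needs something outside the cross-sectional data, such as following people over time or intervening on one variable.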

The more things are examined, the more likely it is that two unrelated variables will appear to be related. For example, the result of the last home game by the Washington Redskins prior to the presidential election matched the outcome of a long run of consecutive presidential elections, despite the fact that the outcomes of football games had nothing to do with the outcome of the popular election.

In a randomised experiment, by contrast, the experimenter decides which individuals receive each treatment, so no lurking variable can influence both the explanatory variable and the response. The relationship is therefore causal.

A bank manager is concerned with the number of customers whose accounts are overdrawn. Half of the accounts that become overdrawn in one week are randomly selected and the manager telephones those customers to offer advice. After two months, any difference between the mean balances of the overdrawn accounts that did and did not receive advice can be causally attributed to the phone calls.
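The role of random selection in the bank example can be sketched with a simulation. Everything here is an assumption made for illustration: the hidden "discipline" trait, the size of the true effect (`TRUE_EFFECT`), the balance formula, and the function names are hypothetical and do not come from any real bank data. The sketch compares the manager's random selection with an alternative in which customers effectively choose for themselves whether to receive advice.

```python
import random

rng = random.Random(2)   # fixed seed so the run is repeatable
TRUE_EFFECT = 50.0       # assumed gain in balance caused by a phone call
N = 4000                 # number of simulated overdrawn accounts

def mean(values):
    return sum(values) / len(values)

def difference_in_means(discipline, advised):
    """Simulate balances after two months and compare the two groups."""
    balances = [
        100 * d + (TRUE_EFFECT if a else 0.0) + rng.gauss(0, 30)
        for d, a in zip(discipline, advised)
    ]
    treated = [b for b, a in zip(balances, advised) if a]
    control = [b for b, a in zip(balances, advised) if not a]
    return mean(treated) - mean(control)

# Hidden trait that raises balances regardless of any advice given.
discipline = [rng.gauss(0, 1) for _ in range(N)]

# Observational setting: disciplined customers are likelier to seek advice,
# so the trait is a lurking variable tied to both advice and balance.
self_selected = [d + rng.gauss(0, 1) > 0 for d in discipline]

# Experiment: exactly half of the accounts are chosen at random.
chosen = set(rng.sample(range(N), N // 2))
randomised = [i in chosen for i in range(N)]

biased = difference_in_means(discipline, self_selected)
unbiased = difference_in_means(discipline, randomised)

print(f"true effect of a call:        {TRUE_EFFECT:.1f}")
print(f"self-selected difference:     {biased:.1f}")
print(f"randomised difference:        {unbiased:.1f}")
```

Under these assumptions the randomised comparison lands close to the true effect, while the self-selected comparison overstates it, because random selection makes the hidden trait roughly equal in both groups.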

If two variables are causally related, it is possible to conclude that changes to the explanatory variable, X, will have a direct impact on Y.

### Non-causal relationships

Not all relationships are causal. In non-causal relationships, the relationship that is evident between the two variables is not completely the result of one variable directly affecting the other.

In the most extreme case, two variables can be related to each other without either variable directly affecting the values of the other. The two diagrams below illustrate mechanisms that result in non-causal relationships between X and Y. If two variables are not causally related, it is impossible to tell from the relationship alone whether changes to one variable, X, will result in changes to the other variable, Y. For example, the scatterplot below shows data from a sample of towns in a region.