# Hypothesis testing

```
In [1]:
```

```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
import scipy.stats
food = pd.read_pickle("../data/processed/food")
```

So far we’ve loaded our data set, described it with measures of central tendency and variability, and tested to see if our sample mean adequately describes our population mean. Now we move on to testing hypotheses.

A hypothesis is a statement that we propose to explain a phenomenon we do not yet understand. A hypothesis must be testable.

For example, in our data set we have two nominal variables that we might propose there is a relationship between:

- NS-SEC class of the household reference person (`A094r`)
- Tenure (`A121r`)

NS-SEC stands for ‘National Statistics Socio–economic classification’, and is a measure of employment grade: for example, whether the household reference person is a higher manager or professional, or a manual worker.

The household reference person is the person in the household (usually a family) who is full–time employed or, if both partners are full–time employed, the one who is oldest. This concept is used because families share social, cultural, and economic characteristics so, for example, if one partner is currently unemployed they share some of their characteristics with the HRP (for example they are likely to still live in the family home and participate in similar activities). Similarly, children who do not yet work can be ascribed economic and social characteristics based on their parent or carer’s economic activity.

Tenure simply means whether the respondent owns their home (outright, or with a mortgage) or rents it from a private landlord or the council.

With all this in mind, our hypothesis might be:

There is an association or link between household reference person NS-SEC and home ownership (tenure).

For example, people in NS-SEC category 1 (higher managerial, administrative, and professional occupations) may be more likely to own their home compared to people in routine and manual occupations.

First of all, let’s look at a crosstabulation (crosstab) of frequencies comparing these two variables:

```
In [2]:
```

```
pd.crosstab(index = food.A094r, columns = food.A121r, margins = True, margins_name = "Total")
```

```
Out[2]:
```

| A094r / A121r | 1 | 2 | 3 | Total |
|---|---|---|---|---|
| 1 | 65 | 181 | 763 | 1009 |
| 2 | 81 | 129 | 404 | 614 |
| 3 | 282 | 222 | 476 | 980 |
| 4 | 75 | 82 | 34 | 191 |
| 5 | 361 | 101 | 1100 | 1562 |
| Total | 864 | 715 | 2777 | 4356 |

From this crosstab we can see that similar numbers of people rent their homes from local authorities (‘public rented’) and private landlords (864 and 715 respectively), but that more than three times as many people own their own home (2777) as rent from either type of landlord.

We can also see that most people in NS-SEC group 1 (higher managerial, administrative, and professional occupations) own their home (763) rather than rent (246, i.e. 65 + 181). In NS-SEC group 3 (routine and manual occupations), by comparison, only 476 own their home, while 282 rent from a local authority (far more than the 65 in group 1) and 222 rent privately (similar to the 181 in group 1).

These descriptions are pretty straightforward, but any analysis is complicated by the fact that the groups are different sizes (n = 1,009 in group 1; n = 614 in group 2; n = 980 in group 3), so we cannot directly compare the counts in this table to see if there are differences between the groups. The next step is to look at the percentages:

```
In [3]:
```

```
table = pd.crosstab(
    index = food.A094r, columns = food.A121r,
    normalize = "index"
) * 100 # converts proportions to percentages
table.round(decimals = 2)
```

```
Out[3]:
```

| A094r / A121r | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 6.44 | 17.94 | 75.62 |
| 2 | 13.19 | 21.01 | 65.80 |
| 3 | 28.78 | 22.65 | 48.57 |
| 4 | 39.27 | 42.93 | 17.80 |
| 5 | 23.11 | 6.47 | 70.42 |

Using the row percentages (i.e. each row adds up to 100%) we can see that approximately 76% of higher managerial and professional families own their own home, but only 49% of routine and manual families do. Similarly, only about 6% of managerial and professional families rent from a local authority, but 29% of routine and manual families do (remember that rows 4 and 5 are unemployed and unclassified respectively).

So we think there’s an association between NS-SEC of the household
reference person and tenure, and the crosstabs certainly seem to support
this. Unfortunately humans are very, very good at spotting patterns,
even when there isn’t one there, so instead of just relying on our
say–so we can statistically test to see if there really is a difference.
To do this we use a hypothesis test. Before we carry out a hypothesis
test we should specify it explicitly, and state a *null hypothesis*.

## Null hypothesis

To perform most hypothesis tests we specify a *null hypothesis*, which
we denote \(H_0\). A null hypothesis is a way of framing our
hypothesis that (usually) states there is *no* association between our
variables, so in our case we specify our null hypothesis as:

There is *no* association between NS-SEC of the household reference person and housing tenure.

The opposite of the null hypothesis is the *alternative hypothesis*,
\(H_1\), which is usually our original hypothesis.

It is important to frame a hypothesis test in this way because we assume
the absence of an association, and it is up to us as researchers to
provide evidence that there is one. For example, we cannot assume that
people who drink coffee are more intelligent than people who drink tea.
It is up to us to demonstrate that this is the case. This is what makes
our hypothesis *testable* and *falsifiable*.

It’s a bit like the presumption of innocence: we cannot be locked up unless we are proven to be guilty of a crime. If it were the other way around (i.e. presumption of guilt) we would all be incarcerated and we would have to prove that we were innocent, not just of one crime, but of every conceivable crime in order to be released! This would be an impossible task (not least because no doubt someone would add another charge arbitrarily).

Hermione Granger sums this up better than most statistics textbooks ever did:

“Well, how can that [the resurrection stone] be real?”

“Prove that it is not”, said Xenophilius.

Hermione looked outraged.

“But that’s – I’m sorry, but that’s completely ridiculous! How can I possibly prove it doesn’t exist? Do you expect me to get hold of – of all the pebbles in the world, and test them? I mean, you could claim that anything’s real if the only basis for believing in it is that nobody’s *proved* it doesn’t exist!”

- Harry Potter and the Deathly Hallows

Our example uses two nominal variables, so the most appropriate hypothesis test in this case is the [\(\chi^2\) test of association](09-chi-squared-test.html). For variables of other levels of measurement we would use different hypothesis tests, which we’ll get to later.
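To make the null hypothesis concrete, the following is a minimal sketch of what "no association" would look like for the crosstab above. It assumes the observed counts are copied by hand from the frequency table (rather than read from `food`), and the variable names are illustrative only. It computes the count we would expect in each cell if tenure were unrelated to NS-SEC, then sums the squared differences between observed and expected counts, each scaled by the expected count, to give the \(\chi^2\) statistic.

```python
# Observed counts copied from the crosstab above
# (rows: A094r groups 1-5, columns: A121r categories 1-3).
observed = [
    [65, 181, 763],
    [81, 129, 404],
    [282, 222, 476],
    [75, 82, 34],
    [361, 101, 1100],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell if the null hypothesis (no association) were true:
# row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a table of r rows and c columns: (r - 1) * (c - 1).
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"Expected public renters in group 1 under H0: {expected[0][0]:.1f}")
print(f"Chi-squared = {chi_squared:.1f} on {degrees_of_freedom} degrees of freedom")
```

The larger the statistic, the further the observed table is from what the null hypothesis predicts. In practice `scipy.stats.chi2_contingency` performs the same calculation on a table of counts and also returns a p-value.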

## Errors interpreting the results

There are two problems to be aware of when interpreting the results of a
hypothesis test. The hypothesis test does not *prove* an association
between our variables; it gives us a statistical level of confidence
that there is an association.

There is always a risk that we might reject the null hypothesis (i.e. state that there is an association) when there isn’t one.

If there were really no association but we stated that there is one, this
would be called a **Type I (one) error**, sometimes known as a false positive.

The other type of error we could make is failing to reject the null
hypothesis when we should (i.e. we state there is no association when
there *is* one), also known as a false negative. This is known as a
**Type II (two) error**.

When we perform a hypothesis test we therefore need to balance the risk of stating that there is an association when there isn’t, and the risk of stating that there is no association when there is one.

Depending on our research question, one error or the other might be more problematic. For example, if we are testing whether a new drug is effective we want to avoid a type I error (i.e. stating that the drug has an effect when it doesn’t). But if we’re testing the drug for side effects we want to avoid a type II error (i.e. asserting that there are no side effects when, in fact, there are).
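The two error types can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses made-up, normally distributed data with a known standard deviation of 1 and a simple two-tailed z-test of the sample mean, not the survey data or the test used elsewhere in this section. The function names, sample size, effect size, and seed are all hypothetical choices made for illustration.

```python
import random
import statistics
from statistics import NormalDist


def z_test_rejects(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-tailed z-test: return True if we reject H0 (population mean == mu0)."""
    n = len(sample)
    z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mean, n=30, trials=2000, seed=1):
    """Proportion of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_rejects(sample):
            rejections += 1
    return rejections / trials


# H0 is true (the mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (the mean is really 0.3): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.3)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

When the null hypothesis is true, the Type I error rate should come out close to the chosen `alpha`. Lowering `alpha` makes false positives rarer but, for the same sample size, makes false negatives more common, which is the balance described above.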

## Directional tests

One final consideration is the direction of our test. We can specify an alternative hypothesis that:

- simply states that there is a relationship between two variables;
- states that one variable is higher than the other, or
- states that one variable is lower than the other.

If we simply state that there is an association we should use a two–tailed test, and this is what we should do by default unless we have a very good (and documented!) reason for using a directional (one–tailed) test.

For example, we previously hypothesised that there is an association between employment grade (NS-SEC) and housing tenure. We did not specify a direction, so we would usually use a two–tailed test. However, if we specified our alternative hypothesis with a direction, we would run a one–tailed test. For example, if our alternative hypothesis were:

People of higher employment grade are *more likely* to own their own home

we now have a directional test (i.e. we don’t think they’re less likely to own their own home). In this case we can use a one–tailed test.

Note that we have stated our hypothesis **before** we ran the test; you
cannot run a one–tailed test after the fact and claim you’ve found a
directional association. Also note that if you do not find a directional
association and later want to switch direction you cannot. Therefore
one–tailed tests tend to be used when previous literature identifies a
directional association and you want to use new data to test it.

The reason for this skepticism of one–tailed tests is that they require a smaller difference between the two variables to be statistically significant. If you are performing a non–directional (two–tailed) test at a significance level of 0.05, the 0.05 is split between the two ends of your distribution, so you have only half of it (0.025) in each tail to work with to detect a difference. If you specify a directional (i.e. one–tailed) test, the whole 0.05 sits in one tail, so a smaller difference in the hypothesised direction is enough to reach significance.
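The difference between the two thresholds can be sketched numerically. The example below assumes a test statistic that follows a standard normal distribution (an assumption made purely for illustration) and compares the critical value needed for significance at 0.05 in a two–tailed and a one–tailed test.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed: alpha is split in half, 0.025 in each tail.
two_tailed_critical = standard_normal.inv_cdf(1 - alpha / 2)

# One-tailed: the whole of alpha sits in a single tail.
one_tailed_critical = standard_normal.inv_cdf(1 - alpha)

print(f"Two-tailed test: |z| must exceed {two_tailed_critical:.2f}")
print(f"One-tailed test: z must exceed {one_tailed_critical:.2f}")
```

The one–tailed threshold is lower, so a result that falls between the two critical values would be significant in a one–tailed test but not in a two–tailed one. This is why the direction must be fixed before the test is run.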