Hypothesis testing

[1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
import scipy.stats

food = pd.read_pickle("../data/processed/food")

So far we’ve loaded our data set, described it with measures of central tendency and variability, and tested to see if our sample mean adequately describes our population mean. Now we move on to testing hypotheses.

A hypothesis is a statement that proposes an explanation for a phenomenon we do not yet understand. A hypothesis must be testable.

For example, in our data set we have two nominal variables that we might propose there is a relationship between:

  • NS-SEC class of the household reference person (A094r)
  • Tenure (A121r)

NS-SEC stands for ‘National Statistics Socio–economic classification’, and is a measure of employment grade, for example if the household reference person is a higher manager or professional, or a manual worker.

The household reference person is the person in the household (usually a family) who is full–time employed or, if both partners are full–time employed, the one who is oldest. This concept is used because families share social, cultural, and economic characteristics so, for example, if one partner is currently unemployed they share some of their characteristics with the HRP (for example they are likely to still live in the family home and participate in similar activities). Similarly, children who do not yet work can be ascribed economic and social characteristics based on their parent or carer’s economic activity.

Tenure simply means if the respondent owns their home (outright, or with a mortgage) or rents their home from a private landlord or council.

With all this in mind, our hypothesis might be:

There is an association or link between household reference person NS-SEC and home ownership (tenure).

For example, people in NS-SEC category 1 (higher managerial, administrative, and professional occupations) may be more likely to own their home compared to people in routine and manual occupations.

First of all, let’s look at a crosstabulation (crosstab) of frequencies comparing these two variables:

[2]:
pd.crosstab(index = food.A094r, columns = food.A121r, margins = True, margins_name = "Total")
[2]:
A121r    1    2     3  Total
A094r
1       65  181   763   1009
2       81  129   404    614
3      282  222   476    980
4       75   82    34    191
5      361  101  1100   1562
Total  864  715  2777   4356

From this crosstab we can see that similar numbers of people rent their homes from local authorities (‘public rented’) and private landlords (864 and 715 respectively), but that more than three times as many people own their home (2777) as rent from either type of landlord.

We can also see that most people in NS-SEC group 1 (higher managerial, administrative, and professional occupations) own their home (763) rather than rent (246, i.e. 65 + 181). In NS-SEC group 3 (routine and manual occupations), by comparison, only 476 own their home, while 282 rent from a local authority (far more than in group 1) and 222 rent privately (similar to group 1).

These descriptions are pretty straightforward, but any analysis is complicated by the fact that the groups are different sizes (n = 1,009 for group 1; n = 614 for group 2; n = 980 for group 3), so we cannot directly compare the counts in this table to see if there are differences between the groups. The next step is to look at the percentages:

[3]:
table = pd.crosstab(
    index = food.A094r, columns = food.A121r,
    normalize = "index"
) * 100  # converts proportions to percentages

table.round(decimals = 2)
[3]:
A121r      1      2      3
A094r
1       6.44  17.94  75.62
2      13.19  21.01  65.80
3      28.78  22.65  48.57
4      39.27  42.93  17.80
5      23.11   6.47  70.42

Using the row percentages (i.e. each row adds up to 100%) we can see that approximately 76% of higher managerial and professional families own their own home, but only 49% of routine and manual families do. Similarly we can see that only about 6% of managerial and professional families rent from a local authority, but 29% of routine and manual families do (remember rows 4 and 5 are unemployed and unclassified respectively).
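As a minimal sketch of what normalize = "index" is doing, the row percentages can be rebuilt by hand from the counts printed in the frequency crosstab. The counts dictionary below is typed in from that output rather than read from the data set, so it runs without the pickled file; the variable names are illustrative only.

```python
# Counts copied from the frequency crosstab above
# (keys: A094r groups 1-5; values: counts for A121r categories 1, 2, 3).
counts = {
    1: [65, 181, 763],
    2: [81, 129, 404],
    3: [282, 222, 476],
    4: [75, 82, 34],
    5: [361, 101, 1100],
}

row_percentages = {}
for nssec, row in counts.items():
    row_total = sum(row)
    # Divide every cell by its own row total, then scale to a percentage
    row_percentages[nssec] = [round(100 * cell / row_total, 2) for cell in row]

for nssec, row in row_percentages.items():
    print(nssec, row)
```

Because each cell is divided by its own row total, groups of different sizes become directly comparable, which the raw counts were not.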

So we think there’s an association between NS-SEC of the household reference person and tenure, and the crosstabs certainly seem to support this. Unfortunately humans are very, very good at spotting patterns, even when there isn’t one there, so instead of just relying on our say–so we can statistically test to see if there really is a difference. To do this we use a hypothesis test. Before we carry out a hypothesis test we should specify it explicitly, and state a null hypothesis.

Null hypothesis

To perform most hypothesis tests we specify a null hypothesis, which we denote \(H_0\). A null hypothesis is a way of framing our hypothesis that (usually) states there is no association between our variables, so in our case we specify our null hypothesis as:

There is no association between NS-SEC of the household reference person and housing tenure

The opposite of the null hypothesis is the alternative hypothesis, \(H_1\), which is usually our original hypothesis.
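To make the null hypothesis concrete, here is a small sketch of the counts we would expect to see if there really were no association. Under \(H_0\) every NS-SEC group has the same tenure split as the sample as a whole, so each expected count is (row total × column total) / grand total. The observed counts are copied from the frequency crosstab; the variable names are illustrative only.

```python
# Observed counts from the frequency crosstab
# (rows: A094r 1-5, columns: A121r 1-3).
observed = [
    [65, 181, 763],
    [81, 129, 404],
    [282, 222, 476],
    [75, 82, 34],
    [361, 101, 1100],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if tenure were independent of NS-SEC
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 1) for e in exp_row])
```

The further the observed counts sit from these expected counts, the harder it is to keep believing the null hypothesis.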

It is important to frame a hypothesis test in this way because we assume the absence of an association, and it is up to us as researchers to provide evidence that there is one. For example, we cannot assume that people who drink coffee are more intelligent than people who drink tea. It is up to us to demonstrate that this is the case. This is what makes our hypothesis testable and falsifiable.

It’s a bit like the presumption of innocence: we cannot be locked up unless we are proven to be guilty of a crime. If it were the other way around (i.e. presumption of guilt) we would all be incarcerated and we would have to prove that we were innocent, not just of one crime, but of every conceivable crime in order to be released! This would be an impossible task (not least because no doubt someone would add another charge arbitrarily).

Hermione Granger sums this up better than most statistics textbooks ever did:

“Well, how can that [the resurrection stone] be real?”
“Prove that it is not”, said Xenophilius.
Hermione looked outraged.
“But that’s — I’m sorry, but that’s completely ridiculous! How can I possibly prove it doesn’t exist? Do you expect me to get hold of — of all the pebbles in the world, and test them? I mean, you could claim that anything’s real if the only basis for believing in it is that nobody’s proved it doesn’t exist!”
- Harry Potter and the Deathly Hallows

Our example uses two nominal variables, so the most appropriate hypothesis test in this case is the \(\chi^2\) test of association (see 09-chi-squared-test.html). For variables of other levels of measurement we would use different hypothesis tests, which we’ll get to.
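As a preview, the sketch below computes the \(\chi^2\) statistic by hand from the counts in the frequency crosstab, using only the standard library. The helper chi2_sf_even_df is a hypothetical name introduced here: it uses a closed form for the upper-tail probability that is valid only when the degrees of freedom are even, which happens to be the case for a 5 × 3 table (8 degrees of freedom). scipy.stats.chi2_contingency, available through the scipy.stats import at the top of this notebook, performs the same calculation for any table shape.

```python
import math

# Observed counts from the frequency crosstab
# (rows: A094r 1-5, columns: A121r 1-3).
observed = [
    [65, 181, 763],
    [81, 129, 404],
    [282, 222, 476],
    [75, 82, 34],
    [361, 101, 1100],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over every cell
chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (obs - exp) ** 2 / exp

degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of chi-squared; only valid for even df."""
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(chi_squared, degrees_of_freedom)
print(round(chi_squared, 1), degrees_of_freedom, p_value)
```

A p-value below the chosen significance level (0.05 is the usual convention) would lead us to reject the null hypothesis of no association.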

Errors interpreting the results

A hypothesis test does not prove an association between our variables; it gives us a statistical level of confidence that there is an association. This leaves two kinds of error to be aware of when interpreting the results.

There is always a risk that we might reject the null hypothesis (i.e. state that there is an association) when there isn’t one.

If there were really no association but we stated that there is one, this would be called a Type I (one) error, sometimes known as a false positive.

The other type of error we could make is failing to reject the null hypothesis when we should (i.e. we state there is no association when there is one). This is known as a Type II (two) error, or a false negative.

When we perform a hypothesis test we therefore need to balance the risk of stating that there is an association when there isn’t, and the risk of stating that there is no association when there is one.

Depending on our research question, one error or the other might be more problematic. For example, if we are testing whether a new drug is effective, we do not want to make a Type I error (i.e. state that there is an association when there isn’t one). But if we’re testing the drug for side effects we want to make sure we don’t make a Type II error (i.e. assert that there are no side effects when, in fact, there are).
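The two error rates can be illustrated with a small simulation. This is only a sketch on made-up data, not the food data set: it draws two groups from normal distributions with a known standard deviation and compares their means with a simple z-test. The sample size, the true difference of 0.5, and all names are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05        # significance level
N_PER_GROUP = 30    # hypothetical sample size per group
N_TRIALS = 2000     # number of simulated studies
SIGMA = 1.0         # standard deviation, assumed known so a z-test applies
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value


def rejects_null(rng, true_difference):
    """Simulate one study and report whether it rejects H0 (no difference)."""
    a = [rng.gauss(0.0, SIGMA) for _ in range(N_PER_GROUP)]
    b = [rng.gauss(true_difference, SIGMA) for _ in range(N_PER_GROUP)]
    standard_error = SIGMA * (2 / N_PER_GROUP) ** 0.5
    z = (mean(b) - mean(a)) / standard_error
    return abs(z) > Z_CRIT


rng = random.Random(42)

# H0 is true (no difference): every rejection is a Type I error
type_i_rate = sum(rejects_null(rng, 0.0) for _ in range(N_TRIALS)) / N_TRIALS

# H0 is false (true difference of 0.5): every non-rejection is a Type II error
type_ii_rate = sum(not rejects_null(rng, 0.5) for _ in range(N_TRIALS)) / N_TRIALS

print(type_i_rate, type_ii_rate)
```

The Type I rate should sit close to the significance level, because that is what the level controls. The Type II rate depends on the sample size and the size of the true difference, and lowering the significance level to reduce Type I errors will push it up.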

Directional tests

One final consideration is the direction of our test. We can specify an alternative hypothesis that:

  • simply states that there is a relationship between two variables;
  • states that one variable is higher than the other, or
  • states that one variable is lower than the other.

If we simply state that there is an association we should use a two–tailed test, and this is what we should do by default unless we have a very good (and documented!) reason for using a directional (one–tailed) test.

For example, we previously proposed that there is an association between employment grade (NS–SEC) and housing tenure. We did not specify a direction, so we would usually use a two–tailed test. However, if we specified our alternative hypothesis with a direction, we would run a one–tailed test. For example, if our alternative hypothesis were:

People of higher employment grade are more likely to own their own home

we now have a directional test (i.e. we don’t think they’re less likely to own their own home). In this case we can use a one–tailed test.

Note that we have stated our hypothesis before we ran the test; you cannot run a one–tailed test after the fact and claim you’ve found a directional association. Also note that if you do not find an association in the direction you specified, you cannot switch direction afterwards. Therefore one–tailed tests tend to be used when previous literature identifies a directional association and you want to use new data to test it.

The reason for this skepticism of one–tailed tests is that they require a smaller difference to reach statistical significance. If you are performing a non–directional (two–tailed) test at a significance level of 0.05, that 0.05 is split between the two ends of the distribution, leaving 0.025 in each tail to detect a difference. If you specify a directional (one–tailed) test, the whole 0.05 sits in a single tail, so a less extreme test statistic is enough to be declared significant.
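A short sketch shows the difference, using the standard normal distribution as the reference distribution (an assumption made for illustration; the same logic applies to other test statistics). The z value of 1.80 is a made-up test statistic.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1
ALPHA = 0.05

# Two-tailed: 0.025 in each tail, so the cut-off sits further out
two_tailed_critical = standard_normal.inv_cdf(1 - ALPHA / 2)

# One-tailed: the whole 0.05 in one tail, so the cut-off is closer to zero
one_tailed_critical = standard_normal.inv_cdf(1 - ALPHA)

z = 1.80  # hypothetical test statistic
one_tailed_p = 1 - standard_normal.cdf(z)
two_tailed_p = 2 * (1 - standard_normal.cdf(abs(z)))

print(round(two_tailed_critical, 3), round(one_tailed_critical, 3))
print(round(one_tailed_p, 4), round(two_tailed_p, 4))
```

The same test statistic is significant at the 0.05 level under the one–tailed test but not under the two–tailed test, which is why the direction has to be fixed before looking at the data.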