# Comparing two independent groups¶

```
In [1]:
```

```
import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
import scipy.stats
age_shf = pd.read_pickle("../data/processed/age_shf")
age_ebn = pd.read_pickle("../data/processed/age_ebn")
```

We’ve looked at correlation, which quantifies how much one variable is related to another variable of the same individual, for example height and weight, or income and expenditure on food. When it’s necessary to compare different groups, for example the height of men and height of women, we use different tests:

- the parametric \(t\)–test, and
- the non–parametric Mann–Whitney U test

These tests assess whether the difference between the groups is statistically significant. The means of two different samples will almost always differ; how large that difference must be before we conclude the groups are statistically different depends on the variance of the groups.

## \(t\)–test¶

The independent samples \(t\)–test is used when comparing two
independent groups, such as the heights of a group of males and the
heights of a group of females. We have a sample of males and measure
their heights, and a sample of females and measure their heights.
Crucially the height of one respondent does not affect the height of
another respondent, so the measurements are *independent*. The level of
measurement of both groups should be continuous (numerical interval or
ratio) and we are comparing the *same variable* (height).
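As a quick sketch of this setup with simulated heights (the group sizes, means and standard deviations below are made up for illustration, not real data):

```
import numpy as np
import scipy.stats

rng = np.random.default_rng(42)

# Two independent groups: simulated heights in cm
male_heights = rng.normal(loc=175, scale=7, size=500)
female_heights = rng.normal(loc=162, scale=6, size=500)

# Welch's t-test (does not assume equal variances)
result = scipy.stats.ttest_ind(male_heights, female_heights, equal_var=False)
print(result.statistic, result.pvalue)
```

With means this far apart the test reports a large positive statistic and a tiny \(p\) value.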

For our example we have ages of two groups: people from Sheffield and people from Eastbourne. We saw in an earlier section on standard errors and confidence intervals that their means were different, and the confidence intervals suggested the population means were different also:

```
In [2]:
```

```
age_shf.C_AGE_NAME.mean()
```

```
Out[2]:
```

```
37.872098324944183
```

```
In [3]:
```

```
age_ebn.C_AGE_NAME.mean()
```

```
Out[3]:
```

```
42.833169033919447
```

The measurements are independent (i.e. the age of people in Sheffield does not affect the age of people in Eastbourne) and continuous (numerical) so we can use an independent samples \(t\)–test to statistically test if the means are different:

```
In [4]:
```

```
scipy.stats.ttest_ind(
age_shf.C_AGE_NAME, age_ebn.C_AGE_NAME, equal_var = False
)
```

```
Out[4]:
```

```
Ttest_indResult(statistic=-58.843596901278822, pvalue=0.0)
```

The \(p\) value is \(<< 0.01\) so we can reject the null
hypothesis that there is no difference between the ages of people in
Sheffield and the ages of people in Eastbourne (i.e. there *is* a
difference in mean age). In this case this is not surprising because we
actually have two populations, but this is intended to be illustrative
only.
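For intuition, Welch's \(t\) statistic is simply the difference in sample means divided by the combined standard error of the two groups. A sketch with simulated data (the distributions below are made up, not our two samples):

```
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
a = rng.normal(38, 23, size=2000)  # made-up stand-ins for the two groups
b = rng.normal(43, 24, size=2500)

# Welch's t: difference in means over the combined standard error
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
t_manual = (a.mean() - b.mean()) / se

t_scipy = scipy.stats.ttest_ind(a, b, equal_var=False).statistic
print(t_manual, t_scipy)  # the two values agree
```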

### Assumptions¶

As always, there are assumptions. We have already met the assumption of independent observations, and the variable (age) is continuous, so these assumptions are satisfied. The \(t\)–test also assumes that, if the sample sizes are small, the data are approximately normally distributed, but this is not relevant in our case because our samples are large. If you need to test for this, plot a QQ plot:

```
In [5]:
```

```
scipy.stats.probplot(age_ebn.C_AGE_NAME, plot = plt)
```

```
Out[5]:
```

```
((array([-4.3447271 , -4.14589489, -4.03774648, ..., 4.03774648,
4.14589489, 4.3447271 ]),
array([ 0, 0, 0, ..., 100, 100, 100])),
(24.361214673692874, 42.833169033919454, 0.98548394512715509))
```

```
In [6]:
```

```
scipy.stats.probplot(age_shf.C_AGE_NAME, plot = plt)
```

```
Out[6]:
```

```
((array([-4.70745961, -4.52329906, -4.423601 , ..., 4.423601 ,
4.52329906, 4.70745961]),
array([ 0, 0, 0, ..., 100, 100, 100])),
(22.700384057589815, 37.872098324944176, 0.98466093261250653))
```

These are not normally distributed (if they were, the blue points would lie along the red line), but this doesn’t matter in this case because we do not have small samples. We also need to check that the variances of the two groups are approximately equal, using Levene’s test:

```
In [7]:
```

```
scipy.stats.levene(
age_shf.C_AGE_NAME, age_ebn.C_AGE_NAME
)
```

```
Out[7]:
```

```
LeveneResult(statistic=1358.0920464422002, pvalue=5.4491601311285835e-297)
```

Ah. The \(p\) value is highly statistically significant, so we should
reject the null hypothesis that the variances are equal (i.e. the
variances *are* different). In this case we should use a \(t\)–test
adjusted to correct for unequal variances (Welch’s \(t\)–test), by
passing `equal_var = False` to `scipy.stats.ttest_ind()` (note that
`equal_var = True` is the default):

```
In [8]:
```

```
scipy.stats.ttest_ind(
age_shf.C_AGE_NAME, age_ebn.C_AGE_NAME, equal_var = False
)
```

```
Out[8]:
```

```
Ttest_indResult(statistic=-58.843596901278822, pvalue=0.0)
```

If the variances were approximately equal the \(p\) value would be
\(> 0.05\) and we would not reject the null. In that case we could
perform an equal–variances \(t\)–test by specifying `equal_var = True`.

## Mann–Whitney U¶

The Mann–Whitney U test is a non–parametric alternative to the \(t\)–test, useful when we do not have continuous data (although the variable should be at least ordinal) or when we otherwise violate one or more of the assumptions of the \(t\)–test. It still assumes that the measurements are independent.

Mann–Whitney U works by ranking the observations without considering the
groups they’re from. If there was no difference between groups the sum
of the ranks would be approximately equal. If there *is* a difference
between groups the sum of the ranks for each group will differ. Using
our example age data:

```
In [9]:
```

```
scipy.stats.mannwhitneyu(
age_shf.C_AGE_NAME, age_ebn.C_AGE_NAME
)
```

```
Out[9]:
```

```
MannwhitneyuResult(statistic=24316208190.5, pvalue=0.0)
```

And reassuringly we can see the statistic is statistically significant. We wouldn’t really use a Mann–Whitney U test for age (continuous) data; this is just to demonstrate the syntax and interpretation. Mann–Whitney U tests are useful in the social sciences though because we deal with ordinal data (e.g. Likert scales) a lot.
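The ranking mechanism described above can be sketched with a tiny made–up example:

```
import numpy as np
import scipy.stats

group_a = np.array([3, 5, 6, 8])
group_b = np.array([7, 9, 10, 12])

# Rank all observations together, ignoring group membership
ranks = scipy.stats.rankdata(np.concatenate([group_a, group_b]))

rank_sum_a = ranks[:len(group_a)].sum()   # 1 + 2 + 3 + 5 = 11
rank_sum_b = ranks[len(group_a):].sum()   # 4 + 6 + 7 + 8 = 25
print(rank_sum_a, rank_sum_b)
```

Because `group_b`’s values are mostly larger, its rank sum is much higher; the U statistic is derived from these rank sums.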

## Dependent groups¶

The \(t\)–test and Mann–Whitney U test assume that the observations are independent. In our example the two groups are independent because the age of people in Sheffield does not affect the age of people in Eastbourne.

There are situations when the observations are *dependent* and these
tests are not appropriate for this type of data. In the social sciences,
this is common with longitudinal data (data where the same individuals
are asked a survey at different points in time). Such *repeated
measures* data is not independent.

In these cases we can still test for differences in means, but I’m not going to go into how to do this here; there are plenty of resources to help. I raise it so that you are mindful of which test(s) you use if you have repeated measures or longitudinal data.
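For orientation only, scipy provides paired counterparts of the tests above: `scipy.stats.ttest_rel()` (parametric) and `scipy.stats.wilcoxon()` (non–parametric). A minimal sketch of the former with made–up before/after measurements:

```
import numpy as np
import scipy.stats

rng = np.random.default_rng(1)

# Made-up repeated-measures data: the same 100 people measured twice
before = rng.normal(50, 10, size=100)
after = before + rng.normal(2, 3, size=100)  # each person shifts up slightly

# Paired t-test: tests whether the mean within-person change is zero
res = scipy.stats.ttest_rel(before, after)
print(res.statistic, res.pvalue)
```

Because each `after` score is tied to its own `before` score, the paired test works on the within–person differences rather than treating the two columns as independent groups.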