Introduction

As social scientists and researchers we want to answer questions about the social world. At its most simplistic (and I completely acknowledge that I’m massively generalising here), we usually seek to describe the world or quantify the world. For these tasks we use qualitative methods and quantitative methods respectively.

To quantify the world we might want to know things like, “how many people have been a victim of crime?”, “how many people have mental health problems?”, or “how many people have social science degrees?”.

Inferring from a sample

Sometimes we have complete (or near-complete) data about everybody in a population. For example, in the UK the decennial (i.e. every ten years) census is a count of everybody, including certain characteristics such as health, education, and employment. In this case the population is everybody in the UK.

More often than not it is impractical to ask everybody our questions. The cost and time required usually prohibit surveying every single person, and very few people would answer everything they were asked (the census gets such a high response largely because it's a crime not to complete it, and people have been prosecuted for failing to do so). Instead we take a sample of the population and infer, from our sample, what the population is like.

For example, we might ask a random sample of 1,000 people what their favourite hot drink is. It's not that we care what these 1,000 people think more than any other 1,000 people. Instead, they are our sample and, based on their responses, we can infer what the most popular hot drink is for the population. In the UK, the most popular hot drink is tea, but everybody knows it should be coffee.

Coffee is clearly superior to tea

Obtaining knowledge about a population by inferring from a sample is the cornerstone of quantitative social science research, and there are many statistical techniques for doing it. The good news is that the most difficult challenge is often deciding which technique to use; applying the technique is often like following a recipe.
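To make the hot drink example concrete, here is a minimal Python sketch of drawing a random sample and using it to estimate something about the population. Everything in it is an assumption for illustration: the population of 100,000 people and the 60/40 split between tea and coffee are made up and are not real survey results.

```python
import random

random.seed(42)  # fixed seed so the sketch gives the same result each run

# Hypothetical population of 100,000 people. The 60/40 split is invented
# purely for illustration; it is not real data.
population = ["tea"] * 60_000 + ["coffee"] * 40_000

# Draw a simple random sample of 1,000 people (without replacement).
sample = random.sample(population, k=1_000)

# The share of tea drinkers in the sample is our estimate of the
# share of tea drinkers in the whole population.
sample_share = sample.count("tea") / len(sample)
population_share = population.count("tea") / len(population)

print(f"Tea share in sample:     {sample_share:.3f}")
print(f"Tea share in population: {population_share:.3f}")
```

In real research we never get to see the population share, which is the whole point of sampling. Here we can check it only because we invented the population ourselves, and the sample share should land close to it.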

Description and inference

We usually want to do two things with our sample to understand our population:

  • Describe the sample (making the assumption that we are also describing the population)
  • Make predictions about the population from the sample

We usually call these descriptive statistics and inferential statistics respectively. This can be a bit confusing, because even descriptive statistics infer something about the population from the sample.

With descriptive statistics we often describe things like the ‘average’ and how much the data varies around this average. Inferential statistics, despite the confusing term, usually allow us:

  • to test if differences in the sample are likely to be actual differences in the population (hypothesis testing)
  • to make predictions; for example, knowing whether a person is male or female may help us predict how likely they are to commit a crime
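As a rough sketch of the difference, the Python below first describes two small samples (their average and how much they vary) and then tests whether the difference between them is likely to be real. The numbers are made up, the group names are hypothetical, and the permutation test is just one simple way of testing a hypothesis that I've picked for illustration; it isn't the only technique, or necessarily the one you would use in practice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Made-up samples: hours of sleep reported by two hypothetical groups.
group_a = [6.5, 7.0, 7.5, 8.0, 6.0, 7.5, 8.5, 7.0]
group_b = [5.5, 6.0, 6.5, 7.0, 5.0, 6.5, 6.0, 5.5]

# Descriptive statistics: the average and the spread around it.
mean_a, sd_a = statistics.mean(group_a), statistics.stdev(group_a)
mean_b, sd_b = statistics.mean(group_b), statistics.stdev(group_b)
print(f"Group A: mean={mean_a:.2f}, sd={sd_a:.2f}")
print(f"Group B: mean={mean_b:.2f}, sd={sd_b:.2f}")

# Inferential statistics (hypothesis testing via a permutation test):
# if group membership made no difference, shuffling the group labels
# should produce differences as large as the observed one fairly often.
observed = mean_a - mean_b
pooled = group_a + group_b
n_a = len(group_a)
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
    if abs(diff) >= abs(observed):
        extreme += 1

# Share of shuffles at least as extreme as what we observed.
p_value = extreme / n_perm
print(f"Observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests the difference in the sample is unlikely to be down to chance alone, and so is more likely to reflect an actual difference in the population.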