Introduction

As social scientists and researchers we want to answer questions about the social world. At its simplest (and I completely acknowledge that I’m massively generalising here), we usually seek either to describe the world or to quantify it. For these tasks we use qualitative methods and quantitative methods respectively.

To quantify the world we might want to know things like, “how many people have been a victim of crime?”, “how many people have mental health problems?”, or “how many people have social science degrees?”.

Inferring from a sample

Sometimes we have complete (or near-complete) data about everybody in a population. For example, in the UK the decennial (i.e. every ten years) census is a count of everybody, including certain characteristics such as health, education, and employment. In this case the population is everybody in the UK.

More often than not it is impractical to ask everybody our questions. The cost and time required to carry out such a survey usually prohibit asking every single person what we want to know, and very few people would answer everything that they were asked. (The main reason nearly everybody responds to the census is that it’s a crime not to complete it, and people have been prosecuted for not doing so.) Instead we take a sample of the population and infer, from our sample, what the population is like.

For example, we might ask a random sample of 1,000 people what their favourite hot drink is. It’s not that we want to know what these 1,000 people think more than any other 1,000 people. Instead, they are our sample and, based on their responses, we can infer what the most popular hot drink is for the population. In the UK, the most popular hot drink is tea, but everybody knows it should be coffee.

Coffee is clearly superior to tea

Obtaining knowledge about a population by inferring from a sample is the cornerstone of quantitative social science research, and there are many statistical techniques to help us do this. The good news is that the most difficult challenge is often deciding which technique to use; using the technique is often like following a recipe.
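To make the idea of inferring from a sample concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the population of 100,000 people is simulated, and the 55% tea share is a made-up number, not a real survey result. The sketch draws a simple random sample of 1,000 people and compares the share of tea drinkers in the sample with the share in the whole (simulated) population.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: the size and the 55% tea share are invented
# for illustration and are not real survey figures.
POPULATION_SIZE = 100_000
ASSUMED_TEA_SHARE = 0.55
population = [
    "tea" if random.random() < ASSUMED_TEA_SHARE else "coffee"
    for _ in range(POPULATION_SIZE)
]

# Simple random sample without replacement: every person in the
# population is as likely as any other to be chosen.
sample = random.sample(population, k=1_000)

# Share of tea drinkers in the sample and in the whole population.
sample_share = sample.count("tea") / len(sample)
population_share = population.count("tea") / len(population)

print(f"Tea share in the sample:     {sample_share:.3f}")
print(f"Tea share in the population: {population_share:.3f}")
```

In real research we never get to see the population share, which is the whole point; the simulation only lets us check that a random sample of 1,000 tends to land close to it.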

Types of samples

Not all samples are created equal. To be able to infer to a population we need to ensure our sample is representative. If our sample is not representative, the patterns and behaviours we find in the sample might not be present in the population. As a very crude example, if we want to know how many years of formal education the population of the UK has on average, we cannot simply sample individuals from a university, as these individuals are likely to have been in formal education for a greater number of years than average (especially so if you ask the staff, too).

Samples that are representative are random, which means that every member of our population of interest is as likely as any other to be included in our sample. Samples that are not representative include convenience samples, snowball samples, and quota samples. In our example above, every individual in the UK would need to be as likely to be asked as any other, whereas clearly this is not the case if we sample at a university.
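The university example can be sketched as a small simulation. This is only an illustration under invented assumptions: the population sizes, the average years of education (12 for the general public, 17 for people at a university), and the spread are all made up. It compares a random sample drawn from the whole population with a convenience sample drawn only from the university.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population. All numbers are invented for illustration:
# 90,000 people outside universities and 10,000 students or staff.
general_public = [random.gauss(12, 2) for _ in range(90_000)]
university = [random.gauss(17, 2) for _ in range(10_000)]
population = general_public + university

# The true average, which we would not know in real research.
true_mean = statistics.fmean(population)

# Random sample: every member of the population is equally likely to be picked.
random_sample = random.sample(population, k=500)

# Convenience sample: we only ask people who are at the university.
convenience_sample = random.sample(university, k=500)

random_mean = statistics.fmean(random_sample)
convenience_mean = statistics.fmean(convenience_sample)

print(f"Population mean:         {true_mean:.2f} years")
print(f"Random sample mean:      {random_mean:.2f} years")
print(f"Convenience sample mean: {convenience_mean:.2f} years")
```

Under these assumptions the random sample should sit close to the population average, while the convenience sample overstates it, because nobody outside the university had any chance of being included.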

To ensure our sample is random we first decide what our population of interest is, based on our research question. If we want to know what the average number of years of formal education is for all individuals in the UK, then all individuals in the UK is our population of interest.

Our population doesn’t have to be this large, though. Our research question might be how much time, on average, university students spend studying. In this case our population of interest is students at university.

The population of interest is also sometimes loosely called a sampling frame (strictly, the sampling frame is the list of individuals you actually draw your sample from). Essentially both terms refer to the individuals from which you draw your sample, and using them indicates that you have thought about which population is appropriate for your research question.

Alan Bryman’s Social Research Methods book has a great overview of sampling theory and procedure, and I thoroughly recommend you get a copy if you want to read more about this. There are multiple editions but they don’t change much, so if you find an older edition second hand for less money, go for it.

Description and inference

We usually want to do two things with our sample to understand our population:

Describe the sample (making the assumption that we are also describing the population)
Make predictions about the population from the sample

We usually call these descriptive statistics and inferential statistics respectively. This can be a bit confusing, because even descriptive statistics infer something about the population from the sample.

We often describe things like the ‘average’ and how much the data varies from this average. Inferential statistics, despite the confusing term, usually allow us:

to test if differences in the sample are likely to be actual differences in the population (hypothesis testing)
to make predictions; for example, knowing whether a person is male or female can help us predict how likely they are to commit a crime
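As a rough illustration of both ideas, the sketch below first describes two samples (their average and how much the data vary around it) and then tests whether the difference between them is likely to be a real difference in the population. The data are simulated and entirely hypothetical (weekly study hours for two invented groups of students), and the permutation test used here is just one simple way to do hypothesis testing that I have picked because it needs nothing beyond Python's standard library.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical weekly study hours for two sampled groups of students.
# The means (20 and 23) and spread (5) are invented for illustration.
group_a = [random.gauss(20, 5) for _ in range(100)]
group_b = [random.gauss(23, 5) for _ in range(100)]

# Descriptive statistics: the average and the variation around it.
for name, group in (("A", group_a), ("B", group_b)):
    mean = statistics.fmean(group)
    sd = statistics.stdev(group)
    print(f"Group {name}: mean = {mean:.1f} hours, sd = {sd:.1f} hours")

observed_diff = statistics.fmean(group_b) - statistics.fmean(group_a)


def permutation_p_value(a, b, n_permutations=2_000):
    """Two-sided permutation test for a difference in means.

    Repeatedly shuffles the group labels and counts how often the
    shuffled difference is at least as large as the observed one.
    """
    observed = abs(statistics.fmean(b) - statistics.fmean(a))
    pooled = a + b  # new list, so the original groups are not modified
    extreme = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)
        shuffled_a = pooled[:len(a)]
        shuffled_b = pooled[len(a):]
        diff = abs(statistics.fmean(shuffled_b) - statistics.fmean(shuffled_a))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


p_value = permutation_p_value(group_a, group_b)
print(f"Observed difference in means: {observed_diff:.2f} hours")
print(f"Permutation p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely turn up if the two groups did not really differ in the population; a large one means the sample gives us little reason to think they do.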

Units of observation

Throughout this website I usually refer to individuals (people) as the object of study. As a social scientist, I most commonly study people, but the individuals you study do not necessarily need to be people.

Your unit of observation could be almost anything, as long as all your units are of the same type. For example, you could study many flowers, animals, organisations, or even vehicles. What’s important is that for any given data your units of observation (your individuals) are all of the same type.