Levels of measurement
Levels of measurement describe the nature of your data. They are important because they determine which descriptive statistics are appropriate to report and which types of analysis (statistical tests or predictions) you can perform. Before performing any analysis, think about each variable's level of measurement and what is appropriate for it.
The four levels of measurement are nominal, ordinal, interval, and ratio.
Nominal
Nominal data is also known as categorical data. These are labels such as male/female or religion/no religion. They have no inherent order; one response is not ‘better’ or ‘higher’ than another. As social scientists, you’ll find these are very common.
Your analysis options are most limited with nominal data, and the tools available to you are typically:
- frequency (count) tables, including two-way tables of counts (often called crosstabs),
- bar charts for displaying frequencies by category, or
- \(\chi ^ 2\) (chi-squared) tests of association between two categorical variables.
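To make the nominal toolkit concrete, here is a minimal, standard-library-only Python sketch that builds a crosstab from paired labels and computes Pearson's \(\chi^2\) statistic from observed and expected counts. The responses are made-up toy data, and the helper names `crosstab` and `chi_square` are hypothetical. The closed-form p-value via `math.erfc` only holds for 1 degree of freedom (a 2×2 table). In practice you would more likely call `scipy.stats.chi2_contingency`, which by default applies Yates' continuity correction to 2×2 tables, so its statistic will differ from this uncorrected one.

```python
from collections import Counter
from math import erfc, sqrt

# Hypothetical toy data: one (sex, religion) pair per respondent.
responses = [
    ("male", "religion"), ("female", "religion"), ("female", "no religion"),
    ("male", "no religion"), ("female", "religion"), ("male", "no religion"),
    ("female", "religion"), ("male", "religion"), ("female", "no religion"),
    ("male", "no religion"),
]


def crosstab(pairs):
    """Build a two-way frequency table (crosstab) from (row, column) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_square(table):
    """Pearson's chi-squared statistic and degrees of freedom (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


rows, cols, table = crosstab(responses)
stat, dof = chi_square(table)
# With 1 degree of freedom (a 2x2 table) the p-value has a closed form via erfc.
p_value = erfc(sqrt(stat / 2)) if dof == 1 else None
print(rows, cols, table)  # ['female', 'male'] ['no religion', 'religion'] [[2, 3], [3, 2]]
print(round(stat, 3), dof)  # 0.4 1
```

With these ten made-up responses the statistic is 0.4 on 1 degree of freedom (p ≈ 0.53), so there is no evidence of an association.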
Ordinal
Ordinal data is similar to nominal data in that it consists of labels (rather than numbers), but the labels have a rank or order. For example, a ‘strongly disagree’ to ‘strongly agree’ scale (sometimes called a Likert scale) is ordinal. Another example would be ‘guilty’ and ‘not guilty’.
You have a few more analysis options with ordinal data, including:
- correlation (Spearman’s \(\rho\) and Kendall’s \(\tau\))
- non-parametric tests of whether two independent samples differ from each other (Mann–Whitney U)
- more plotting options, since the categories can be displayed in their natural order (e.g. ordered or stacked bar charts)
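The ordinal tools can be sketched in plain Python, assuming the Likert labels are first mapped to integer codes that preserve their order (the codes carry rank information only, so the distances between them mean nothing). Both Spearman's \(\rho\) and the Mann–Whitney U statistic depend only on ranks. The helper names are hypothetical, tied values receive averaged ranks, no p-values are computed, and Kendall's \(\tau\) is not shown. For real analyses, `scipy.stats.spearmanr`, `scipy.stats.kendalltau` and `scipy.stats.mannwhitneyu` compute these statistics together with p-values.

```python
# Likert labels in their natural order; the integer codes carry rank information only.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
CODE = {label: i + 1 for i, label in enumerate(LIKERT)}


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a, computed from the ranks of the pooled samples."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Hypothetical answers to two survey items from the same five respondents.
item_1 = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
item_2 = ["disagree", "strongly disagree", "agree", "neutral", "strongly agree"]
rho = spearman_rho([CODE[r] for r in item_1], [CODE[r] for r in item_2])
print(rho)  # 0.8

# Hypothetical answers to one item from two independent groups.
group_a = ["agree", "strongly agree", "neutral", "agree"]
group_b = ["disagree", "neutral", "strongly disagree", "disagree"]
u = mann_whitney_u([CODE[r] for r in group_a], [CODE[r] for r in group_b])
print(u)  # 15.5 (the maximum possible here is 4 * 4 = 16)
```

A U close to its maximum means members of the first group almost always gave the higher-ranked answer.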
Interval
Interval data is numerical but does not have a meaningful zero value. The example most often given is temperature in degrees Celsius. A temperature of 20°C is not twice as hot as 10°C, because 0°C is an arbitrary point (the freezing point of water) rather than the absence of heat.
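An example more relevant to the social sciences is the calendar year, which counts from an arbitrary starting point (the conventional start of the Common Era; strictly, the Gregorian calendar has no year 0). Even so, you won’t come across interval data very often in the social sciences.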
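A short Python sketch of why ratios fail on an interval scale: moving the zero point (here, converting Celsius to kelvin, a scale whose zero is the absence of heat) changes the ratio between two temperatures but leaves their difference unchanged. The function name `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin, a scale whose zero means the absence of heat."""
    return celsius + 273.15


warm, cool = 20.0, 10.0

# Ratios on the Celsius scale depend on where its arbitrary zero happens to sit.
naive_ratio = warm / cool  # 2.0, but this does not mean "twice as hot"
kelvin_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)  # roughly 1.035

# Differences, by contrast, are meaningful on an interval scale and survive the shift.
difference_c = warm - cool  # 10.0
difference_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)  # 10.0 up to float rounding

# The same logic applies to calendar years: a 30-year gap is meaningful,
# but 2020 / 1990 is not a meaningful ratio.
gap_in_years = 2020 - 1990  # 30
```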
Ratio
Ratio data is again numerical, but differs from interval data because it has a meaningful, non-arbitrary zero. As we saw above, dates are interval, but age is ratio because it has a meaningful zero (birth). It is correct to say that someone who is ten years old is twice as old as someone who is five years old.
Numerical (interval or ratio) data afford us the greatest range of analytical options.
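One way to make this concrete is to tie each level to the averages conventionally treated as meaningful for it: the mode for nominal data, additionally the median for ordinal data, the mean for interval and ratio data, and measures that need a true zero (such as the geometric mean) for ratio data only. The following Python sketch encodes that widely used convention as a lookup table; the names `PERMITTED` and `summarise` are hypothetical. The level is passed in explicitly because it cannot be inferred from the values: Likert codes 1 to 5 look numerical but remain ordinal.

```python
import statistics

# Averages conventionally treated as meaningful at each level of measurement.
# Each level keeps the options of the level below and adds at least one more.
PERMITTED = {
    "nominal": {"mode": statistics.mode},
    "ordinal": {"mode": statistics.mode,
                # median_low always returns an observed category, never a midpoint
                "median": statistics.median_low},
    "interval": {"mode": statistics.mode, "median": statistics.median,
                 "mean": statistics.mean},
    "ratio": {"mode": statistics.mode, "median": statistics.median,
              "mean": statistics.mean,
              # needs a true zero and strictly positive values
              "geometric_mean": statistics.geometric_mean},
}


def summarise(values, level):
    """Return only the averages that are meaningful for the stated level."""
    return {name: func(values) for name, func in PERMITTED[level].items()}


print(summarise(["religion", "no religion", "religion"], "nominal"))
# {'mode': 'religion'}
print(summarise([1, 2, 2, 4], "ordinal"))  # Likert codes, not real quantities
# {'mode': 2, 'median': 2}
ages = summarise([5, 10, 20], "ratio")  # hypothetical ages in years
```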
In the social sciences, numerical data is more commonly ratio than interval. Other examples of ratio data include income and number of events (e.g. the number of crimes in an area). There are two sub-types of numerical (interval or ratio) data:
Discrete and continuous data
One additional thing to bear in mind for interval and ratio data is that it can be discrete or continuous.
- Discrete means it can only take on certain (discrete) values, usually integers (1, 2, 3…)
- Continuous means it can take on any numerical value (1.23, 4.56, 7.897126534).
Discrete data is usually count data (i.e. how often something occurs), which is very common in the social sciences. It is not possible to own 1.6 cars or have 2.4 children, for example.
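Count data can be produced directly from an event log. In this Python sketch (the incident data and area names are hypothetical), every per-area count is a whole number, yet the average count across areas is not, much as a population can average 2.4 children even though no family has 2.4 children.

```python
from collections import Counter
import statistics

# Hypothetical incident log: one entry per recorded crime, labelled by area.
incidents = ["north", "south", "north", "east", "north", "south", "east"]
AREAS = ["north", "south", "east", "west"]

per_area = Counter(incidents)
# Counter returns 0 for areas with no incidents, so "west" is still counted.
counts = [per_area[area] for area in AREAS]
print(counts)  # [3, 2, 2, 0]: whole numbers only

# The average count need not be a value any single area could take.
print(statistics.mean(counts))  # 1.75
```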
Continuous data can take on any value within its range and is usually measured rather than counted. Examples include height, weight, and age (although age is usually recorded on a discrete scale: whole years).
Income is the most common ‘unusual’ case. It can include a fraction of a pound (pence), yet it is strictly discrete because the smallest possible unit is one penny (£0.01): you cannot have 0.3 pence. In practice, however, income is usually treated as continuous, and in most models this approximation works well.
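The point that income is discrete in units of pence can be sketched in Python by storing amounts as whole numbers of pence, a common practice for money that also sidesteps binary floating-point rounding. The function `to_pence` is hypothetical; it uses `decimal.Decimal.quantize` to reject any amount finer than one penny.

```python
from decimal import Decimal

PENNY = Decimal("0.01")


def to_pence(pounds):
    """Convert a sterling amount (given as a string) to a whole number of pence.

    Raises ValueError for amounts finer than one penny, e.g. '0.003' (0.3p).
    """
    amount = Decimal(pounds)
    if amount != amount.quantize(PENNY):
        raise ValueError(f"{pounds} is not a whole number of pence")
    return int(amount * 100)


# Hypothetical monthly incomes in pounds.
incomes = ["1234.56", "980.00", "15000"]
print([to_pence(x) for x in incomes])  # [123456, 98000, 1500000]
```

Converting the pence back to pounds as floating-point numbers before modelling is the ‘treat it as continuous’ step, and it is usually harmless.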