Data sources

These tutorials use a number of teaching data sets available from the UK Data Service and Nomisweb (FYI, Nomisweb have a great API for reproducible research) under the terms of the Open Government Licence:

Office for National Statistics, University of Manchester, Cathie Marsh Institute for Social Research (CMIST), UK Data Service, 2016, Living Costs and Food Survey, 2013: Unrestricted Access Teaching Dataset, [data collection], Office for National Statistics, 2nd Edition, Office for National Statistics, [original data producer(s)]. Accessed 1 October 2018. SN: 7932, http://doi.org/10.5255/UKDA-SN-7932-2. Contains public sector information licensed under the Open Government Licence v2.0.
Office for National Statistics, 2014, 2011 Census. Accessed October 2018. Contains public sector information licensed under the Open Government Licence v2.0.

If you’re following along at home, note that I download and process the data sets with the scripts in the src/ directory.

In [1]:
%run ../src/01-download.py
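The contents of 01-download.py are not reproduced here. As a rough sketch of what a repeatable download step can look like, the helper below fetches a file only if it is not already on disk, so re-running the notebook does not hit the server again. The function name `fetch`, the example URL and the destination path are all hypothetical, not taken from the script itself.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already downloaded, skip the request
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(dest))
    return dest


# Hypothetical usage (placeholder URL and path, not the real data location):
# fetch("https://example.org/teaching-data.zip", "../data/external/teaching-data.zip")
```

Skipping files that already exist keeps the step cheap to re-run, at the cost of not noticing if the upstream file changes; delete the local copy to force a fresh download.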

The food data set (Living Costs and Food Survey) includes income, but this is top-coded: incomes above a ceiling are recorded as the ceiling value rather than their true value. For the purpose of these exercises I simply remove the top-coded cases to make the distributions a bit closer to normal. If you were analysing this data for real you would need to consider how to handle these top-coded cases.

In [2]:
%run ../src/02-process.py
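The step of removing top-coded cases can be sketched with pandas as follows. This is a minimal illustration on made-up numbers, not the contents of 02-process.py: the column name `income`, the helper `drop_top_coded` and the toy values are hypothetical, and the sketch assumes the ceiling is simply the largest value recorded in the column unless you pass one explicitly.

```python
import pandas as pd


def drop_top_coded(df, column, top_code=None):
    """Return a copy of `df` without rows at or above the top-code ceiling."""
    if top_code is None:
        # Assumption: the ceiling equals the largest recorded value.
        top_code = df[column].max()
    # Rows with a missing value in `column` are also dropped,
    # because a comparison with NaN evaluates to False.
    return df.loc[df[column] < top_code].copy()


# Made-up incomes where 1000.0 plays the role of the top code.
toy = pd.DataFrame({"income": [250.0, 480.0, 1000.0, 730.0, 1000.0]})
trimmed = drop_top_coded(toy, "income")
print(len(toy), "->", len(trimmed))  # 5 -> 3
```

If the survey documentation states the ceiling, passing it as `top_code` is safer than inferring it from the maximum, since a sample with no top-coded cases would otherwise lose its genuine highest value.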