11  Data sets

library(sf)
library(dplyr)

11.1 Greater Manchester land use data

Availability

The dataset is stored in a GeoPackage (.gpkg) file that can be found, within the structure of this project, under:

st_LSOA <- st_read("./data/geodemographics/manchester_land_cover_2011.gpkg")
Reading layer `manchester_land_cover_2011' from data source 
  `/Users/carmen/Documents/GitHub/r4ps/data/geodemographics/manchester_land_cover_2011.gpkg' 
  using driver `GPKG'
Simple feature collection with 1673 features and 44 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 351662.3 ymin: 381166 xmax: 406087.2 ymax: 421037.7
Projected CRS: OSGB36 / British National Grid

Variables

The variables included in this dataset follow the land use classification of the CORINE Land Cover dataset.

Source & Pre-processing

The data was sourced from What do ‘left behind’ areas look like over time? and cleaned in Python.

11.2 British administrative boundaries (LSOAs, MSOAs and LAs)

Availability

The dataset for the boundaries of the lower-layer super-output areas (LSOAs) within London is stored as a shapefile that can be found under:

st_LSOA <- st_read("data/geodemographics-old/LSOA_2011_London_gen_MHW/LSOA_2011_London_gen_MHW.shp")
Reading layer `LSOA_2011_London_gen_MHW' from data source 
  `/Users/carmen/Documents/GitHub/r4ps/data/geodemographics-old/LSOA_2011_London_gen_MHW/LSOA_2011_London_gen_MHW.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 4835 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
Projected CRS: OSGB36 / British National Grid

Data for the shapes of the MSOAs must be downloaded from the UK’s GeoPortal here. Make sure you download the 2021 version and store it in the ./data/machine-learning/ folder as a file with the .gpkg extension. We have not included the file in the GitHub repo due to its large size. You can load it with st_read and then check that it uses the coordinate reference system you want, reprojecting it if necessary.
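As a rough sketch of that last step, the snippet below reads a boundary file and reprojects it only when needed. The file name `MSOA_2021_boundaries.gpkg` and the helper `load_boundaries()` are hypothetical (use whatever name you gave your download), and EPSG:27700 (British National Grid) is assumed as the target only because the other boundary layers in this chapter use it.

```r
library(sf)

# Hypothetical file name: replace it with the name of your own download
msoa_path <- "./data/machine-learning/MSOA_2021_boundaries.gpkg"

# Hypothetical helper: read a boundary file and make sure it ends up
# in the requested coordinate reference system (CRS)
load_boundaries <- function(path, target_epsg = 27700) {
  shapes <- st_read(path, quiet = TRUE)
  if (is.na(st_crs(shapes))) {
    stop("The file has no CRS information, so it cannot be reprojected safely.")
  }
  # Reproject only when the stored CRS differs from the target
  if (!(st_crs(shapes) == st_crs(target_epsg))) {
    shapes <- st_transform(shapes, target_epsg)
  }
  shapes
}

# Only attempt to load the file once it has been downloaded
if (file.exists(msoa_path)) {
  st_MSOA <- load_boundaries(msoa_path)
  print(st_crs(st_MSOA)$input)
}
```

Passing a different `target_epsg` (for example 4326 for longitude and latitude) would give the same layer in another system.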

The dataset for the boundaries of the local authority districts (LADs) for the UK is stored as a shapefile that can be found under:

LA_UK <- st_read("./data/networks/Local_Authority_Districts_(December_2022)_Boundaries_UK_BFC/LAD_DEC_2022_UK_BFC.shp")
Reading layer `LAD_DEC_2022_UK_BFC' from data source 
  `/Users/carmen/Documents/GitHub/r4ps/data/networks/Local_Authority_Districts_(December_2022)_Boundaries_UK_BFC/LAD_DEC_2022_UK_BFC.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 374 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -116.1928 ymin: 5336.966 xmax: 655653.8 ymax: 1220302
Projected CRS: OSGB36 / British National Grid

Variables

For each of the 4,835 LSOAs, the following characteristics are available:

names(st_LSOA)
 [1] "LSOA11CD"  "LSOA11NM"  "MSOA11CD"  "MSOA11NM"  "LAD11CD"   "LAD11NM"  
 [7] "RGN11CD"   "RGN11NM"   "USUALRES"  "HHOLDRES"  "COMESTRES" "POPDEN"   
[13] "HHOLDS"    "AVHHOLDSZ" "geometry" 

where:

  • LSOA11CD: Lower-Layer Super-Output Area code
  • LSOA11NM: Lower-Layer Super-Output Area name
  • MSOA11CD: Middle-Layer Super-Output Area code
  • MSOA11NM: Middle-Layer Super-Output Area name
  • LAD11CD: Local Authority District code
  • LAD11NM: Local Authority District name
  • RGN11CD: Region code
  • RGN11NM: Region name
  • USUALRES: Usual residents
  • HHOLDRES: Household residents
  • COMESTRES: Communal Establishment residents
  • POPDEN: Population density
  • HHOLDS: Number of households
  • AVHHOLDSZ: Average household size
  • geometry: Polygon of LSOA

For each of the 374 LADs, the following characteristics are available:

names(LA_UK)
 [1] "OBJECTID"   "LAD22CD"    "LAD22NM"    "BNG_E"      "BNG_N"     
 [6] "LONG"       "LAT"        "GlobalID"   "SHAPE_Leng" "SHAPE_Area"
[11] "geometry"  

where:

  • OBJECTID: object identifier
  • LAD22CD: Local Authority District code
  • LAD22NM: Local Authority District name
  • BNG_E: Location Easting
  • BNG_N: Location Northing
  • LONG: Location Longitude
  • LAT: Location Latitude
  • GlobalID: Global Identifier
  • SHAPE_Leng: Boundary length
  • SHAPE_Area: Area within boundary
  • geometry: Polygon of LAD

Projection

The shape of each LSOA is stored as a polygon and expressed in the OSGB36 / British National Grid projection:

st_crs(st_LSOA)
Coordinate Reference System:
  User input: OSGB36 / British National Grid 
  wkt:
PROJCRS["OSGB36 / British National Grid",
    BASEGEOGCRS["OSGB36",
        DATUM["Ordnance Survey of Great Britain 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6277]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.999601272,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

Similarly, the shape of each LAD is stored as a polygon and expressed in the OSGB36 / British National Grid projection:

st_crs(LA_UK)
Coordinate Reference System:
  User input: OSGB36 / British National Grid 
  wkt:
PROJCRS["OSGB36 / British National Grid",
    BASEGEOGCRS["OSGB36",
        DATUM["Ordnance Survey of Great Britain 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4277]],
    CONVERSION["British National Grid",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996012717,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
        BBOX[49.75,-9,61.01,2.01]],
    ID["EPSG",27700]]
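The two printouts describe the same projection but are not written identically (for example, only the second carries the EPSG:27700 identifier), so it can be worth checking that two layers agree before combining them. The sketch below does this on two made-up point layers, `layer_a` and `layer_b`, which are illustrative stand-ins and not part of the project data.

```r
library(sf)

# Toy layers (hypothetical): one point in British National Grid,
# one point in longitude/latitude (WGS84)
layer_a <- st_sf(id = 1, geometry = st_sfc(st_point(c(530000, 180000)), crs = 27700))
layer_b <- st_sf(id = 1, geometry = st_sfc(st_point(c(-0.1276, 51.5072)), crs = 4326))

# Compare the coordinate reference systems of the two layers
same_crs <- st_crs(layer_a) == st_crs(layer_b)

# If they differ, bring the second layer into the CRS of the first
if (!same_crs) {
  layer_b <- st_transform(layer_b, st_crs(layer_a))
}

print(st_crs(layer_b)$input)
```

The same pattern should apply to st_LSOA and LA_UK, with `st_transform(st_LSOA, st_crs(LA_UK))` as the alignment step if the comparison returns FALSE.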

Source & Pre-processing

The boundaries for the LSOAs within London can be downloaded directly from the London Datastore website.

The boundaries for the LADs for the UK can be found on the ONS Open Geography Portal website. To filter for the London LADs, i.e. the London boroughs, we run the following line of code:

LND_boroughs <- LA_UK %>% filter(grepl('E09', LAD22CD)) 
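The filter works because the codes of the London boroughs begin with E09. Note that grepl('E09', ...) matches that string anywhere in the code, not only at the start. A slightly stricter variant, shown here on a small made-up table (`toy_LADs` is illustrative, not the real LA_UK object), tests the prefix explicitly:

```r
library(dplyr)

# Toy table (hypothetical subset) with the same code and name columns as LA_UK
toy_LADs <- data.frame(
  LAD22CD = c("E09000001", "E06000001", "W06000015", "E09000033"),
  LAD22NM = c("City of London", "Hartlepool", "Cardiff", "Westminster"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose code starts with the London borough prefix
toy_boroughs <- toy_LADs %>% filter(startsWith(LAD22CD, "E09"))

print(toy_boroughs$LAD22NM)
```

An anchored regular expression, `grepl("^E09", LAD22CD)`, would be an equivalent way to express the same condition.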

11.3 Twitter migration data for the UK

11.3.1 Availability

The dataset is stored in a csv file that can be found, within the structure of this project, under:

st_LSOA <- st_read("./data/networks/internal_migration_uk.csv")
Reading layer `internal_migration_uk' from data source 
  `/Users/carmen/Documents/GitHub/r4ps/data/networks/internal_migration_uk.csv' 
  using driver `CSV'
Warning: no simple feature geometries present: returning a data.frame or tbl_df
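The warning appears because the csv file holds no geometry column, so st_read falls back to returning a plain data frame. A minimal alternative, assuming the file is an ordinary comma-separated table, is to read it with base R's read.csv, which also guesses column types (GDAL's CSV driver typically reads every column as text unless told otherwise). The object name `df_migration` and the helper `read_plain_table()` are hypothetical.

```r
# Path as used above
migration_path <- "./data/networks/internal_migration_uk.csv"

# Hypothetical helper: read a non-spatial table with base R
read_plain_table <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Only attempt to read the file if it is present in the project folder
if (file.exists(migration_path)) {
  df_migration <- read_plain_table(migration_path)
  str(df_migration)  # inspect column names and types
}
```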

11.3.2 Source and preprocessing

The data was created for the paper (Wang et al. 2022). The paper includes details on the methodology.

11.4 Worldpop population count data for Ukraine

11.5 Census population count data for UK

11.6 Ukraine’s administrative boundaries

11.7 Twitter data on public opinion originated in the US and in the UK

11.8 Reddit data

11.9 Google mobility data for Italy and the UK

11.10 COVID-19 cases data for London and Rome

11.11 Census MSOA data for England and Wales

Availability

The dataset for the demographic census data of each MSOA in England and Wales can be loaded as a csv file from:

df_MSOA <- read.csv("./data/machine-learning/census2021-msoa.csv")

Data on the median rent price for each MSOA can be loaded from a csv file as shown below. This data is from Zoopla and is made available here for non-commercial use, through the Urban Big Data Centre:

df_rent <- read.csv("./data/machine-learning/zoopla_mean_rent_msoa.csv")

Variables

For each of the 7,080 MSOAs recorded in England and Wales, the following fields are available:

names(df_MSOA)
 [1] "X"                "date"             "geography"        "geography.code"  
 [5] "inHH"             "inCE"             "SING"             "MARRIED"         
 [9] "SEP"              "DIV"              "WIDOW"            "UK"              
[13] "EU"               "AFR"              "AS"               "AM"              
[17] "OC"               "BO"               "DENSITY"          "Y14orUNDER"      
[21] "Y15to19"          "Y20to24"          "Y25to29"          "Y30to34"         
[25] "Y35to49"          "Y40to44"          "Y45to49"          "Y50to54"         
[29] "Y55to59"          "Y60to64"          "Y65orOVER"        "F"               
[33] "M"                "HH1"              "HH2"              "HH3"             
[37] "HH4"              "HH5"              "HH6"              "ADD1YagoSAME"    
[41] "ADD1YagoSTUDENT"  "ADD1YagoUK"       "ADD1YagoNONUK"    "NHH"             
[45] "OWN"              "MORTGAGE"         "SHAREDOWN"        "RENTfromCOUNCIL" 
[49] "RENTotherSOCIAL"  "RENTprivate"      "RENTprivateOTHER" "RENTfree"        

For a description of the variables in the columns of df_MSOA, we can load a dictionary for these variables:

df_dictionary <- read.csv("./data/machine-learning/Dictionary.csv")
head(df_dictionary)
                                      Dictionary       X
1                                                       
2                                           Name     Key
3                 Lives in household (% persons)    inHH
4    Lives in communal establishment (% persons)    inCE
5 Never married or civil partnership (% persons)    SING
6    Married or in civil partnership (% persons) MARRIED
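In the printout above, the real column headings (Name and Key) sit in the second row of the table and not in the header. A possible way to tidy this into a lookup table is sketched below on a toy copy of the first rows; `df_dictionary_toy` and `df_lookup` are illustrative names, and the approach assumes the rest of the file follows the same layout.

```r
# Toy copy (hypothetical) of the first rows of df_dictionary printed above
df_dictionary_toy <- data.frame(
  Dictionary = c("", "Name",
                 "Lives in household (% persons)",
                 "Lives in communal establishment (% persons)"),
  X = c("", "Key", "inHH", "inCE"),
  stringsAsFactors = FALSE
)

# Locate the row that holds the real headings and drop everything up to it
header_row <- which(df_dictionary_toy$X == "Key")[1]
df_lookup <- df_dictionary_toy[-seq_len(header_row), ]
names(df_lookup) <- c("Name", "Key")
rownames(df_lookup) <- NULL

# Look up the description of one variable by its key
print(df_lookup$Name[df_lookup$Key == "inCE"])
```

Applied to the full dictionary, the same steps would let you look up the description of any column in df_MSOA by its name.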

Source & pre-processing

Data on the census characteristics for different MSOAs can be downloaded from the Nomis website. Data on the average net household income can be obtained from the ONS website.

Data on the median house price for different MSOAs can be downloaded from the ONS website.

All the data has been pre-processed in Microsoft Excel.