11  Data sets

library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

11.1 Greater London Authority LSOA Atlas

Availability

The dataset is stored in a CSV file that can be found, within the structure of this project, under:

df_lsoa_atlas <- read.csv("data/geodemographics/lsoa-data-clean.csv")

Variables

A description of all the variables included in this dataset can be found in a dictionary csv file, which can be imported as a data frame from:

df_lsoa_atlas_dictionary <- read.csv("data/geodemographics/Dictionary-lsoa-data-clean.csv")

Source & Pre-processing

The data was sourced from London Datastore’s LSOA Atlas and cleaned in Microsoft Excel.

11.2 British administrative boundaries (LSOAs and LAs)

Availability

The dataset for the boundaries of the lower-layer super-output areas (LSOAs) within London is stored as a shapefile that can be found under:

st_LSOA <- st_read("data/geodemographics/LSOA_2011_London_gen_MHW/LSOA_2011_London_gen_MHW.shp")
Reading layer `LSOA_2011_London_gen_MHW' from data source 
  `/Users/carmen/Library/CloudStorage/OneDrive-TheUniversityofLiverpool/GitHub/r4ps/data/geodemographics/LSOA_2011_London_gen_MHW/LSOA_2011_London_gen_MHW.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 4835 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
Projected CRS: OSGB36 / British National Grid

The dataset for the boundaries of the local authority districts (LADs) for the UK is stored as a shapefile that can be found under:

LA_UK <- st_read("./data/networks/Local_Authority_Districts_(December_2022)_Boundaries_UK_BFC/LAD_DEC_2022_UK_BFC.shp")
Reading layer `LAD_DEC_2022_UK_BFC' from data source 
  `/Users/carmen/Library/CloudStorage/OneDrive-TheUniversityofLiverpool/GitHub/r4ps/data/networks/Local_Authority_Districts_(December_2022)_Boundaries_UK_BFC/LAD_DEC_2022_UK_BFC.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 374 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -116.1928 ymin: 5336.966 xmax: 655653.8 ymax: 1220302
Projected CRS: OSGB36 / British National Grid

Variables

For each of the 4,835 LSOAs, the following characteristics are available:

names(st_LSOA)
 [1] "LSOA11CD"  "LSOA11NM"  "MSOA11CD"  "MSOA11NM"  "LAD11CD"   "LAD11NM"  
 [7] "RGN11CD"   "RGN11NM"   "USUALRES"  "HHOLDRES"  "COMESTRES" "POPDEN"   
[13] "HHOLDS"    "AVHHOLDSZ" "geometry" 

where:

  • LSOA11CD: Lower-Layer Super-Output Area code
  • LSOA11NM: Lower-Layer Super-Output Area name
  • MSOA11CD: Middle-Layer Super-Output Area code
  • MSOA11NM: Middle-Layer Super-Output Area name
  • LAD11CD: Local Authority District code
  • LAD11NM: Local Authority District name
  • RGN11CD: Region code
  • RGN11NM: Region name
  • USUALRES: Usual residents
  • HHOLDRES: Household residents
  • COMESTRES: Communal Establishment residents
  • POPDEN: Population density
  • HHOLDS: Number of households
  • AVHHOLDSZ: Average household size
  • geometry: Polygon of LSOA

For each of the 374 LADs, the following characteristics are available:

names(LA_UK)
 [1] "OBJECTID"   "LAD22CD"    "LAD22NM"    "BNG_E"      "BNG_N"     
 [6] "LONG"       "LAT"        "GlobalID"   "SHAPE_Leng" "SHAPE_Area"
[11] "geometry"  

where:

  • OBJECTID: object identifier
  • LAD22CD: Local Authority District code
  • LAD22NM: Local Authority District name
  • BNG_E: Location Easting
  • BNG_N: Location Northing
  • LONG: Location Longitude
  • LAT: Location Latitude
  • GlobalID: Global Identifier
  • SHAPE_Leng: Boundary length
  • SHAPE_Area: Area within boundary
  • geometry: Polygon of LAD

Projection

The shape of each LSOA is stored as a polygon and expressed in the OSGB36 / British National Grid projection:

st_crs(st_LSOA)
Coordinate Reference System:
  User input: OSGB36 / British National Grid 
  wkt:
PROJCRS["OSGB36 / British National Grid",
    BASEGEOGCRS["OSGB36",
        DATUM["Ordnance Survey of Great Britain 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6277]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.999601272,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

Similarly, the shape of each LAD is stored as a polygon and expressed in the OSGB36 / British National Grid projection:

st_crs(LA_UK)
Coordinate Reference System:
  User input: OSGB36 / British National Grid 
  wkt:
PROJCRS["OSGB36 / British National Grid",
    BASEGEOGCRS["OSGB36",
        DATUM["Ordnance Survey of Great Britain 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4277]],
    CONVERSION["British National Grid",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996012717,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
        BBOX[49.75,-9,61.01,2.01]],
    ID["EPSG",27700]]
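Although both layers report the same CRS name, the two WKT strings printed above are not identical: the LSOA layer has an "unnamed" conversion and no overall EPSG:27700 identifier, and the scale factors differ in the last digits. A comparison such as st_crs(st_LSOA) == st_crs(LA_UK) may therefore not return TRUE, so it can be worth aligning the layers explicitly before any spatial join. The sketch below illustrates the check-and-transform pattern on two toy points rather than on the real layers; the object names and coordinates are hypothetical.

```r
library(sf)

# Toy point in British National Grid (EPSG:27700); coordinates are illustrative
pt_bng <- st_sfc(st_point(c(530000, 180000)), crs = 27700)

# Toy point declared in WGS84 longitude/latitude (EPSG:4326)
pt_wgs <- st_sfc(st_point(c(-0.1276, 51.5072)), crs = 4326)

# The two layers do not share a CRS yet
same_before <- st_crs(pt_bng) == st_crs(pt_wgs)

# Re-project the second layer into the CRS of the first
pt_wgs_bng <- st_transform(pt_wgs, st_crs(pt_bng))

# After the transformation both layers report the same CRS
same_after <- st_crs(pt_bng) == st_crs(pt_wgs_bng)
```

With the real data, the analogous call would be st_transform(st_LSOA, st_crs(LA_UK)), assuming the LAD layer is taken as the reference.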

Source & Pre-processing

The boundaries for the LSOAs within London can be downloaded directly from the London Datastore website.

The boundaries for the LADs for the UK can be found on the ONS Open Geography Portal website. To filter for the London LADs, i.e. the London boroughs, we run the following line of code:

LND_boroughs <- LA_UK %>% filter(grepl('E09', LAD22CD)) 
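The filter works because the codes of the London boroughs all begin with the prefix E09. As a minimal sketch of the same idea in base R, the example below applies a prefix match to a toy data frame standing in for the attribute table of LA_UK; the toy rows and object names are illustrative and are not taken from the shapefile.

```r
# Toy stand-in for the attribute table of LA_UK (illustrative rows only)
toy_lads <- data.frame(
  LAD22CD = c("E09000001", "E09000033", "E08000012", "W06000015", "S12000036"),
  LAD22NM = c("City of London", "Westminster", "Liverpool", "Cardiff",
              "City of Edinburgh")
)

# Keep the rows whose code starts with "E09"
toy_boroughs <- toy_lads[startsWith(toy_lads$LAD22CD, "E09"), ]

# Equivalent regular-expression form, anchored at the start with "^"
toy_boroughs_regex <- toy_lads[grepl("^E09", toy_lads$LAD22CD), ]
```

On the real data, nrow(LND_boroughs) would be expected to equal 33, i.e. the 32 London boroughs plus the City of London.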

11.3 Worldpop population count data for Ukraine

11.4 Census population count data for UK

11.5 Ukraine’s administrative boundaries

11.6 Internal migration flows between US metropolitan areas and between London boroughs

Availability

The dataset for the migration flows between US metropolitan areas can be found as a csv file under:

df_metro <- read.csv("./data/networks/metro_to_metro_2015_2019_US_migration.csv")

The dataset for the migration flows between London boroughs can be found as a csv file under:

df_borough <- read.csv("./data/networks/LA_to_LA_2019_London_clean.csv")

Variables

For each of the 52,930 movements recorded in the dataset for the migration flows between US metropolitan areas, the following fields are available:

names(df_metro)
 [1] "MSA_Current_Code"                                
 [2] "MSA_Current_Name"                                
 [3] "MSA_Current_State"                               
 [4] "MSA_Current_Population_1_Year_and_Over_Estimate" 
 [5] "MSA_Current_Population_1_Year_and_Over_MOE"      
 [6] "MSA_Previous_Code"                               
 [7] "MSA_Previous_Name"                               
 [8] "MSA_Previous_State"                              
 [9] "MSA_Previous_Population_1_Year_and_Over_Estimate"
[10] "MSA_Previous_Population_1_Year_and_Over_MOE"     
[11] "Movers_Metro_to_Metro_Flow_Estimate"             
[12] "Movers_Metro_to_Metro_Flow_MOE"                  

All the fields that start with MSA_Current_ or MSA_Previous_ describe the destination (current residence) and origin (previous residence) metropolitan areas, respectively. The relevant fields for the analysis in this book are:

  • Movers_Metro_to_Metro_Flow_Estimate: Estimate of number of people moving between origin and destination
  • Movers_Metro_to_Metro_Flow_MOE: Margin of error for the above estimate

More details on the methodology to obtain the estimates and the margin of error for each population movement can be found on the US Census Bureau website.
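As a rough illustration of how the estimate and its margin of error can be used together, the sketch below computes an interval and a coefficient of variation for each flow. It assumes that the MOE corresponds to a 90% confidence level, as is standard for American Community Survey products; this should be checked against the Census Bureau documentation for this particular file. The toy values and the derived column names are hypothetical.

```r
# Toy flows using the two field names from df_metro; the values are made up
toy_flows <- data.frame(
  Movers_Metro_to_Metro_Flow_Estimate = c(1200, 85, 430),
  Movers_Metro_to_Metro_Flow_MOE = c(300, 90, 150)
)

est <- toy_flows$Movers_Metro_to_Metro_Flow_Estimate
moe <- toy_flows$Movers_Metro_to_Metro_Flow_MOE

# Assumption: the MOE is published at the 90% confidence level (z = 1.645)
z90 <- 1.645

# Interval bounds; a count of movers cannot be negative, so truncate at zero
toy_flows$lower <- pmax(0, est - moe)
toy_flows$upper <- est + moe

# Approximate standard error and coefficient of variation of each estimate
toy_flows$se <- moe / z90
toy_flows$cv <- toy_flows$se / est
```

Flows with a large coefficient of variation, such as the second toy row where the MOE exceeds the estimate, are very uncertain and may deserve caution in any downstream analysis.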

For each of the 1,053 movements recorded in the dataset for the migration flows between London boroughs, the following fields are available:

names(df_borough)
[1] "OutLA" "InLA"  "Moves"

where:

  • OutLA is the code corresponding to the origin borough
  • InLA is the code corresponding to the destination borough
  • Moves is the number of internal migration moves within each flow. The numbers are not integers because of the scaling processes used to produce the dataset, which are described in more detail in the latest methodology document for the dataset.
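To show how the three fields fit together, the sketch below turns a toy edge list with the same column names as df_borough into an origin-destination matrix and derives total outflows, inflows and net migration per borough. The toy values are invented, and the object names are hypothetical.

```r
# Toy edge list with the same columns as df_borough; the values are made up
toy_borough <- data.frame(
  OutLA = c("E09000001", "E09000001", "E09000002", "E09000003"),
  InLA  = c("E09000002", "E09000003", "E09000001", "E09000001"),
  Moves = c(10.5, 3.2, 8.1, 4.4)
)

# Origin-destination matrix: rows are origins, columns are destinations
od_matrix <- xtabs(Moves ~ OutLA + InLA, data = toy_borough)

# Total moves leaving and entering each borough
outflows <- rowSums(od_matrix)
inflows <- colSums(od_matrix)

# Net migration per borough (inflows minus outflows), matched by borough code
net <- inflows - outflows[names(inflows)]
```

Because every move leaves one borough and enters another, the net values sum to zero within a closed system such as this toy example.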

Source & pre-processing

The dataset for the migration flows between US metropolitan areas can be downloaded from the US Census Bureau website. The data was cleaned in Microsoft Excel.

The dataset for the migration flows between London boroughs can be downloaded from the ONS website. The data was cleaned in Microsoft Excel.

11.7 Twitter data on public opinion originated in the US and in the UK

11.8 Reddit data

11.9 Google mobility data for Italy and the UK

11.10 COVID-19 cases data for London and Rome

11.11 Census MSOA data for England and Wales

Availability

The dataset for the demographic census data of each MSOA in England and Wales can be loaded as a csv file from:

df_MSOA <- read.csv("./data/machine-learning/census2021-msoa-income.csv")

A very similar dataset for the demographic census data of each MSOA in England and Wales which also contains data on the median house price can be loaded as a csv file from:

df_housing <- read.csv("./data/machine-learning/census2021-msoa-houseprice.csv")

Variables

For each of the 7,080 MSOAs recorded in England and Wales, the following fields are available:

names(df_MSOA)
 [1] "X"                "date"             "geography"        "geography.code"  
 [5] "inHH"             "inCE"             "SING"             "MARRIED"         
 [9] "SEP"              "DIV"              "WIDOW"            "UK"              
[13] "EU"               "AFR"              "AS"               "AM"              
[17] "OC"               "BO"               "DENSITY"          "Y14orUNDER"      
[21] "Y15to19"          "Y20to24"          "Y25to29"          "Y30to34"         
[25] "Y35to49"          "Y40to44"          "Y45to49"          "Y50to54"         
[29] "Y55to59"          "Y60to64"          "Y65orOVER"        "F"               
[33] "M"                "HH1"              "HH2"              "HH3"             
[37] "HH4"              "HH5"              "HH6"              "ADD1YagoSAME"    
[41] "ADD1YagoSTUDENT"  "ADD1YagoUK"       "ADD1YagoNONUK"    "NHH"             
[45] "OWN"              "MORTGAGE"         "SHAREDOWN"        "RENTfromCOUNCIL" 
[49] "RENTotherSOCIAL"  "RENTprivate"      "RENTprivateOTHER" "RENTfree"        
[53] "INCOME"          

For a description of the variables in the columns of df_MSOA, we can load a dictionary for these variables:

df_dictionary <- read.csv("./data/machine-learning/Dictionary.csv")
head(df_dictionary)
                                      Dictionary       X
1                                                       
2                                           Name     Key
3                 Lives in household (% persons)    inHH
4    Lives in communal establishment (% persons)    inCE
5 Never married or civil partnership (% persons)    SING
6    Married or in civil partnership (% persons) MARRIED
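The printed rows suggest that the CSV file starts with a title line and an empty row, so the real header (Name, Key) ends up as the second data row. A possible workaround, assuming that layout, is to skip the first two lines when reading the file. The sketch below demonstrates this on a small in-memory copy of the presumed layout rather than on the real file; the object names are hypothetical.

```r
# In-memory text mimicking the layout suggested by head(df_dictionary):
# a title line, an empty row, then the real header (this layout is an assumption)
raw_lines <- c(
  "Dictionary,",
  ",",
  "Name,Key",
  "Lives in household (% persons),inHH",
  "Lives in communal establishment (% persons),inCE"
)

# Skip the title line and the empty row so that "Name,Key" becomes the header
toy_dictionary <- read.csv(text = raw_lines, skip = 2)

# Named vector for looking up a description from a column key
lookup <- setNames(toy_dictionary$Name, toy_dictionary$Key)
lookup[["inHH"]]
```

If the real file follows the same layout, the equivalent call would add skip = 2 to the read.csv() call on the dictionary file.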

Source & pre-processing

Data on the census characteristics for different MSOAs can be downloaded from the Nomis website. Data on the average net household income can be obtained from the ONS website.

Data on the median house price for different MSOAs can be downloaded from the ONS website.

All the data has been pre-processed in Microsoft Excel.