1 Overview

The module provides students with an introduction to the use of data science and digital footprint data to analyse human population dynamics. Established approaches to studying population dynamics rely on traditional data sources, such as censuses and surveys. Digital footprint data have emerged as a novel source of information, providing an opportunity to understand key societal population issues at unprecedented temporal and spatial granularity and at scale (Rowe 2021). Yet these data pose major methodological challenges to traditional demographic approaches (Rowe 2021). Machine learning, artificial intelligence and data science approaches are needed to overcome these challenges.

1.1 Aims

This module aims to:

  • provide an introduction to fundamental theories of population science;

  • introduce students to novel data and approaches to understanding population dynamics and societal change; and,

  • equip students with skills and experience to conduct population science using computational, data science approaches.

1.2 Learning Outcomes

By the end of the module, students should:

  • gain an appreciation of relevant demographic theory to help interpret patterns of population change;

  • develop an understanding of the types of demographic and social science methods that are essential for interpreting and analysing digital footprint data in the context of population dynamics;

  • develop the ability to apply different methods to understand population dynamics and societal change;

  • gain an appreciation of how population science approaches can produce relevant evidence to inform policy debates;

  • develop critical awareness of modern demographic analysis and ethical considerations in the use of digital footprint data.

1.3 Feedback

Formal assessment of two computational essays. Written assignment-specific feedback will be provided within three working weeks of the submission deadline. Comments will offer an understanding of the mark awarded and identify areas which can be considered for improvement in future assignments.

Verbal face-to-face feedback. Immediate face-to-face feedback will be provided during computer, discussion and clinic sessions in interaction with staff. This will take place in all live sessions during the semester.

Online forum. Asynchronous written feedback will be provided via an online forum. Students are encouraged to contribute by asking and answering questions relating to the module content. Staff will monitor the forum Monday to Friday 9am-5pm, but it will be open to students to make contributions at all times. Response time will vary depending on the complexity of the question and staff availability.

1.4 Computational Environment

To reproduce the code in the book, you need the following software packages:

  • R-4.3.2
  • RStudio 2023.12.0-369
  • Quarto 1.4.543
  • the list of libraries in the next section

To check your version of:

  • R and libraries, run sessionInfo()
  • RStudio, click Help on the menu bar and then About
  • Quarto, check the version file in the quarto folder on your computer.
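The same checks can also be scripted from the R console. The following is a minimal sketch, not part of the original instructions: it assumes the Quarto command-line tool is installed under the name `quarto` and is on your PATH, and it uses the base package "utils" only as a stand-in for any library in the list.

```r
# R version string for the current session
r_version <- R.version.string
print(r_version)

# TRUE if the running R is at least the version listed above (4.3.2)
r_ok <- getRversion() >= "4.3.2"
print(r_ok)

# Version of a single installed library; "utils" ships with R and is used
# here only as a stand-in for any package in the list of libraries
utils_version <- packageVersion("utils")
print(utils_version)

# Quarto: look for the command-line tool on the PATH (assumption: it is
# installed as `quarto`); Sys.which() returns "" when nothing is found
quarto_path <- unname(Sys.which("quarto"))
if (nzchar(quarto_path)) {
  print(system2(quarto_path, "--version", stdout = TRUE))
} else {
  message("quarto not found on the PATH")
}
```

If `r_ok` is FALSE, or Quarto is not found, the code in the book may still run, but results are not guaranteed to match.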

To install and update:

1.4.1 List of libraries

The list of libraries used in this book is provided below:

  • “tidyverse”
  • “viridis”
  • “viridisLite”
  • “ggthemes”
  • “patchwork”
  • “showtext”
  • “RColorBrewer”
  • “lubridate”
  • “tmap”
  • “sjPlot”
  • “sf”
  • “sp”
  • “kableExtra”
  • “ggcorrplot”
  • “plotrix”
  • “cluster”
  • “factoextra”
  • “igraph”
  • “stringr”
  • “rpart”
  • “rpart.plot”
  • “ggplot2”
  • “Metrics”
  • “caret”
  • “randomForest”
  • “ranger”
  • “wpgpDownloadR”
  • “devtools”
  • “ggseqplot”
  • “tidytext”
  • “tm”
  • “textdata”
  • “topicmodels”
  • “RedditExtractoR”
  • “stm”
  • “dygraphs”
  • “plotly”
  • “ggpmisc”
  • “ggformula”
  • “ggimage”
  • “modelsummary”
  • “gtools”
  • “webshot”
  • “gridExtra”
  • “broom”
  • “rtweet”
  • “dplyr”
  • “ggraph”
  • “tidygraph”
  • “ggspatial”
  • “vader”

Ensure that the libraries used in this book are installed by running the following code:

# package names
packages <- c(
  "tidyverse", "viridis", "viridisLite", "ggthemes", "patchwork", "showtext",
  "RColorBrewer", "lubridate", "tmap", "sjPlot", "sf", "sp", "kableExtra",
  "ggcorrplot", "plotrix", "cluster", "factoextra", "igraph", "stringr",
  "rpart", "rpart.plot", "ggplot2", "Metrics", "caret", "randomForest",
  "ranger", "devtools", "vader", "wpgpDownloadR", "ggseqplot", "tidytext",
  "tm", "textdata", "topicmodels", "RedditExtractoR", "stm", "dygraphs",
  "plotly", "ggpmisc", "ggformula", "ggimage", "modelsummary", "gtools",
  "gridExtra", "broom", "rtweet", "webshot", "ggraph", "tidygraph", "ggspatial"
)

# install packages not yet installed
# note: install.packages() only covers configured repositories such as CRAN;
# wpgpDownloadR is assumed here to be distributed via GitHub, so if it fails
# to install, try devtools::install_github("wpgp/wpgpDownloadR")
installed_packages <- packages %in% rownames(installed.packages())
if (any(!installed_packages)) {
  install.packages(packages[!installed_packages])
}

# load all packages, suppressing the list that lapply() would otherwise print
invisible(lapply(packages, library, character.only = TRUE))
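If loading stops with an error, it can help to see which packages are still unavailable without attaching any of them. The sketch below is an illustration only: the vector `pkgs` is a short hypothetical example (two packages that ship with R plus one deliberately fictitious name), and in practice you would pass the full `packages` vector instead.

```r
# names to check; the last one is a deliberately fictitious package
pkgs <- c("stats", "utils", "notARealPackage123")

# requireNamespace() returns TRUE or FALSE without attaching the package,
# so this does not change the search path
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# names of the packages that could not be found
missing_pkgs <- names(available)[!available]
print(missing_pkgs)
```

Any name reported in `missing_pkgs` needs to be installed (or installed from a different source) before the loading step will succeed.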

1.5 Assessment

The final module mark is composed of two computational essays. Together they are designed to cover the material introduced during the semester. A computational essay is an essay whose narrative is supported by code and computational results that are included in the essay itself. Each teaching week, you will be required to address a set of questions relating to the module content covered in that week, and to use the material you produce to build your computational essay.

Assignment 1 (50%) refers to the set of questions at the end of Chapter 2, Chapter 3, Chapter 4 and Chapter 5. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance on the tasks and discussion that you are required to consider in your assignment.

Assignment 2 (50%) refers to the set of questions at the end of Chapter 6, Chapter 7, Chapter 8, Chapter 9 and Chapter 10. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance on the tasks and discussion that you are required to consider in your assignment.

1.5.1 Format Requirements

Both assignments will have the same requirements:

  • Maximum word count: 2,000 words, excluding figures and references.
  • Up to three maps, plots or figures. A figure may combine more than one map and/or plot and will count as one, provided the components are integrated into a single figure.
  • Up to two tables.
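One way to integrate several plots into a single figure is with the patchwork library from the list above. The following is a minimal sketch under stated assumptions: it uses R's built-in `mtcars` data purely as a stand-in for your own data, and the plots, labels and object names are hypothetical.

```r
library(ggplot2)
library(patchwork)

# two separate plots built from the built-in mtcars data (stand-in data)
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p2 <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Number of cylinders", y = "Count")

# place the plots side by side and tag the panels a, b
combined <- p1 + p2 + plot_annotation(tag_levels = "a")

# printing the object draws the two panels as one figure
print(combined)
```

Because `combined` is drawn as one object, it appears as a single figure in the rendered document.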

Assignments need to be prepared in “Quarto Document” format (i.e. qmd extension) and then converted into a self-contained HTML file that will then be submitted via Turnitin. The document should only display content that will be assessed. Intermediate steps do not need to be displayed. Messages resulting from loading packages, attaching data frames, or similar operations do not need to be included in the output. Useful resources to customise your R notebook can be found on Quarto’s website.
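Package start-up messages can be hidden with chunk options written as `#|` comments at the top of an R code chunk in the Quarto document. The sketch below shows the contents of one such chunk; the chunk label and the use of the base package "tools" as a stand-in for the packages in your essay are assumptions made for illustration.

```r
#| label: load-packages
#| message: false
#| warning: false

# With the options above, messages and warnings produced by this chunk are
# left out of the rendered document; explicit output is still shown.
library(tools)  # stand-in for any package used in the essay

# explicit output: the extension of a hypothetical file name
ext <- file_ext("essay.qmd")
print(ext)
```

When run as plain R outside Quarto, the `#|` lines are ordinary comments and have no effect.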

Two Quarto Document templates will be available via the module Canvas site.

Submission is electronic only via Turnitin on Canvas.

1.5.1.1 Marking criteria

The Standard Environmental Sciences School marking criteria apply, with a stronger emphasis on evidencing the use of regression models, critical analysis of results and presentation standards. In addition to these general criteria, the code and outputs (i.e. tables, maps and plots) contained within the notebook submitted for assessment will be assessed according to the extent of documentation and evidence of expertise in changing and extending the code options illustrated in each chapter. Specifically, the following criteria will be applied:

  • 0-15: no documentation and use of default options.
  • 16-39: little documentation and use of default options.
  • 40-49: some documentation, and use of default options.
  • 50-59: extensive documentation, and editing of some of the options provided in the notebook (e.g. change north arrow location).
  • 60-69: extensive, well-organised and easy-to-read documentation, and evidence of understanding of options provided in the code (e.g. tweaking existing options).
  • 70-79: all as above, plus clear evidence of code design skills (e.g. customising graphics, combining plots (or tables) into a single output, adding clear axis labels and variable names on graphic outputs, etc.).
  • 80-100: all as above, plus code containing novel contributions that extend/improve the functionality the code was provided with (e.g. comparative model assessments, novel methods to perform the task, etc.).