1 Overview

This module offers an introduction to the field of computational social science, where data science meets the study of human behaviour and societal change. Whether you are a human geographer exploring spatial dynamics or a computer scientist looking to apply cutting-edge methods to societal challenges, this module equips you with the skills to analyse data and solve problems across a range of social science topics, including population dynamics, human mobility and demographics.

While the study of social behaviours used to rely on traditional data sources, such as censuses and surveys, digital trace data have emerged as a novel source of information, providing an opportunity to understand key societal issues at an unprecedented temporal and spatial granularity and at scale (Rowe 2021). Yet, these data pose major methodological challenges to traditional demographic approaches (Cabrera and Rowe 2025). In this module, you will learn how to critically assess, process and model large-scale digital trace data, and how to integrate them with conventional social science data sources to produce robust and reproducible analyses. Machine learning, artificial intelligence and data science approaches are needed to overcome these methodological challenges, and the module provides hands-on training in these methods to enable you to design, implement and interpret computational studies of social phenomena.

1.1 Aims

This module aims to:

  • provide an introduction to fundamental concepts and questions in computational social science;
  • provide students with hands-on experience in applying data science and social science methods to analyse novel and large-scale social data; and
  • equip students with the ability to critically evaluate data-driven analyses of societal change in relation to policy-relevant debates, ethical considerations and broader societal implications.

1.2 Learning Outcomes

By the end of the module, students should be able to:

  • understand key social science concepts and questions to interpret patterns of population dynamics and societal change;
  • apply data science and social science methods to analyse novel and large-scale social data;
  • critically evaluate how data-driven analyses of societal change inform policy-relevant debates, demonstrating awareness of ethical, legal and societal issues associated with big and novel forms of data.

1.3 Feedback

Formal assessment of two computational essays. Written, assignment-specific feedback will be provided within four working weeks of the submission deadline. Comments will offer an understanding of the mark awarded and identify areas for improvement in future assignments.

Verbal face-to-face feedback. Immediate face-to-face feedback will be provided during on-campus sessions in interaction with staff. This will take place in all live sessions during the semester.

Online forum. Asynchronous written feedback will be provided via an online forum maintained by the module lead on Microsoft Teams. Students are encouraged to contribute by asking and answering questions relating to the module content. Staff will monitor the forum Monday to Friday 9am-5pm, but it will be open to students to make contributions at all times.

1.4 Computational Environment

To reproduce the code in the book, you need the following software packages:

  • R-4.5.2
  • RStudio 2026.01.0-392
  • Quarto 1.8.27
  • the list of libraries in the next section

To check your version of:

  • R and installed libraries, run sessionInfo()
  • RStudio, click Help on the menu bar and then About RStudio
  • Quarto, run quarto --version in a terminal, or check the version file in the quarto folder on your computer (see the console snippet below).
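
If you prefer to check these from the R console, a minimal sketch is shown below. Note that the rstudioapi and quarto helper packages used in the last two calls are optional extras, not part of the module's required library list, so treat those lines as illustrative.

# R itself and any installed library
R.version.string
packageVersion("tidyverse")

# RStudio version (requires the rstudioapi package; run from within RStudio)
rstudioapi::versionInfo()$version

# Quarto version (requires the quarto package, which wraps the Quarto command line tool)
quarto::quarto_version()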

To install or update R, RStudio and Quarto, download the latest versions from their official websites: cran.r-project.org for R, posit.co for RStudio and quarto.org for Quarto.

1.4.1 List of libraries

The list of libraries used in this book is provided below:

  • “tidyverse”
  • “viridis”
  • “viridisLite”
  • “ggthemes”
  • “patchwork”
  • “showtext”
  • “RColorBrewer”
  • “lubridate”
  • “tmap”
  • “sjPlot”
  • “sf”
  • “sp”
  • “kableExtra”
  • “ggcorrplot”
  • “plotrix”
  • “cluster”
  • “factoextra”
  • “igraph”
  • “stringr”
  • “rpart”
  • “rpart.plot”
  • “ggplot2”
  • “Metrics”
  • “caret”
  • “randomForest”
  • “ranger”
  • “wpgpDownloadR”
  • “devtools”
  • “ggseqplot”
  • “tidytext”
  • “tm”
  • “textdata”
  • “topicmodels”
  • “RedditExtractoR”
  • “stm”
  • “dygraphs”
  • “plotly”
  • “ggpmisc”
  • “ggformula”
  • “ggimage”
  • “modelsummary”
  • “gtools”
  • “webshot”
  • “gridExtra”
  • “broom”
  • “rtweet”
  • “dplyr”
  • “ggraph”
  • “tidygraph”
  • “ggspatial”

You need to ensure you have installed all the libraries used in this book by running the following code:

# package names
packages <- c(
  "tidyverse", "viridis", "viridisLite", "ggthemes", "patchwork", "showtext",
  "RColorBrewer", "lubridate", "tmap", "sjPlot", "sf", "sp", "kableExtra",
  "ggcorrplot", "plotrix", "cluster", "factoextra", "igraph", "stringr",
  "rpart", "rpart.plot", "ggplot2", "Metrics", "caret", "randomForest",
  "ranger", "devtools", "vader", "wpgpDownloadR", "ggseqplot", "tidytext",
  "tm", "textdata", "topicmodels", "RedditExtractoR", "stm", "dygraphs",
  "plotly", "ggpmisc", "ggformula", "ggimage", "modelsummary", "gtools",
  "gridExtra", "broom", "rtweet", "webshot", "ggraph", "tidygraph", "ggspatial"
)

# install any packages that are not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(!installed_packages)) {
  install.packages(packages[!installed_packages])
}

# load all packages
invisible(lapply(packages, library, character.only = TRUE))
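
If some of these libraries were installed a while ago, it may also be worth updating them before working through the book. The short sketch below uses base R's old.packages() and update.packages() and is optional rather than part of the module setup.

# list installed packages for which a newer CRAN version is available
outdated <- old.packages()
rownames(outdated)

# update everything that is outdated; ask = FALSE skips the per-package prompt
update.packages(ask = FALSE)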

1.5 Assessment

The final module mark is the average of the marks for two computational essays. These assignments are designed to cover the materials introduced during the semester. A computational essay is an essay whose narrative is supported by code and computational results included in the essay itself. It assesses your ability to implement code, use the methods taught in class, analyse and critically interpret results, and effectively communicate complex analyses. Each teaching week, you will be asked to address a set of questions relating to the module content covered in that week, and you will use the material produced in response to build your computational essay.

Assignment 1 (50%) assesses teaching content from Weeks 1 to 5. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance on the tasks and discussion that you are required to consider in your assignment.

Assignment 2 (50%) assesses teaching content from Weeks 7 to 11. You are required to use your responses to build your computational essay. Each chapter provides more specific guidance on the tasks and discussion that you are required to consider in your assignment.

1.5.1 Format Requirements

Both assignments will have the same requirements:

  • Maximum word count: approximately 2,000 words, excluding figures and references. As per School Assessment Guidelines, over-length submissions will be capped at 40%.
  • Up to four maps, plots or figures.
  • Up to two tables.

Assignments need to be prepared in “Quarto Document” format (i.e. a qmd extension) and then converted into a self-contained HTML file, which will be submitted via Turnitin. It is very important that the Quarto document is self-contained so that it renders well once submitted. The document should only display content that will be assessed; intermediate steps do not need to be shown, and messages resulting from loading packages, attaching data frames and the like should not be included in the output. Useful resources to customise your Quarto document can be found on Quarto’s website.
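
As an illustration of one way to meet these requirements, the sketch below shows a Quarto YAML header that produces a single self-contained HTML file and keeps warnings and package start-up messages out of the rendered output. It assumes the knitr (R) engine, and the title and author values are placeholders to be replaced with your own details.

---
title: "Assignment 1"       # placeholder
author: "Student ID"        # placeholder
format:
  html:
    embed-resources: true   # bundle images, CSS and JS into a single HTML file
execute:
  warning: false            # keep warnings out of the rendered output
knitr:
  opts_chunk:
    message: false          # hide package start-up and similar messages
---

Equivalent options can also be set on individual code chunks with #| warning: false and #| message: false if you only want to silence specific blocks.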

Two Quarto Document templates will be available via the module Canvas site. You can download these templates and use them for your assignments. Highly recommended!

Submission is electronic only via Turnitin on Canvas.

1.5.1.1 Marking criteria

The General School of Environmental Sciences marking rubric (2025-26) applies, with a stronger emphasis on evidencing the use of regression models, critical analysis of results and presentation standards. In addition to these general criteria, the code and outputs (i.e. tables, maps and plots) contained within the notebook submitted for assessment will be assessed according to the extent of documentation and evidence of expertise in changing and extending the code options illustrated in each chapter. Specifically, the following criteria will be applied:

  • 0-15: no documentation and use of default options.
  • 16-39: little documentation and use of default options.
  • 40-49: some documentation, and use of default options.
  • 50-59: extensive documentation, and edit of some of the options provided in the notebook (e.g. change north arrow location).
  • 60-69: extensive well organised and easy to read documentation, and evidence of understanding of options provided in the code (e.g. tweaking existing options).
  • 70-79: all above, plus clear evidence of code design skills (e.g. customising graphics, combining plots (or tables) into a single output, adding clear axis labels and variable names on graphic outputs, etc.).
  • 80-100: all of the above, plus code containing novel contributions that extend or improve the functionality of the code provided (e.g. comparative model assessments, novel methods to perform the task, etc.).