Exploring the London COVID Crime Effect in R

The lockdown and social distancing measures that were brought in throughout the world to tackle COVID in 2020 have had a significant, widespread effect on crime. In this notebook, I use public London crime data on robbery and burglary to examine where this “COVID crime shift” was strongest, and whether any specific drivers or correlates can be identified. I use three years of Metropolitan Police Service data from data.police.uk.

The findings suggest that the relative change in burglary and robbery in April and May 2020 was heavily affected by local characteristics: areas with a high residential population saw the sharpest decreases in burglary (likely due to a reduction in available targets), while the reduction in robberies instead seems to be driven by geographic features and indicators of deprivation (potentially suggesting more available targets for robbery in communities least able to work from home).

The primary purpose of this exercise was to learn R - I’ve previously worked entirely in Python, which is more than sufficient 99% of the time, but has at times proved a blocker when I want to tackle more experimental geospatial and statistical methods. With that in mind, this is likely to be a little messy, and I’ll aim to condense my main lessons into a blog post in the future. The models are not heavily tuned (the aim is to explore correlates rather than provide accurate predictions) and there are likely to be correlations between our various predictors - as such, the results should not be taken to suggest direct causation.

The full code and data for this exercise are available on my Github repo. I’m hoping to summarise my key lessons from the Python to R journey in a Medium post in the next few weeks.

  1. Ingest Data
  2. Predict trend by MSOA
  3. Quantify MSOA COVID Effect
  4. Model

Ingest Data

For this exercise, I’ll be importing burglary and robbery data by MSOA. MSOAs are geographical units specifically designed for analysis and comparability: each has an average population of just over 8,000. There is a compromise here between smaller geographical units (which create more variance that may help us identify predictors) and the need for enough crime per unit to identify meaningful trends - MSOAs should be suitable.

To build our process, we’ll start by taking one month of crime data, exploring it, and writing all our steps for automation.

# Read a single month of Metropolitan Police street-level crime data
test_df <- read.csv("crimes/2018-01/2018-01-metropolitan-street.csv")

Our crime data is categorised according to the Home Office major crime types, and, as in Python, we can list them all with the unique() function. Here I’ll be focusing on robbery and burglary: two crime types that are heavily reliant on encountering victims in public spaces, and as such should be affected by the “COVID effect”.
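For instance, listing the categories present in our test month might look like this (Crime.type is the column name I’d expect from the data.police.uk street-level CSV format, once read.csv has converted the “Crime type” header):

```r
# List the distinct Home Office crime categories in this month's data
unique(test_df$Crime.type)
```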


To avoid this getting particularly computationally intensive, let’s write a function to pull out robberies and burglaries, and assign them a specific MSOA. Then we can iterate over all our months and get monthly counts for each offence type.

library(dplyr)

subset_df <- filter(test_df, Crime.type == "Burglary" | Crime.type == "Robbery")
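A sketch of what that function might look like, using dplyr. The lookup table `lsoa_msoa_lookup` (with columns `LSOA.code` and `MSOA.code`) is a hypothetical name for an LSOA-to-MSOA mapping, which is needed because the police CSVs only carry LSOA codes directly:

```r
library(dplyr)

# Hypothetical sketch: count robberies and burglaries per MSOA for one month.
# Assumes `lsoa_msoa_lookup` maps LSOA.code -> MSOA.code.
count_covid_crimes <- function(path, lsoa_msoa_lookup) {
  read.csv(path) %>%
    filter(Crime.type %in% c("Burglary", "Robbery")) %>%
    left_join(lsoa_msoa_lookup, by = "LSOA.code") %>%
    count(MSOA.code, Month, Crime.type, name = "offences")
}

# Iterate over every monthly file and stack the results, e.g.:
# paths <- list.files("crimes", pattern = "street\\.csv$",
#                     recursive = TRUE, full.names = TRUE)
# monthly_counts <- bind_rows(lapply(paths, count_covid_crimes, lsoa_msoa_lookup))
```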