What’s Happened to Burglary, and Does Attending Help?

policing
crime
data-science
Published

December 29, 2022

It’s nearly 2023, and in the true spirit of Christmas, I’ve used the cheese/wine filled nowhere time between the holidays to pick up an analytical side-project that’s been irking me for a while… what’s happened to burglary, and why are we suddenly so excited about mandatory attendance? It’s a thorny question, so I figured I’d document my analysis here… All(ish) of the code is available, so this will hopefully be a useful tutorial for others. As usual, keep in mind this is a blog post, not an academic article - this analysis is fuelled by post-Christmas cheese, and peer reviewed by me after a glass of sherry, so if you want to rely on any of it, replicate it yourself first and make sure it’s right!

This took somewhat longer than I expected (hooray for Christmas breaks), so this will be a series in three parts:

  1. Exploring the data and Linear Models (you’re here!)
  2. Time Series (when I get the time)
  3. Synthetic Controls (one day in the distant future)

So come with me on an analytical adventure through time, as we cast our minds back to the halcyon days of 2015. The pound is worth $1.50 again, Boris Johnson is Mayor of London, and we’re all still fondly thinking back to how great the London Olympics were.

Across policing, the effects of austerity are starting to be acutely felt - having briefly been protected, chief officers are tightening their fiscal belt: swathes of policing real estate are sold off to protect the front-line, which in some places is starting to rely on volunteers and goodwill to avoid looking rather thin.

Meanwhile, crime is changing: traditional crimes like burglary and violence continue to fall, leading the soon-to-be Prime Minister Theresa May to tell the Police Federation to “stop crying wolf” about cuts while demand shrinks. But policing certainly isn’t feeling it, as an increase in “high-harm”, complex offences has more than made up for any shortfall.

The shift towards complex offences was, in my mind, largely unnoticed by the general public, but caused a real shift in core policing demands: traditional theft and violent offences had become ever rarer, while child exploitation, sexual assault and fraud reporting skyrocketed… but the latter group of offences requires a far lengthier investigation than the former! The result? More demand, as best illustrated by this wonderful graph by Matt Ashby

Matt Ashby’s excellent visualisation of crime pressures by time and force

In the face of these conflicting pressures, policing did something that seemed perfectly reasonable, announcing that officers may not attend every single burglary.

Less than a decade later, we’re now on a very different course. Confidence in policing, which had previously seemed immutable, is down. Investigative performance has seemingly fallen off a cliff. And so the NPCC has made the opposite commitment, pledging that every home burglary will now see police attendance, after forces who tested the approach claimed “dramatic” crime reductions.

But has performance really declined that quickly, and is attendance to blame? I’ll use this post to explore open police data from the last decade, and use data from pilot forces to see just how dramatic the effect is: just what’s happened to burglary, and will mandatory attendance help?

What’s Happened to Burglary?

We’ll start by exploring how burglary has changed over time - for the purpose of this analysis, we’re focusing on domestic burglary (eg, not including commercial properties), and we’ll use both police recorded crime data (eg, crimes reported to police) and the Crime Survey for England and Wales (a national survey to measure total crime) to compare trends.

While the UK benefits from crime data that is comparatively clean (I do not envy our American cousins, let alone most of our colleagues in the rest of the world), this is less true for both historical data (eg, going back more than a few years) and the Crime Survey data, which isn’t readily available to the general public: the data is spread across various archive files, websites and reports, and there really isn’t a “single source of truth” for crime counts over time.

The data for this analysis is a few hacked-together files:

  • Crime outcomes and crime counts by quarter, manually concatenated from the Police Open Data Tables
  • Crime Survey burglary counts per year, from the most recent ONS crime and justice quarterly

I’m afraid that does mean this analysis isn’t quite as reproducible as I’d like - I had written some code to scrape the Home Office data page and aggregate all the outcomes data, but it’s such a mishmash of ODS, XLSX and other formats that everything fell over and I did it manually. That said, hopefully it’s not too much of a pain to reproduce if you feel the need to.
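
For the curious, the concatenation step looked roughly like the sketch below - this is illustrative rather than exactly what I ran, and it assumes you’ve downloaded the quarterly tables into a data/raw folder (a made-up path for this example):

Code
from pathlib import Path
import pandas as pd

frames = []
# The Home Office publishes a mix of ODS and XLSX files; read each format
# with the appropriate engine (ODS needs the odfpy package installed)
for path in Path("../data/raw").glob("*.ods"):
    frames.append(pd.read_excel(path, engine="odf"))
for path in Path("../data/raw").glob("*.xlsx"):
    frames.append(pd.read_excel(path))

# Stack everything into one table - in practice, expect to fix up
# inconsistent column names between years before this works cleanly
combined = pd.concat(frames, ignore_index=True)
combined.to_csv("../data/external/police_recorded_crime_combined.csv", index=False)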

A few caveats are worth considering:

  1. The Crime Survey can’t always be easily compared to reported crime: police crime data is legal accounting, while the crime survey is asking the general public what happened. Domestic Burglary should be comparable, but this is all a little experimental.
  2. Reproducibility and data cleaning: good analysis should be reproducible - you should be able to run it the same way, with new data. This isn’t true here because of my ugly cleaning.
  3. Outcomes aren’t instant: crimes take time to solve, so the most recent quarters of outcomes data will still be revised upwards. I’ve used outcomes per quarter, which should help, but treat the newest figures as provisional (one way to guard against this is sketched below).
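
On that last point, here’s a minimal sketch of one mitigation - dropping the newest quarters, whose outcome counts are still settling. The function and its two-quarter cutoff are my own illustration, assuming a dataframe with a quarterly date column like the one built later in this post:

Code
import pandas as pd

def drop_recent_quarters(df: pd.DataFrame, n_quarters: int = 2) -> pd.DataFrame:
    # Keep only quarters old enough that their outcomes have mostly settled
    settled_dates = sorted(df['date'].unique())[:-n_quarters]
    return df[df['date'].isin(settled_dates)]
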
Code
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import patsy
import statsmodels.api as sm
import plotly.io as pio
from linearmodels import PanelOLS

pio.templates.default = "plotly_dark"


police_crime_df = pd.read_csv("../data/external/police_recorded_crime_combined.csv")


# Aggregate recorded offences to one row per financial year
yearly_police_df = police_crime_df[['Financial Year','Number of Offences']].groupby('Financial Year').sum().reset_index()
yearly_police_df['year'] = range(2002,2023,1)
yearly_police_df = yearly_police_df.rename(columns={"Number of Offences":'police'})

csew_count = pd.read_csv("../data/external/csew_burglary_trends.csv").rename(columns={'Domestic burglary':'csew'})

# CSEW estimates are published in thousands, so scale up to raw counts
csew_count['year'] = range(2002,2021,1)
csew_count['csew'] = csew_count['csew']*1000

comparison_df = csew_count.merge(yearly_police_df,how='left',on='year').rename(columns={'csew':'Crime Survey',
                                                                                        'police':'Police Reported'})

comparison_df[['year','Crime Survey','Police Reported']].set_index('year')

Crime Survey Police Reported
year
2002 1423000 437583
2003 1357000 402345
2004 1310000 321507
2005 1059000 300517
2006 1025000 292260
2007 1006000 280696
2008 960000 284431
2009 992000 268606
2010 917000 258165
2011 1033000 245312
2012 922000 227276
2013 887000 211988
2014 781000 196554
2015 784000 194700
2016 697000 206051
2017 650000 309867
2018 691000 295556
2019 699000 268715
2020 582000 196214

The Crime Survey data is the best measure of long-term crime trends, as it isn’t affected by reporting or policing practice. Taken in isolation, the most obvious takeaway here is that burglary is lower than it’s ever been. If you accept the Peelian view that “the test of police efficiency is the absence of crime and disorder”, policing is doing an excellent job. So why is police reported crime comparatively high, and why are perceptions of police performance seemingly so low?

Code
tidy_yearly = comparison_df[['year','Crime Survey','Police Reported']].melt(id_vars='year')

fig = px.line(tidy_yearly, x='year', y='value', color='variable',color_discrete_sequence=['red','blue'])

fig.update_layout(
    title_text="Domestic Burglaries, Police Reported and Crime Survey",
        legend=dict(
    orientation="h",
    yanchor="bottom",
        y=-0.2),
)

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="Burglaries")

fig.show()

Comparing Crime Survey to Police Recorded crime counts is generally considered bad form: while a police officer might always know the difference between a robbery, a burglary, and an affray to enable a theft, that’s not something most of the public worries about.

Thankfully, that shouldn’t be a problem for domestic burglary - most people know when they’ve been burgled! That means we can take the ratio of police-reported burglaries to crime survey burglaries, and obtain a rough estimate of what proportion of burglaries are reported to police over time.

Code
comparison_df['Proportion of Burglaries Reported'] = comparison_df['Police Reported'] / comparison_df['Crime Survey'] * 100

tidy_yearly = comparison_df[['year','Crime Survey','Police Reported','Proportion of Burglaries Reported']].melt(id_vars='year')


# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

csew_burglary = tidy_yearly[tidy_yearly['variable'] == 'Crime Survey']
police_burglary = tidy_yearly[tidy_yearly['variable'] == 'Police Reported']
reported_ratio = tidy_yearly[tidy_yearly['variable'] == 'Proportion of Burglaries Reported']



fig.add_trace(
    go.Scatter(x=csew_burglary['year'], y=csew_burglary['value'], name="Crime Survey", line=dict(color='red')),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=police_burglary['year'], y=police_burglary['value'], name="Police Reported", line=dict(color='blue')),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=reported_ratio['year'], y=reported_ratio['value'], name="% Reported", line=dict(color='grey')),
    secondary_y=True,
)

# Add figure title
fig.update_layout(
    title_text="Crime Survey Burglaries per Year and Proportion Reported",
    legend=dict(
    orientation="h",
    yanchor="bottom",
        y=-0.2),
)

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="Burglaries", secondary_y=False, color='black')
fig.update_yaxes(title_text="Proportion of Burglaries Reported", secondary_y=True, showgrid=False, color='grey')


fig.show()

I’d once thought burglary reporting would be consistent - it’s an emotive crime, with a well established insurance industry - and yet we definitely see variation over time.

Notice that dramatic spike in 2015? That’s the aftermath of the 2014 police crime recording scandal, which led to police crime statistics losing their designation as a national statistic, and accurate crime recording becoming a key metric for the policing regulator. To dig into this issue in more depth, I’d really recommend this blog by Gavin Hales.

What does this tell us about burglary in England & Wales over the last decade though? Two things:

  1. Domestic Burglaries are probably rarer than ever (and that’s not down to COVID)
  2. That’s not consistently reflected in police recorded crime - the likelihood of reporting seems to change over time

How Many Burglars are we Catching?

So there aren’t many burglaries… so have we gotten better at catching the burglars that remain?

Well… we’re certainly not catching any more than we used to. Whether you count charges (eg, burglars being sent to court) or “detections” (eg, also including cautions, fines and similar), there are far fewer, though it looks like that trend started long before 2015.

Code
detections = pd.read_csv('../data/interim/burglary_detections.csv',index_col=0)
detections

fig = px.line(detections, x='year', y='detections')

fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="Police Detections", range=[0,60000])


fig.update_layout(
    title_text="Police Detections by Year"
)

fig.show()

That’s not hugely surprising: there are fewer burglars around to be caught. What’s perhaps more important is how many burglars we’re catching as a proportion of those remaining burglaries we know about.

I’ve visualised that below: on the left axis, all police reported burglaries (in blue), and how many lead to a positive outcome (in red), and on the right axis, the ratio of the two: an estimate of the proportion of police burglaries that are “solved”.

Code
yearly_df = detections.merge(comparison_df, how='left', on='year')
yearly_df['detected_csew_ratio'] = yearly_df['detections'] / yearly_df['Crime Survey'] * 100
yearly_df['detected_police_ratio'] = yearly_df['detections'] / yearly_df['Police Reported'] * 100

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(
    go.Scatter(x=yearly_df['year'], y=yearly_df['Police Reported'], name="Police Reported", line=dict(color='blue')),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=yearly_df['year'], y=yearly_df['detections'], name="Police Detections", line=dict(color='red')),
    secondary_y=False,
)


fig.add_trace(
    go.Scatter(x=yearly_df['year'], y=yearly_df['detected_police_ratio'], name="Proportion of Police Reported Burglaries Detected", line=dict(color='green'), visible=True),
    secondary_y=True,
)


# Add figure title
fig.update_layout(
    title_text="Crime Survey Burglaries per Year and Proportion Reported",
    legend=dict(
    orientation="h",
    yanchor="bottom",
        y=-0.2),
)

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="Burglaries", secondary_y=False)
fig.update_yaxes(title_text="Proportion of Burglaries Solved",range=[0,20], secondary_y=True, showgrid=False, color='green')


fig.show()

It doesn’t look good: performance drops sharply around the time we stopped attending all burglaries in 2015.

Of course, plenty of other things happened in 2015: for instance, we know there was a radical shift in recording standards, where police recorded and investigated thousands of burglaries which they would previously not have been aware of. So how do we measure how police performance changed irrespective of those recording changes? We can replicate that chart, but instead of using police recorded burglaries as the baseline, we use all burglaries, as reported by the Crime Survey.

The below chart does exactly that - click on each of the buttons to see how the baseline figure affects perceived investigatory performance.

Code
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])


fig.add_trace(
    go.Scatter(x=yearly_df['year'], y=yearly_df['Police Reported'], name="Police Reported", line=dict(color='blue')),
    secondary_y=False,
)
fig.add_trace(
    go.Scatter(x=yearly_df['year'], y=yearly_df['detections'], name="Police Detections", line=dict(color='red')),
    secondary_y=False,
)



fig.add_trace(
    go.Scatter(x=yearly_df['year'], y=yearly_df['detected_police_ratio'], name="Proportion of Police Reported Burglaries Detected", line=dict(color='green'), visible=False),
    secondary_y=True,
)

fig.add_trace(
    go.Scatter(x=yearly_df['year'], y=yearly_df['detected_csew_ratio'], name="Proportion of Crime Survey Burglaries Detected", line=dict(color='black'), visible=False),
    secondary_y=True,
)

# Add figure title
fig.update_layout(
    title_text="Crime Survey Burglaries per Year and Proportion Reported"
)

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="Burglaries", secondary_y=False)
fig.update_yaxes(title_text="Proportion of Burglaries Solved",range=[0,20], secondary_y=True, showgrid=False, color='green')


fig.update_layout(
    legend=dict(
    orientation="h",
    yanchor="bottom",
        y=-0.2),
    updatemenus=[
        dict(
            type="buttons",
            direction="right",
            active=0,
            x=1,
            y=1.2,
            bgcolor='grey',
            bordercolor='white',
            font=dict(color='white'),
            showactive=False,
            buttons=list([
                dict(label="As Proportion of Police Reported",
                     method="update",
                     args=[{"visible": [True, True, True,False]}]),
                dict(label="As Proportion of all Burglaries",
                     method="update",
                     args=[{"visible": [True, True, False,True]}])
            ]),
        )
    ])

fig.show()

Suddenly that sharp drop in 2015 is a lot less convincing. Looking at all burglaries - rather than just those reported to police - the proportion being solved has been going down, but that started happening long before 2015. Around 5% of all burglaries in the crime survey were “solved” in 2010, but by 2015, that had already dropped to around 2.5%, compared to just over 2% in 2020.
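
We can sanity-check those figures directly from the merged yearly data (assuming, as here, the detections series covers 2010 to 2020):

Code
# detected_csew_ratio is detections as a % of Crime Survey burglaries,
# computed in the earlier cell
yearly_df.set_index('year').loc[[2010, 2015, 2020], ['detected_csew_ratio']].round(1)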

What’s the lesson here? Perceptions of a recent, short-term drop in police investigative performance are somewhat overstated. Yes, policing is catching fewer burglars than ever, but burglaries are rarer than ever - those that remain are less likely to be solved than they used to be, but that’s a trend that started long before 2015 (it might even have started as early as 2008).

What explains that trend? We really don’t know. It’s possible that those few remaining burglars are the most persistent and sophisticated: maybe the opportunistic drug users desperate for a fix, who were willing to smash your front window in a decade ago, have been replaced by professional teams who will do a whole set of flats in minutes. We can’t really tell with the data we’ve got here, but it’s probably not as simple as policing just not trying hard enough.

Will Mandatory Attendance Help?

Given the above, will mandatory attendance actually reduce burglary? As far as I can tell, nobody has properly checked. The Telegraph suggests that burglaries “could fall by 60pc if officers visited every victim”, but given nobody has really rigorously tested the impact of burglary attendance, I’m not totally convinced. Leicestershire attempted to conduct a randomised controlled trial in 2015, but following press uproar the whole thing was shelved.

There is some good research that suggests it will make some difference: rapid attendance to crimes probably does help solve them, and finding a suspect is the most effective way of solving a burglary. If we randomly assigned some burglaries to be attended and some not, those we did attend would be solved more often.

That’s different to mandatory attendance though: even if the responding officer thinks there isn’t any point (for example, if the burglary happened many months ago, or is very unlikely to be solved following a phone report), they’ll attend anyway. So how have some reporters calculated that the policy could cut the number of offences by half?

Unhelpfully, the researchers for the Mail and the Telegraph haven’t been very open with sharing their methodology, but it looks like they’ve looked at those forces who already have a mandatory attendance policy, and tried to predict what would happen if that had been expanded nationally.

Three forces in particular are repeatedly mentioned as having introduced mandatory attendance prior to the NPCC mandate:

  1. Northamptonshire since April 2019, though it’s worth noting these are dedicated burglary teams rather than attendance in isolation
  2. Bedfordshire Police, using the codename “Operation Maze”, though again, it seems paired with dedicated burglary teams, and may in fact be attendance by forensics staff rather than police officers. The earliest records are from early 2020, but it’s not at all clear this involves any mandatory attendance, rather than just increased resourcing.
  3. Greater Manchester Police since July 2021

The key question, then, is: did the introduction of mandatory attendance in these three forces make any difference to burglary investigations?

To do that, we’ll ask:

  • What was the detection rate for burglary in all forces?
  • How did the detection rate change after the introduction of mandatory attendance?
  • How did other, similar forces fare over the same period?

There are real limitations to doing this with public data: we are limited to quarter by quarter analysis, and we don’t actually know exactly when these policies started (or, for that matter, exactly what they include) or who else might have been doing something similar, but we can still try and identify what performance boost (if any) mandatory attendance made for these forces.

Code
theft_df = pd.read_excel("../data/external/burglary_outcomes_combined.ods", engine="odf")
qs = theft_df['Financial year'].str.slice(0,4) + "-Q" + theft_df['Financial quarter'].astype('str')

theft_df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
theft_df = theft_df.sort_values(by='date',ascending=False)

burglary_df = theft_df[theft_df['Offence Subgroup'] == 'Domestic burglary'].rename(columns={'Force Name':'Force'})
burglary_df['Force'] = burglary_df['Force'].astype('string')
burglary_df
Financial year Financial quarter Force Offence Description Offence Group Offence Subgroup Offence Code Offence code expired Outcome Description Outcome Group Outcome Type Force outcomes for offences recorded in quarter Force outcomes recorded in quarter date
160811 2021/22 4 British Transport Police Attempted Burglary Residential Theft offences Domestic burglary 28F NaN Community Resolution Out-of-court (informal) 8 0.0 0.0 2021-10-01
164083 2021/22 4 Hampshire Attempted Distraction Burglary Residential Theft offences Domestic burglary 28H NaN Taken into consideration Taken into consideration 4 0.0 0.0 2021-10-01
164081 2021/22 4 Hampshire Attempted Distraction Burglary Residential Theft offences Domestic burglary 28H NaN Diversionary, educational or intervention acti... Diversionary, educational or intervention acti... 22 0.0 0.0 2021-10-01
164080 2021/22 4 Hampshire Attempted Distraction Burglary Residential Theft offences Domestic burglary 28H NaN Further investigation to support formal action... Further investigation to support formal action... 21 0.0 0.0 2021-10-01
164079 2021/22 4 Hampshire Attempted Distraction Burglary Residential Theft offences Domestic burglary 28H NaN Responsibility for further investigation trans... Responsibility for further investigation trans... 20 0.0 0.0 2021-10-01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
49689 2017/18 1 Northamptonshire Attempted Burglary Residential Theft offences Domestic burglary 28F NaN Further investigation to support formal action... Further investigation to support formal action... 21 1.0 0.0 2017-01-01
49688 2017/18 1 Northamptonshire Attempted Burglary Residential Theft offences Domestic burglary 28F NaN Responsibility for further investigation trans... Responsibility for further investigation trans... 20 1.0 1.0 2017-01-01
49687 2017/18 1 Northamptonshire Attempted Burglary Residential Theft offences Domestic burglary 28F NaN Caution – youths Out-of-court (formal) 2 0.0 0.0 2017-01-01
49686 2017/18 1 Northamptonshire Attempted Burglary Residential Theft offences Domestic burglary 28F NaN Investigation complete – no suspect identified Investigation complete – no suspect identified 18 114.0 104.0 2017-01-01
50576 2017/18 1 South Wales Distraction Burglary Residential Theft offences Domestic burglary 28G NaN Community Resolution Out-of-court (informal) 8 0.0 0.0 2017-01-01

212960 rows × 14 columns

Given we’re looking at quarterly data, we’ll generate two temporal variables for our data:

  1. A running variable (running_var), which measures the number of quarters that have passed since the start of our dataset
  2. A quarter categorical variable (quarter), to check whether burglary detection shows any seasonal pattern across quarters 1, 2, 3 and 4
Code

time_series = theft_df[['date','Financial quarter']].drop_duplicates().sort_values(by=['date']).reset_index(drop=True).reset_index().rename(columns={'index':'running_var'})
time_series['quarter'] = time_series['Financial quarter'].astype('string')
time_series = time_series.drop(columns=['Financial quarter'])
time_series
running_var date quarter
0 0 2017-01-01 1
1 1 2017-04-01 2
2 2 2017-07-01 3
3 3 2017-10-01 4
4 4 2018-01-01 1
5 5 2018-04-01 2
6 6 2018-07-01 3
7 7 2018-10-01 4
8 8 2019-01-01 1
9 9 2019-04-01 2
10 10 2019-07-01 3
11 11 2019-10-01 4
12 12 2020-01-01 1
13 13 2020-04-01 2
14 14 2020-07-01 3
15 15 2020-10-01 4
16 16 2021-01-01 1
17 17 2021-04-01 2
18 18 2021-07-01 3
19 19 2021-10-01 4

For each force, we then have a count of burglaries per quarter, and the ratio of those that were solved (eg, burglary investigations where a burglar is identified and dealt with by police).

Code
# Total outcomes recorded for offences in each quarter, per force
total_q_offences = burglary_df[['date','Force','Force outcomes for offences recorded in quarter']].groupby(['date','Force']).sum().reset_index()
total_q_offences['total_offences'] = pd.to_numeric(total_q_offences['Force outcomes for offences recorded in quarter'],errors='coerce')

# Outcome types we'll count as a "detection" (a suspect identified and dealt with)
positive_outcomes = ['Taken into consideration',
 'Charged/Summonsed',
 'Community Resolution',
 'Cannabis/Khat Warning',
 'Penalty Notices for Disorder',
 'Caution – adults',
 'Diversionary, educational or intervention activity, resulting from the crime report, has been undertaken and it is not in the public interest to take any further action.',
 'Caution – youths']

burglary_df['is_detected'] = burglary_df['Outcome Description'].isin(positive_outcomes)

positive_burglary_df = burglary_df[burglary_df['is_detected']]

total_detected_offences = positive_burglary_df[['date','Force','Force outcomes for offences recorded in quarter']].groupby(['date','Force']).sum().reset_index()
total_detected_offences['total_detected'] = pd.to_numeric(total_detected_offences['Force outcomes for offences recorded in quarter'],errors='coerce')

force_comparison = total_q_offences[['date','Force','total_offences']].merge(total_detected_offences, how='left',on=['date','Force']).drop(columns=['Force outcomes for offences recorded in quarter'])
force_comparison['detected_rate'] = force_comparison['total_detected'] / force_comparison['total_offences']
force_comparison['detected_rate']  = force_comparison['detected_rate'].fillna(0)
force_comparison
date Force total_offences total_detected detected_rate
0 2017-01-01 Avon and Somerset 2151.0 112.0 0.052069
1 2017-01-01 Bedfordshire 1142.0 120.0 0.105079
2 2017-01-01 British Transport Police 0.0 0.0 0.000000
3 2017-01-01 Cambridgeshire 1109.0 99.0 0.089270
4 2017-01-01 Cheshire 790.0 80.0 0.101266
... ... ... ... ... ...
875 2021-10-01 Warwickshire 376.0 8.0 0.021277
876 2021-10-01 West Mercia 857.0 8.0 0.009335
877 2021-10-01 West Midlands 4061.0 93.0 0.022901
878 2021-10-01 West Yorkshire 2406.0 81.0 0.033666
879 2021-10-01 Wiltshire 265.0 8.0 0.030189

880 rows × 5 columns

Now, we identify our “treated” forces, and the period in which the policy was in place. This isn’t exactly clean, but based on the newspaper articles and press releases, I’ve picked out:

  • Bedfordshire from 2020 onwards
  • GMP from July 2021 onwards
  • Northamptonshire from April 2019 onwards
  • and of course, nationwide from October 2022

With those dates, we’ll then build our completed dataset of quarter by quarter burglary detection rates, for each force, and whether or not a mandatory attendance policy was in place in that force/quarter.

Code
northamptonshire = (force_comparison['Force'].str.contains('Northamptonshire')) & (force_comparison['date'] >= '2019-04-01')
bedfordshire = (force_comparison['Force'].str.contains('Bedfordshire')) & (force_comparison['date'] >= '2020-01-01')
manchester = (force_comparison['Force'].str.contains('Manchester')) & (force_comparison['date'] >= '2020-07-01')
everyone = (force_comparison['date'] >= '2022-10-01')


force_comparison['mandatory_attendance'] = northamptonshire | bedfordshire | manchester | everyone
force_comparison
date Force total_offences total_detected detected_rate mandatory_attendance
0 2017-01-01 Avon and Somerset 2151.0 112.0 0.052069 False
1 2017-01-01 Bedfordshire 1142.0 120.0 0.105079 False
2 2017-01-01 British Transport Police 0.0 0.0 0.000000 False
3 2017-01-01 Cambridgeshire 1109.0 99.0 0.089270 False
4 2017-01-01 Cheshire 790.0 80.0 0.101266 False
... ... ... ... ... ... ...
875 2021-10-01 Warwickshire 376.0 8.0 0.021277 False
876 2021-10-01 West Mercia 857.0 8.0 0.009335 False
877 2021-10-01 West Midlands 4061.0 93.0 0.022901 False
878 2021-10-01 West Yorkshire 2406.0 81.0 0.033666 False
879 2021-10-01 Wiltshire 265.0 8.0 0.030189 False

880 rows × 6 columns

One limitation of working by quarters is that we’re quite limited in numbers! Good inference in statistical modelling relies on detecting “variance” - eg, having enough data to separate our effect from general background noise. Given GMP’s policy has only been running for 6 quarters, we’ll focus on the other two forces for this analysis.

Code
force_comparison[['Force','mandatory_attendance']].groupby('Force').sum().sort_values(by='mandatory_attendance', ascending=False).rename(columns={"mandatory_attendance":'Quarters of Mandatory Attendance'})
Quarters of Mandatory Attendance
Force
Northamptonshire 11
Bedfordshire 8
Greater Manchester 6
South Wales 0
Merseyside 0
Metropolitan Police 0
Norfolk 0
North Wales 0
North Yorkshire 0
Northumbria 0
Nottinghamshire 0
Avon and Somerset 0
London, City of 0
Staffordshire 0
Suffolk 0
Surrey 0
Sussex 0
Thames Valley 0
Warwickshire 0
West Mercia 0
West Midlands 0
West Yorkshire 0
South Yorkshire 0
Lincolnshire 0
Leicestershire 0
Lancashire 0
British Transport Police 0
Cambridgeshire 0
Cheshire 0
Cleveland 0
Cumbria 0
Derbyshire 0
Devon and Cornwall 0
Dorset 0
Durham 0
Dyfed-Powys 0
Essex 0
Gloucestershire 0
Gwent 0
Hampshire 0
Hertfordshire 0
Humberside 0
Kent 0
Wiltshire 0

Now we can start combining our data, and seeing whether we can observe any meaningful difference after the intervention takes place.

For meaningful comparisons, we’ll seek to compare forces to other similar forces - there’s not much point in comparing GMP to Durham. We could do that with our existing data - eg, looking at forces with similar numbers of burglaries - but helpfully, the policing regulator HMICFRS already produces “Most Similar Forces” groups, which should let us match up forces based on size, demographics and other factors relevant to performance. We’ll separate out our group A (Bedfordshire and their group) from group B (Northamptonshire’s), though notice Kent is in both - unhelpfully, HMIC’s groups aren’t exclusive, though that’s not too much of a problem.

Code
group_a = ["Bedfordshire",
"Leicestershire",
"Nottinghamshire",
"Hertfordshire",
"Kent",
"Hampshire",
"Essex",
"South Yorkshire"]

group_b = ["Northamptonshire",
           "Cheshire",
"Derbyshire",
"Staffordshire",
"Kent",
"Avon and Somerset",
"Essex",
"Nottinghamshire"]

msf_groups = force_comparison[['Force']].drop_duplicates().reset_index(drop=True)
msf_groups['group_A'] = msf_groups['Force'].isin(group_a)
msf_groups['group_B'] = msf_groups['Force'].isin(group_b)
msf_groups
Force group_A group_B
0 Avon and Somerset False True
1 Bedfordshire True False
2 British Transport Police False False
3 Cambridgeshire False False
4 Cheshire False True
5 Cleveland False False
6 Cumbria False False
7 Derbyshire False True
8 Devon and Cornwall False False
9 Dorset False False
10 Durham False False
11 Dyfed-Powys False False
12 Essex True True
13 Gloucestershire False False
14 Greater Manchester False False
15 Gwent False False
16 Hampshire True False
17 Hertfordshire True False
18 Humberside False False
19 Kent True True
20 Lancashire False False
21 Leicestershire True False
22 Lincolnshire False False
23 London, City of False False
24 Merseyside False False
25 Metropolitan Police False False
26 Norfolk False False
27 North Wales False False
28 North Yorkshire False False
29 Northamptonshire False True
30 Northumbria False False
31 Nottinghamshire True True
32 South Wales False False
33 South Yorkshire True False
34 Staffordshire False True
35 Suffolk False False
36 Surrey False False
37 Sussex False False
38 Thames Valley False False
39 Warwickshire False False
40 West Mercia False False
41 West Midlands False False
42 West Yorkshire False False
43 Wiltshire False False

Let’s start with a simple visual check: the detection rate per quarter for each force and group, with the start of mandatory attendance marked by a dashed vertical line.

Code
linear_comparator_df = force_comparison.merge(msf_groups, how='right', on='Force')
linear_comparator_df = linear_comparator_df[linear_comparator_df['group_A'] | linear_comparator_df['group_B']].merge(time_series, how='left', on='date')

linear_comparator_df['detected_rate'] = linear_comparator_df['detected_rate']  * 100

df_a = linear_comparator_df[linear_comparator_df['group_A']]
df_b = linear_comparator_df[linear_comparator_df['group_B']]

fig = make_subplots(rows=2, cols=1)

for force in df_a['Force'].unique():
    force_df = df_a[df_a['Force'] == force]
    if force == 'Bedfordshire':
        fig.add_trace(
            go.Scatter(
            x=force_df['date'], y=force_df['detected_rate'], name=force, line=dict(color='red', width=4,
                              dash='dash')),
            row=1, col=1
        )
    else:
        fig.add_trace(
            go.Scatter(
            x=force_df['date'], y=force_df['detected_rate'], name=force),
            row=1, col=1
        )



for force in df_b['Force'].unique():
    force_df = df_b[df_b['Force'] == force]
    if force == 'Northamptonshire':
        fig.add_trace(
            go.Scatter(
            x=force_df['date'], y=force_df['detected_rate'], name=force, line=dict(color='blue', width=4,
                              dash='dash')),
            row=2, col=1
        )
    else:
        fig.add_trace(
            go.Scatter(
            x=force_df['date'], y=force_df['detected_rate'], name=force),
            row=2, col=1
        )


fig.add_vline(x='2020-01-01', row=1, col=1, line_dash="dash")
fig.add_vline(x='2019-04-01', row=2, col=1,line_dash="dash")

fig.update_xaxes(title_text="Date")
fig.update_layout(height=700, title_text="Detected Outcomes per Quarter and Force, and start of Mandatory Attendance")

fig.update_yaxes(title_text="Detected Rate")

fig.show()

It looks, in a word, messy. There might be some sort of effect, but it certainly isn’t pronounced enough to visually observe. Instead, let’s use some statistical modelling to try and measure the average effect.

Statistical Modelling

Linear Models with Time Effects

With all our data, we now build a statistical model, predicting the detected outcome rate as a function of force, comparison group, total offences, the period in time, and whether mandatory attendance was in place.

Code
linear_comparator_df['Force'] = linear_comparator_df['Force'].astype('category')
linear_comparator_df['quarter'] = linear_comparator_df['quarter'].astype('category')
linear_comparator_df['mandatory_attendance'] = linear_comparator_df['mandatory_attendance'].astype('int')


y = linear_comparator_df['detected_rate']

# Design matrix: force fixed effects, offence volume, the attendance flag,
# group indicators, seasonal (quarter) dummies and a linear time trend
X = patsy.dmatrix("0 + C(Force) + total_offences + mandatory_attendance + group_A + group_B  + C(quarter) + running_var", data=linear_comparator_df, return_type='dataframe')


model = sm.OLS(y,X)

results = model.fit()

results.summary()
OLS Regression Results
Dep. Variable: detected_rate R-squared: 0.551
Model: OLS Adj. R-squared: 0.518
Method: Least Squares F-statistic: 16.44
Date: Sat, 31 Dec 2022 Prob (F-statistic): 1.59e-32
Time: 16:25:24 Log-Likelihood: -426.89
No. Observations: 260 AIC: 891.8
Df Residuals: 241 BIC: 959.4
Df Model: 18
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
C(Force)[Avon and Somerset] 3.6335 0.530 6.861 0.000 2.590 4.677
C(Force)[Bedfordshire] 3.1250 0.322 9.720 0.000 2.492 3.758
C(Force)[Cheshire] 4.9698 0.291 17.099 0.000 4.397 5.542
C(Force)[Derbyshire] 1.3595 0.322 4.226 0.000 0.726 1.993
C(Force)[Essex] -2.4339 0.274 -8.877 0.000 -2.974 -1.894
C(Force)[Hampshire] 2.8248 0.468 6.031 0.000 1.902 3.747
C(Force)[Hertfordshire] 1.4803 0.277 5.341 0.000 0.934 2.026
C(Force)[Kent] -3.5083 0.277 -12.671 0.000 -4.054 -2.963
C(Force)[Leicestershire] 2.8885 0.294 9.821 0.000 2.309 3.468
C(Force)[Northamptonshire] 0.6208 0.350 1.774 0.077 -0.068 1.310
C(Force)[Nottinghamshire] -2.1424 0.419 -5.118 0.000 -2.967 -1.318
C(Force)[South Yorkshire] 3.5578 0.705 5.045 0.000 2.169 4.947
C(Force)[Staffordshire] 3.2984 0.292 11.288 0.000 2.723 3.874
group_A[T.True] 5.7917 0.632 9.165 0.000 4.547 7.037
group_B[T.True] 5.7974 0.455 12.730 0.000 4.900 6.694
C(quarter)[T.2] -0.0319 0.231 -0.138 0.890 -0.487 0.423
C(quarter)[T.3] -1.0204 0.259 -3.937 0.000 -1.531 -0.510
C(quarter)[T.4] -1.0813 0.247 -4.370 0.000 -1.569 -0.594
total_offences -0.0013 0.000 -2.707 0.007 -0.002 -0.000
mandatory_attendance 0.7574 0.442 1.714 0.088 -0.113 1.628
running_var -0.1198 0.027 -4.481 0.000 -0.172 -0.067
Omnibus: 1.754 Durbin-Watson: 1.763
Prob(Omnibus): 0.416 Jarque-Bera (JB): 1.487
Skew: 0.174 Prob(JB): 0.475
Kurtosis: 3.128 Cond. No. 7.82e+19


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 8.11e-32. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

Our model suggests that mandatory attendance may have had, on average, a small positive effect on performance (around 0.76 percentage points), but this was not statistically significant - eg, it may have been down to sheer luck.

Let’s go a bit further, and add an interaction effect between time and our force variable - eg, accounting for the fact that performance may change differently over time, force by force.

Code
y = linear_comparator_df['detected_rate']

# As above, but C(Force)*running_var gives each force its own linear time trend
X = patsy.dmatrix("C(Force)*running_var + total_offences + mandatory_attendance + group_A + group_B  + C(quarter)", data=linear_comparator_df, return_type='dataframe')


model = sm.OLS(y,X)

results = model.fit()

results.summary()
OLS Regression Results
Dep. Variable: detected_rate R-squared: 0.616
Model: OLS Adj. R-squared: 0.565
Method: Least Squares F-statistic: 12.22
Date: Sat, 31 Dec 2022 Prob (F-statistic): 6.57e-33
Time: 16:25:24 Log-Likelihood: -406.76
No. Observations: 260 AIC: 875.5
Df Residuals: 229 BIC: 985.9
Df Model: 30
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 6.2968 0.805 7.827 0.000 4.712 7.882
C(Force)[T.Bedfordshire] 1.6786 0.583 2.879 0.004 0.530 2.827
C(Force)[T.Cheshire] 2.9897 0.959 3.118 0.002 1.101 4.879
C(Force)[T.Derbyshire] -1.9116 0.867 -2.205 0.028 -3.620 -0.204
C(Force)[T.Essex] -0.2824 0.500 -0.565 0.573 -1.268 0.703
C(Force)[T.Hampshire] 2.2860 0.537 4.256 0.000 1.228 3.344
C(Force)[T.Hertfordshire] -0.2522 0.513 -0.492 0.623 -1.263 0.759
C(Force)[T.Kent] -1.8168 0.490 -3.704 0.000 -2.783 -0.850
C(Force)[T.Leicestershire] -0.3087 0.483 -0.639 0.523 -1.260 0.643
C(Force)[T.Northamptonshire] -3.0584 0.941 -3.252 0.001 -4.912 -1.205
C(Force)[T.Nottinghamshire] -0.6398 0.620 -1.032 0.303 -1.862 0.582
C(Force)[T.South Yorkshire] 0.8367 0.745 1.123 0.263 -0.632 2.305
C(Force)[T.Staffordshire] 0.8295 0.902 0.919 0.359 -0.948 2.607
group_A[T.True] 1.5015 0.429 3.496 0.001 0.655 2.348
group_B[T.True] 2.0563 0.418 4.917 0.000 1.232 2.880
C(quarter)[T.2] -0.0358 0.220 -0.162 0.871 -0.470 0.399
C(quarter)[T.3] -1.0428 0.256 -4.077 0.000 -1.547 -0.539
C(quarter)[T.4] -1.1036 0.240 -4.601 0.000 -1.576 -0.631
running_var -0.0171 0.056 -0.305 0.761 -0.128 0.093
C(Force)[T.Bedfordshire]:running_var -0.1477 0.088 -1.670 0.096 -0.322 0.027
C(Force)[T.Cheshire]:running_var -0.1680 0.070 -2.388 0.018 -0.307 -0.029
C(Force)[T.Derbyshire]:running_var -0.0334 0.069 -0.487 0.627 -0.168 0.102
C(Force)[T.Essex]:running_var -0.1593 0.069 -2.320 0.021 -0.295 -0.024
C(Force)[T.Hampshire]:running_var -0.2698 0.068 -3.966 0.000 -0.404 -0.136
C(Force)[T.Hertfordshire]:running_var -0.1377 0.068 -2.032 0.043 -0.271 -0.004
C(Force)[T.Kent]:running_var -0.1106 0.068 -1.629 0.105 -0.244 0.023
C(Force)[T.Leicestershire]:running_var 0.0146 0.068 0.216 0.829 -0.119 0.148
C(Force)[T.Northamptonshire]:running_var 0.0375 0.091 0.413 0.680 -0.141 0.216
C(Force)[T.Nottinghamshire]:running_var -0.0865 0.068 -1.273 0.204 -0.220 0.047
C(Force)[T.South Yorkshire]:running_var -0.0445 0.068 -0.657 0.512 -0.178 0.089
C(Force)[T.Staffordshire]:running_var -0.1167 0.068 -1.707 0.089 -0.251 0.018
total_offences -0.0012 0.001 -2.279 0.024 -0.002 -0.000
mandatory_attendance 0.3016 0.777 0.388 0.698 -1.229 1.832
Omnibus: 3.734 Durbin-Watson: 2.028
Prob(Omnibus): 0.155 Jarque-Bera (JB): 3.443
Skew: 0.221 Prob(JB): 0.179
Kurtosis: 3.350 Cond. No. 4.44e+18


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 2.52e-29. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

No matter how we model it, we’re again not finding meaningful effects - mandatory attendance does seem to be associated with slightly higher detected rates, on average, but this isn’t statistically significant. Put simply, given the “statistical noise” in the rest of our model, we can’t confidently say any increases aren’t just due to random luck.
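
One caveat worth flagging: the OLS summaries above note that the standard errors assume a correctly specified covariance matrix, but quarters within a force are likely correlated. A robustness sketch (not part of the original run) would be to re-fit with standard errors clustered by force:

Code
# Same point estimates, but standard errors that allow for correlated
# errors within each force over time
clustered_results = model.fit(
    cov_type='cluster',
    cov_kwds={'groups': linear_comparator_df['Force'].cat.codes},
)
print(clustered_results.summary())

If the mandatory_attendance coefficient stays non-significant with clustered errors, that strengthens the “we can’t rule out luck” reading.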

As one final approach, we’ll fit a panel model using the linearmodels library - far from the best way to examine how effects change over time, but not a bad starter for ten.

Code
panel_data = force_comparison.set_index(["Force", "date"])
panel_data
total_offences total_detected detected_rate mandatory_attendance
Force date
Avon and Somerset 2017-01-01 2151.0 112.0 0.052069 False
Bedfordshire 2017-01-01 1142.0 120.0 0.105079 False
British Transport Police 2017-01-01 0.0 0.0 0.000000 False
Cambridgeshire 2017-01-01 1109.0 99.0 0.089270 False
Cheshire 2017-01-01 790.0 80.0 0.101266 False
... ... ... ... ... ...
Warwickshire 2021-10-01 376.0 8.0 0.021277 False
West Mercia 2021-10-01 857.0 8.0 0.009335 False
West Midlands 2021-10-01 4061.0 93.0 0.022901 False
West Yorkshire 2021-10-01 2406.0 81.0 0.033666 False
Wiltshire 2021-10-01 265.0 8.0 0.030189 False

880 rows × 4 columns

Code
# Two-way fixed effects: force (entity) and quarter (time)
mod = PanelOLS.from_formula(
    "detected_rate ~ 1 + mandatory_attendance + EntityEffects + TimeEffects", data=panel_data
)
mod.fit()
PanelOLS Estimation Summary
Dep. Variable: detected_rate R-squared: 0.0013
Estimator: PanelOLS R-squared (Between): -0.0444
No. Observations: 880 R-squared (Within): -0.0013
Date: Sat, Dec 31 2022 R-squared (Overall): -0.0059
Time: 16:25:24 Log-likelihood 1326.8
Cov. Estimator: Unadjusted
F-statistic: 1.0290
Entities: 44 P-value 0.3107
Avg Obs: 20.000 Distribution: F(1,816)
Min Obs: 20.000
Max Obs: 20.000 F-statistic (robust): 1.0290
P-value 0.3107
Time periods: 20 Distribution: F(1,816)
Avg Obs: 44.000
Min Obs: 44.000
Max Obs: 44.000
Parameter Estimates
Parameter Std. Err. T-stat P-value Lower CI Upper CI
Intercept 0.0614 0.0019 31.890 0.0000 0.0576 0.0652
mandatory_attendance 0.0155 0.0153 1.0144 0.3107 -0.0145 0.0456


F-test for Poolability: 2.3035
P-value: 0.0000
Distribution: F(62,816)

Included effects: Entity, Time
id: 0x1ecad429930

A similar pattern emerges - we see a small increase associated with mandatory attendance, but it’s non-significant: the increase is too small relative to the noise to attribute it to attendance, rather than just random variation in the data.
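
The same clustering caveat as before applies here, and linearmodels supports it natively, so a quick robustness sketch (again, not in the original run) would be:

Code
# Cluster standard errors by force (the panel's entity dimension)
res_clustered = mod.fit(cov_type='clustered', cluster_entity=True)
print(res_clustered.summary)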

That said, there are plenty of issues with our approach! We’ve only tried straightforward linear approaches, and given we’re relying on two forces and quarterly data, we’d struggle to detect small effects… hopefully we’ll notice something more stark when I come back to this in a few months for part 2: time series approaches!
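
To put a rough number on “struggle to detect small effects”: treating the design as a crude two-sample comparison of treated versus untreated force-quarters (19 vs 241 in the linear models above, ignoring the panel structure entirely), a quick power calculation suggests only fairly large standardised effects were ever detectable:

Code
from statsmodels.stats.power import TTestIndPower

# Smallest standardised effect (Cohen's d) detectable at 80% power, with
# 19 treated and 241 untreated observations - a rough back-of-envelope
# figure, not a proper panel power analysis
mde = TTestIndPower().solve_power(
    effect_size=None, nobs1=19, ratio=241 / 19, alpha=0.05, power=0.8
)
print(f"Minimum detectable effect size: {mde:.2f}")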