LAPD Call Prediction for Fun (and Prophet)

Predicting LAPD police call demand using the Prophet time series forecasting library
data-science
forecasting
crime
Published

November 5, 2020

Time series prediction is more complicated than I originally anticipated when I tackled the subject during my thesis - while you can treat the events independently, like geographic data, everything is related: what happened yesterday will affect what happened today, and a Friday in July is not the same as a Monday in October.

There are various weird and wonderful algorithms to cope with these complexities, but Facebook’s open source Prophet does a fantastic job of providing a “fire and forget” solution that just works.

This is the code extract from my Medium blog here.

import pandas as pd
from fbprophet import Prophet
from fbprophet.plot import plot_plotly
import plotly.offline as py
from fbprophet.diagnostics import cross_validation
from fbprophet.diagnostics import performance_metrics
from fbprophet.plot import plot_cross_validation_metric
#import chart_studio

Data Cleaning and Aggregating

We’ll be using four years of LAPD call data, aggregated to hourly intervals. Prophet actually copes with various intervals quite well, so don’t worry too much about how you do yours: just try and keep regular intervals, without too many missings bits.

df_by_hour = pd.read_csv("/content/total_calls_per_hour.csv")
df_by_hour["ds"] = pd.date_range(min(df_by_hour["ds"]), max(df_by_hour["ds"]), freq='H')

Your final cleaned data-set must contain the below two columns, ds and y. Everything else, Prophet will deal with.

df_by_hour.head()
Dispatch Date_Dispatch Time call volume ds y
0 2015-01-01 00:00:00 286 2015-01-01 00:00:00 286
1 2015-01-01 01:00:00 265 2015-01-01 01:00:00 265
2 2015-01-01 02:00:00 179 2015-01-01 02:00:00 179
3 2015-01-01 03:00:00 152 2015-01-01 03:00:00 152
4 2015-01-01 04:00:00 127 2015-01-01 04:00:00 127
df_by_hour.tail()
Dispatch Date_Dispatch Time call volume ds y
43819 2019-12-31 19:00:00 310 2019-12-31 19:00:00 310
43820 2019-12-31 20:00:00 320 2019-12-31 20:00:00 320
43821 2019-12-31 21:00:00 373 2019-12-31 21:00:00 373
43822 2019-12-31 22:00:00 354 2019-12-31 22:00:00 354
43823 2019-12-31 23:00:00 281 2019-12-31 23:00:00 281

##Fitting and Deploying Prophet works similar to most Python sklearn type implementations - just fit the data and you’re off.

Helpfully, it will also make you a data-frame containing future dates for you to predict on. It will also provide a breakdown of seasonality trends.

m = Prophet()
m.fit(df_by_hour)
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
<fbprophet.forecaster.Prophet at 0x7fdbb146bdd8>
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
fig1 = m.plot(forecast)

fig2 = m.plot_components(forecast)


fig = plot_plotly(m, forecast)  # This returns a plotly Figure
fig.show()

Diagnostics and Cross Validation

Helpfully, Prophet also contains cross-validation functionality - and performs quite well, very quickly!

df_cv = cross_validation(m, initial='730 days', period='180 days', horizon = '365 days')
df_cv.head()
INFO:fbprophet:Making 5 forecasts with cutoffs between 2017-01-10 23:00:00 and 2018-12-31 23:00:00
ds yhat yhat_lower yhat_upper y cutoff
0 2017-01-11 00:00:00 116.692077 89.859738 142.382745 79 2017-01-10 23:00:00
1 2017-01-11 01:00:00 95.572727 69.135035 122.850000 71 2017-01-10 23:00:00
2 2017-01-11 02:00:00 71.996010 45.771390 98.558667 55 2017-01-10 23:00:00
3 2017-01-11 03:00:00 49.332357 23.467584 76.244978 51 2017-01-10 23:00:00
4 2017-01-11 04:00:00 33.137479 4.387625 60.912735 41 2017-01-10 23:00:00
df_p = performance_metrics(df_cv)
df_p.head()
INFO:fbprophet:Skipping MAPE because y close to 0
horizon mse rmse mae mdape coverage
0 36 days 12:00:00 2938.141411 54.204625 33.287400 0.167800 0.660046
1 36 days 13:00:00 2929.579619 54.125591 33.235735 0.167594 0.660731
2 36 days 14:00:00 2919.565265 54.033002 33.187913 0.167276 0.660959
3 36 days 15:00:00 2908.693562 53.932305 33.155190 0.166688 0.661187
4 36 days 16:00:00 2910.565179 53.949654 33.167673 0.166688 0.660959
fig = plot_cross_validation_metric(df_cv, metric='rmse')