See also Policing, Predicting recidivism.
Crime tends to concentrate at places, so we find the places and direct policing there. A very straightforward, intervention-oriented approach.
Andresen, M. A., Linning, S. J., & Malleson, N. (2017). Crime at Places and Spatial Concentrations: Exploring the Spatial Stability of Property Crime in Vancouver BC, 2003-2013. Journal of Quantitative Criminology, 33(2), 255–275. doi:10.1007/s10940-016-9295-8
The core hypothesis, that crime is concentrated at small places, tends to come from statistics like these: “Property crime in Vancouver is highly concentrated in a small percentage of street segments and intersections, as few as 5% of street segments and intersections in 2013 depending on the crime type”.
However, 5% is less impressive when you realize there were 18,445 street segments and intersections on which crime could occur, and only 1700 or so burglaries in a given year, so a completely uniform spread of crimes could still only hit 9% or so of the map.
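The arithmetic is easy to check. This is a minimal sketch that assumes the crudest possible null: each crime lands on one of the places independently and uniformly at random. Under that assumption the maximum coverage is k/n, and the expected coverage is 1 - (1 - 1/n)^k.

```python
def uniform_coverage(n_crimes, n_places):
    """Return (maximum, expected) fraction of places with at least one crime.

    maximum: every crime lands on a different place.
    expected: each crime picks a place uniformly at random, independently,
    so a given place is missed with probability (1 - 1/n_places) ** n_crimes.
    """
    maximum = min(n_crimes, n_places) / n_places
    expected = 1.0 - (1.0 - 1.0 / n_places) ** n_crimes
    return maximum, expected


# Vancouver figures from the text: ~1,700 burglaries, 18,445 segments and intersections.
max_frac, exp_frac = uniform_coverage(1700, 18445)
print(f"at most {max_frac:.1%} of places, about {exp_frac:.1%} expected")
```

With these inputs the result is at most about 9.2% of places and about 8.8% in expectation. A figure like 5% is therefore not as far below the no-concentration baseline as it sounds.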
Weisburd, D., Bushway, S., Lum, C., & Yang, S.-M. (2004). Trajectories of crime at places: A longitudinal study of street segments in the city of Seattle. Criminology, 42(2), 283–322. doi:10.1111/j.1745-9125.2004.tb00521.x
The classic source. This shows an interesting trajectory analysis over several years, and the fundamental crime concentration claim comes from 29,849 street segments and around 100,000 crimes per year, 50% of which is contained in maybe 5% of the segments. This is interesting, but doesn’t determine if crime is more concentrated than we’d expect from simple population density and mapping reasons (e.g. some street segments never experience crime because they’re interstate on-ramps or small access roads).
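The headline statistic can be computed from per-segment counts. Running the same function on counts scattered uniformly at random shows how much concentration chance alone produces. This is a minimal sketch, and the uniform null is an assumption: it deliberately ignores the population-density and road-type issues mentioned above, so it measures only sampling noise.

```python
import random


def share_of_places_for(counts, share=0.5):
    """Smallest fraction of places that together hold `share` of all crimes."""
    total = sum(counts)
    running = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= share * total:
            return i / len(counts)
    return 1.0


def uniform_baseline(n_crimes, n_places, share=0.5, seed=0):
    """Same statistic when every crime picks a place uniformly at random."""
    rng = random.Random(seed)
    counts = [0] * n_places
    for _ in range(n_crimes):
        counts[rng.randrange(n_places)] += 1
    return share_of_places_for(counts, share)


# Seattle-sized inputs from the text: 29,849 segments, ~100,000 crimes.
baseline = uniform_baseline(100_000, 29_849)
print(f"uniform scattering: {baseline:.0%} of segments hold half the crimes")
```

With about 3.35 crimes per segment, the uniform baseline should come out near 30%, far above the roughly 5% reported. With this many crimes, sampling noise alone is not the issue. The harder question is what a realistic null looks like.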
Hipp, J. R., & Kim, Y.-A. (2016). Measuring Crime Concentration Across Cities of Varying Sizes: Complications Based on the Spatial and Temporal Scale Employed. Journal of Quantitative Criminology, 1–38. doi:10.1007/s10940-016-9328-3
Discusses the issue of random variation causing concentration and the possibility of measuring concentration relative to the concentration we’d expect just from a uniform spread of crime across the map. Instead of proposing a measure which does so, however, they propose metrics which try to avoid upward-biased estimates of concentration—pick the top cells from last year and see what fraction of crime is contained in them this year, for example, which tries to smooth out random variation in concentration. They claim, by linear regression against the number of crimes and the number of possible locations for crime, that this measure accounts for most of the concentration we expect from having few crimes in a large city, but I don’t find this terribly convincing.
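The prior-year idea can be sketched as follows. This is a minimal, hypothetical implementation of the metric as described here: rank cells by last year's counts, then measure this year's share of crime in the top cells. It is not Hipp and Kim's code, and the cell names and counts are made up. Because the ranking comes from a different year, cells that were high last year purely by chance do not inflate this year's share.

```python
def top_cells_share(last_year, this_year, top_frac=0.05):
    """Share of this year's crime falling in the cells that were in last year's top
    `top_frac` by count. Both arguments map cell id -> count and should list every
    cell, including zero-count ones; ties are broken arbitrarily by sort order."""
    ranked = sorted(last_year, key=last_year.get, reverse=True)
    top = set(ranked[: max(1, round(top_frac * len(ranked)))])
    total = sum(this_year.values())
    return sum(n for cell, n in this_year.items() if cell in top) / total if total else 0.0


# Hypothetical counts for four cells.
last = {"a": 9, "b": 1, "c": 0, "d": 0}
now = {"a": 4, "b": 2, "c": 1, "d": 1}
print(top_cells_share(last, now, top_frac=0.25))  # 0.5: cell "a" keeps half the crime
```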
Mohler, G. O., Brantingham, P. J., Carter, J., & Short, M. B. (2019). Reducing bias in estimates for the law of crime concentration. Journal of Quantitative Criminology. doi:10.1007/s10940-019-09404-1
Points out the problem of having fewer crimes than locations, and suggests a method that assumes each location’s crimes occur from a Poisson distribution whose mean is drawn from a Gamma distribution (making the crime counts negative binomial overall). Crime concentration can be read from the shape of the Gamma, which can be estimated from the data.
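Here is a sketch of the Poisson-Gamma idea. It substitutes a simple method-of-moments fit for whatever estimator the paper uses: if counts are Poisson with Gamma-distributed means of shape k and overall mean m, then Var = m + m²/k. Concentration is then read from the fitted Gamma (the latent rates) instead of from the noisy counts. The 100,000 places, mean of 0.1, and shape of 0.5 below are hypothetical.

```python
import math
import random
import statistics


def gamma_shape_mom(counts):
    """Method-of-moments Gamma shape: k = m**2 / (Var - m). Returns inf when the
    counts are no more dispersed than Poisson (no evidence of concentration)."""
    m = statistics.fmean(counts)
    v = statistics.pvariance(counts, mu=m)
    return m * m / (v - m) if v > m else math.inf


def latent_top_share(k, top_frac=0.05, n=100_000, seed=0):
    """Share of *expected* crime held by the top `top_frac` of places when rates
    ~ Gamma(shape k). The Gamma scale cancels, so it is fixed at 1."""
    if math.isinf(k):
        return top_frac  # equal rates everywhere: no concentration
    rng = random.Random(seed)
    rates = sorted((rng.gammavariate(k, 1.0) for _ in range(n)), reverse=True)
    return sum(rates[: int(top_frac * n)]) / sum(rates)


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


# Hypothetical data: far fewer crimes than places (mean 0.1), true shape 0.5.
rng = random.Random(1)
counts = [poisson(rng, rng.gammavariate(0.5, 0.2)) for _ in range(100_000)]
k_hat = gamma_shape_mom(counts)
print(f"k_hat = {k_hat:.2f}; top 5% of places hold {latent_top_share(k_hat):.0%} of expected crime")
```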
Usually clustering methods or kernel densities: pick the areas with clusters or the highest crime density. There are conflicting results on what works best, but I don’t like the metrics anyway; the PAI (Predictive Accuracy Index) and RRI (Recapture Rate Index) don’t seem to measure useful quantities, particularly when you arbitrarily choose your threshold for defining “hotspot” and don’t compare across a range of thresholds, ROC-style.
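Instead of fixing one hotspot threshold, one can compute the PAI (hit rate divided by the fraction of area flagged) at every threshold and plot hit rate against area, much like an ROC curve. This is a minimal sketch that assumes equal-sized grid cells and one predicted score per cell. The area under the hit-rate curve is just one possible threshold-free summary; it is not a metric from the papers cited here.

```python
def pai_curve(scores, counts):
    """(area fraction, hit rate, PAI) for every cutoff, flagging cells in
    descending score order. Assumes equal-area cells, so area fraction is
    the fraction of cells flagged. PAI = hit rate / area fraction."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(counts)
    curve, hits = [], 0
    for flagged, i in enumerate(order, start=1):
        hits += counts[i]
        area = flagged / len(scores)
        curve.append((area, hits / total, (hits / total) / area))
    return curve


def hit_rate_auc(curve):
    """Trapezoidal area under hit rate vs. area fraction, starting from (0, 0)."""
    auc, prev_area, prev_hit = 0.0, 0.0, 0.0
    for area, hit, _ in curve:
        auc += (area - prev_area) * (hit + prev_hit) / 2
        prev_area, prev_hit = area, hit
    return auc


# Hypothetical predicted scores and next-period crime counts for four cells.
curve = pai_curve([0.9, 0.8, 0.5, 0.1], [3, 0, 1, 0])
for area, hit, pai in curve:
    print(f"area {area:.2f}  hit rate {hit:.2f}  PAI {pai:.2f}")
print(f"AUC {hit_rate_auc(curve):.3f}")
```

In this toy example the PAI is largest (3.0) at the smallest flagged area and falls to 1.0 when everything is flagged. Any single reported PAI therefore depends heavily on the chosen cutoff.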
For evaluation metrics:
Adepeju, M., Rosser, G., & Cheng, T. (2016). Novel evaluation metrics for sparse spatio-temporal point process hotspot predictions - a crime case study. International Journal of Geographical Information Science, 30(11), 2133–2154. doi:10.1080/13658816.2016.1159684
On top of PAI, adds measures of compactness of hotspots, their consistency from one time period to the next, and the difference in prediction between different methods.
He, L., Páez, A., & Liu, D. (2016). Persistence of Crime Hot Spots: An Ordered Probit Analysis. Geographical Analysis, 1–20. doi:10.1111/gean.12107
Counts how frequently each block is flagged as a hotspot by a spatial scan statistic, then uses covariates in an ordered probit model to predict this count.
Various experiments have tested whether directing patrols to hotspots reduces crime, with generally positive results.
But trying to solve community problems may be better than just saturation patrol:
One curious trial finds increases in crime when hotspot patrols are predictable: [To read] Ariel, B., & Partridge, H. (2016). Predictable Policing: Measuring the Crime Control Benefits of Hotspots Policing at Bus Stops. Journal of Quantitative Criminology, 1–25. doi:10.1007/s10940-016-9312-y
A technique for identifying the spatial features that lead to crime. Works by identifying risk factors (bars, foreclosures, schools, etc.), mapping these, and then seeing how well they predict crime.
The initial iteration just added up the number of risk factors, then used a logistic regression to predict presence or absence of crime: Kennedy, L. W., Caplan, J. M., & Piza, E. L. (2010). Risk Clusters, Hotspots, and Spatial Intelligence: Risk Terrain Modeling as an Algorithm for Police Resource Allocation Strategies. Journal of Quantitative Criminology, 27(3), 339–362. doi:10.1007/s10940-010-9126-2
Model selection was just “which logistic regression has the biggest slope”, which naturally biases it to the models with fewer risk factors, since their risk values have a smaller range (as just a count of present factors) and hence must have a larger slope. Variable selection used a bunch of univariate chi-squared tests, and I’m dubious about using p values to decide which variable predicts best.
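The first RTM iteration, and the problem with choosing models by slope, can be sketched in a few lines. The cells and risk factors below are hypothetical. The logistic regression is fit by plain Newton-Raphson so that the example has no dependencies. Halving the score's range leaves every fitted probability unchanged but exactly doubles the slope.

```python
import math


def fit_logistic(x, y, iters=25):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p          # gradient of the log-likelihood
            gb += (yi - p) * xi
            haa += w              # 2x2 information matrix
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det  # Newton step: solve H @ step = gradient
        b += (haa * gb - hab * ga) / det
    return a, b


# Hypothetical cells: (bar, school, foreclosure) indicators and crime presence.
cells = [
    ((0, 0, 0), 0), ((0, 0, 0), 0), ((0, 0, 0), 0), ((0, 0, 0), 1),
    ((1, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 1), ((1, 0, 0), 1),
    ((1, 1, 0), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 0), 1),
    ((1, 1, 1), 0), ((1, 1, 1), 1), ((1, 1, 1), 1), ((1, 1, 1), 1),
]
risk_count = [sum(factors) for factors, _ in cells]  # RTM score: number of factors present
crime = [c for _, c in cells]

a, b = fit_logistic(risk_count, crime)
# A score with half the range gets twice the slope and makes identical predictions.
a_half, b_half = fit_logistic([r / 2 for r in risk_count], crime)
print(f"slope {b:.3f} vs {b_half:.3f}")
```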
Then came an update which uses elastic net penalized regression to fit a Poisson model, picking the best penalty via cross-validation, then further reducing the model with stepwise regression and BIC. (Why not just adjust the penalty parameter for more sparsity?) Features were included as three binary variables for proximity (within 426, 852, or 1278 feet) and three different kernel densities (with those three bandwidths), for reasons I do not understand: Kennedy, L. W., Caplan, J. M., Piza, E. L., & Buccine-Schraeder, H. (2016). Vulnerability and Exposure to Crime: Applying Risk Terrain Modeling to the Study of Assault in Chicago. Applied Spatial Analysis and Policy, 9(4), 529–548. doi:10.1007/s12061-015-9165-z
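For concreteness, the six features per risk factor might be built as follows. This sketch rests on several assumptions: coordinates are in feet, and the densities use a Gaussian kernel, since the kernel actually used isn't stated here. `rtm_features` and the example bar locations are hypothetical.

```python
import math

BANDWIDTHS_FT = (426, 852, 1278)  # distances quoted in the text


def rtm_features(cell_xy, feature_points):
    """Three proximity flags and three kernel densities for one cell centroid.
    Coordinates are in feet; the Gaussian kernel is an assumption."""
    x0, y0 = cell_xy
    dists = [math.hypot(x - x0, y - y0) for x, y in feature_points]
    nearest = min(dists, default=math.inf)
    proximity = [int(nearest <= h) for h in BANDWIDTHS_FT]
    density = [
        # 2-D Gaussian kernel density (points per square foot) at bandwidth h
        sum(math.exp(-0.5 * (d / h) ** 2) for d in dists) / (2 * math.pi * h * h)
        for h in BANDWIDTHS_FT
    ]
    return proximity + density


# Hypothetical: a cell centroid at the origin and two bars 500 and 2,000 feet away.
features = rtm_features((0.0, 0.0), [(500.0, 0.0), (2000.0, 0.0)])
print(features[:3])  # [0, 1, 1]: nearest bar is beyond 426 ft but within 852 ft
```

The proximity flags are nested (within 426 feet implies within 852), so each risk factor contributes six strongly correlated columns to the penalized regression.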
Wheeler, A. P. Quantifying the Local and Spatial Effects of Alcohol Outlets on Crime. SSRN. https://ssrn.com/abstract=2869198
Uses negative binomial regression with a bunch of covariates to estimate the effect of alcohol outlets on various crimes, including burglary, which, somewhat surprisingly, relates to the covariates much as the other crimes do. Finds that different kinds of alcohol outlets have statistically indistinguishable effects on crime, suggesting it’s not just drunk people from bars causing problems but also the increased traffic to liquor stores and shops.
Xu, J., & Griffiths, E. (2016). Shooting on the street: Measuring the spatial influence of physical features on gun violence in a bounded street network. Journal of Quantitative Criminology, 33(2), 237–253. doi:10.1007/s10940-016-9292-y
Evaluates the connection between crimes and spatial point features, like bus stops, by a cross K function, sort of a continuous spatial generalization of a Knox test. Advocates measuring distance in terms of road network shortest path distance, rather than Euclidean distance. Uses the K functions to estimate the distance of influence of each feature.
Doesn’t use the K functions to test for near repeats (see section below), as time isn’t included in the analysis.
I suspect these results are biased from not accounting for self-excitation at all. Also, the null hypothesis used in the K function plots is complete spatial randomness of points, which no one seriously believes for crime anyway, so it can be rejected even if the spatial point features are completely unrelated to the crime pattern. I’d need to see simulations showing what happens to the K function when spatial distributions of events are clustered but independent.
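Here is one simulation of that kind. For two independent stationary patterns, the theoretical cross-K is πr² even if each pattern is clustered, so in that case the main danger is CSR envelopes that are too narrow. A second danger comes from patterns that are generated independently but both follow the same uneven background, such as population. The sketch below shows this second case with a hypothetical "downtown" square. It uses Euclidean distance rather than the network distance Xu and Griffiths advocate, and it applies no edge correction.

```python
import math
import random


def cross_k(a, b, r, area=1.0):
    """Naive cross-K estimate (no edge correction): area / (n_a * n_b) times the
    number of (a, b) pairs within distance r."""
    pairs = sum(1 for ax, ay in a for bx, by in b if math.hypot(ax - bx, ay - by) <= r)
    return area * pairs / (len(a) * len(b))


def sample(rng, n, hot_frac):
    """n points in the unit square; a fraction hot_frac falls in a hypothetical
    'downtown' square [0.4, 0.6]^2, the rest uniformly anywhere."""
    pts = []
    for _ in range(n):
        if rng.random() < hot_frac:
            pts.append((0.4 + 0.2 * rng.random(), 0.4 + 0.2 * rng.random()))
        else:
            pts.append((rng.random(), rng.random()))
    return pts


rng = random.Random(1)
r = 0.05
csr_value = math.pi * r * r
# In both cases "crimes" and "features" are generated independently of each other.
uniform_ratio = cross_k(sample(rng, 300, 0.0), sample(rng, 300, 0.0), r) / csr_value
shared_ratio = cross_k(sample(rng, 300, 0.7), sample(rng, 300, 0.7), r) / csr_value
print(f"uniform: {uniform_ratio:.2f}, shared downtown: {shared_ratio:.2f}")
```

In the shared-background case the ratio should come out several times above 1, which a CSR-based test would read as a strong association even though neither pattern influences the other.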
Crimes tend to be followed by nearby crimes, e.g. from a burglar returning to an area to try a new target.
A bunch of papers use the Knox test, a permutation test that compares the number of crimes nearby in space and time with the number expected under the permutation null. It requires a discrete choice of cutoffs for “nearby”, so claims about the distances of effects are really claims about the power of the test. (If significance is only found within 200m, would it be found at 300m if we had more data?) It is implemented in the widely used Near Repeat Calculator.
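A Knox test with time permutation fits in a few lines. This minimal sketch assumes events are (x, y, t) tuples in metres and days and uses Euclidean distance. The simulated near-repeat data and the 200 m / 14 day cutoffs are hypothetical, and this is not the Near Repeat Calculator's code.

```python
import math
import random


def knox_count(events, ds, dt):
    """Number of event pairs close in both space (<= ds) and time (<= dt)."""
    close = 0
    for i, (xi, yi, ti) in enumerate(events):
        for xj, yj, tj in events[i + 1:]:
            if abs(ti - tj) <= dt and math.hypot(xi - xj, yi - yj) <= ds:
                close += 1
    return close


def knox_test(events, ds, dt, n_perm=99, seed=0):
    """Monte Carlo Knox test: shuffle event times among locations, which keeps the
    spatial and temporal patterns but breaks any link between them.
    Returns (observed count, one-sided p-value)."""
    rng = random.Random(seed)
    observed = knox_count(events, ds, dt)
    times = [t for _, _, t in events]
    as_large = 0
    for _ in range(n_perm):
        rng.shuffle(times)
        permuted = [(x, y, t) for (x, y, _), t in zip(events, times)]
        if knox_count(permuted, ds, dt) >= observed:
            as_large += 1
    return observed, (as_large + 1) / (n_perm + 1)


# Hypothetical data: 100 events over 1 km x 1 km and a year, half followed by a
# near repeat within ~100 m and 7 days.
rng = random.Random(42)
events = []
for _ in range(100):
    x, y, t = rng.uniform(0, 1000), rng.uniform(0, 1000), rng.uniform(0, 365)
    events.append((x, y, t))
    if rng.random() < 0.5:
        events.append((x + rng.uniform(-70, 70), y + rng.uniform(-70, 70), t + rng.uniform(0, 7)))

obs, p = knox_test(events, ds=200, dt=14)
print(f"close pairs: {obs}, p = {p:.2f}")
```

Repeating this over a grid of cutoffs gives the familiar table of results, but each cell is a separate test whose significance depends on sample size.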
Another approach models choice of houses to burgle with a multinomial logit, where the outcome is the choice of house: Ratcliffe, J. H., & Rengert, G. F. (2008). Near-Repeat Patterns in Philadelphia Shootings. Security Journal, 21(1-2), 58–76. doi:10.1057/palgrave.sj.8350068
Ripley’s K function provides a continuous analog of the Knox test statistic. It’s a normalized count of the average number of points within a given distance of an arbitrary event, so it’s a function of distance instead of having an arbitrary cutoff; a natural space-time generalization counts the average number within a given distance and a given time. Plotting these gives a sense of the scale and decay of near-repeat effects.
Used to compare before and after stop-and-frisk events: Wooditch, A., & Weisburd, D. (2016). Using Space-Time Analysis to Evaluate Criminal Justice Programs: An Application to Stop-Question-Frisk Practices. Journal of Quantitative Criminology, 32(2), 191–213. doi:10.1007/s10940-015-9259-4
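The space-time K function can be turned into a check for interaction by comparing it with the product of the purely spatial and purely temporal K functions. With no space-time interaction, K(s, t) ≈ K_space(s) · K_time(t). The Diggle-style D(s, t) below is a sketch of that general idea and is not necessarily what Wooditch and Weisburd computed. It uses no edge correction, and the simulated events are hypothetical.

```python
import math
import random


def space_time_d(events, s_values, t_values, area, duration):
    """D(s, t) = K(s, t) - K_space(s) * K_time(t) on a grid, from naive pair counts.
    Positive values mean more pairs close in both space and time than the separate
    spatial and temporal patterns explain. events: list of (x, y, t)."""
    n = len(events)
    pairs = [
        (math.hypot(xi - xj, yi - yj), abs(ti - tj))
        for i, (xi, yi, ti) in enumerate(events)
        for (xj, yj, tj) in events[i + 1:]
    ]
    norm = 2.0 / (n * (n - 1))  # each unordered pair stands for two ordered pairs
    k_space = {s: area * norm * sum(d <= s for d, _ in pairs) for s in s_values}
    k_time = {t: duration * norm * sum(g <= t for _, g in pairs) for t in t_values}
    return {
        (s, t): area * duration * norm * sum(d <= s and g <= t for d, g in pairs)
        - k_space[s] * k_time[t]
        for s in s_values
        for t in t_values
    }


# Hypothetical events: metres and days, half followed by a near repeat.
rng = random.Random(42)
events = []
for _ in range(100):
    x, y, t = rng.uniform(0, 1000), rng.uniform(0, 1000), rng.uniform(0, 365)
    events.append((x, y, t))
    if rng.random() < 0.5:
        events.append((x + rng.uniform(-70, 70), y + rng.uniform(-70, 70), t + rng.uniform(0, 7)))

d = space_time_d(events, [100, 200, 400], [7, 14, 30], area=1000 * 1000, duration=365)
for (s, t), value in sorted(d.items()):
    print(f"s={s} m, t={t} d: D={value:,.0f}")
```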
Burglaries are the most common crime studied, presumably because the theory is clear: burglars like returning to areas they’re familiar with. But this is easily confounded with spatial heterogeneity: some places are better to burgle than others, regardless of whether they were recently burgled. This seems connected to the state dependence vs. heterogeneity problem in economics: Heckman, J. J. (1991). Identifying the hand of past: Distinguishing state dependence from heterogeneity. The American Economic Review, 81(2), 75–79. http://www.jstor.org/stable/2006829
Johnson, S. D. (2008). Repeat burglary victimisation: a tale of two theories. Journal of Experimental Criminology, 4(3), 215–240. doi:10.1007/s11292-008-9055-3
Via simulation, tries to show that heterogeneity can’t account for all of the observed effect, since it wouldn’t cause as many very rapid repeats. This doesn’t actually settle the issue: all it demonstrates is that these particular simulation models are distinguishable, not that all possible state-dependent or heterogeneous processes are distinguishable from each other.
Short, M. B., D’Orsogna, M. R., Brantingham, P. J., & Tita, G. E. (2009). Measuring and Modeling Repeat and Near-Repeat Burglary Effects. Journal of Quantitative Criminology, 25(3), 325–339. doi:10.1007/s10940-009-9068-8
Another attempt to disentangle the two effects; also claims they can be distinguished, since they lead to different distributions of inter-event times.
Ornstein, J. T., & Hammond, R. A. (2017). The burglary boost: A note on detecting contagion using the knox test. Journal of Quantitative Criminology, 33(1), 65–75. doi:10.1007/s10940-016-9281-1
Shows the problem also affects Knox tests, which confound contagion and spatial heterogeneity.
Johnson, S. D., Davies, T., Murray, A., Ditta, P., Belur, J., & Bowers, K. (2017). Evaluation of operation swordfish: A near-repeat target-hardening strategy. Journal of Experimental Criminology. doi:10.1007/s11292-017-9301-7
Experimental intervention to reduce near-repeat burglaries by providing prevention information and tools (like light timers and neighborhood watch information) to victims of burglaries and their neighbors. Found only a small, marginally significant effect on crime rates, and a small increase in satisfaction with police.
See also Self-exciting point processes.
It’d be useful to combine hotspot models and near-repeat effects. As Gorr has pointed out, hotspots can be either chronic (like the methods above try to find) or temporary, caused by, say, a new burglar hitting several houses in an area. Gorr, W. L., & Lee, Y. (2015). Early Warning System for Temporary Crime Hot Spots. Journal of Quantitative Criminology, 31(1), 25–47. doi:10.1007/s10940-014-9223-8
Mohler and colleagues have a series of papers on self-exciting models for crime, which allow both chronic hotspots and self-exciting temporary clusters:
Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., & Tita, G. E. (2011). Self-Exciting Point Process Modeling of Crime. Journal of the American Statistical Association, 106(493), 100–108. doi:10.1198/jasa.2011.ap09546
Uses stochastic declustering to attribute crimes to either the self-exciting or chronic components of the model.
Mohler, G. O. (2014). Marked point process hotspot maps for homicide and gun crime prediction in Chicago. International Journal of Forecasting, 30(3), 491–497. doi:10.1016/j.ijforecast.2014.01.004
Treats the conditional intensity as a mixture, and uses EM, where the latent variable is which crime or background component “triggered” a crime.
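The branching structure behind both papers can be sketched in time only. Each event has a probability of being background or of having been triggered by each earlier event. Stochastic declustering samples a parent from these probabilities, and EM averages over them. This is a simplification: Mohler and colleagues use space-time triggering kernels (plus marks in the 2014 paper). The parameter names mu, theta, omega and the simulated data are illustrative. The M-step also ignores the edge effect from events near the end of the window having less time to trigger offspring.

```python
import math
import random


def branching_probs(times, mu, theta, omega):
    """E-step. Row j: probabilities that event j was triggered by each earlier
    event i (entries 0..j-1) or came from the background (last entry)."""
    probs = []
    for j, tj in enumerate(times):
        weights = [theta * omega * math.exp(-omega * (tj - ti)) for ti in times[:j]]
        weights.append(mu)
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def fit_hawkes_em(times, horizon, iters=40):
    """EM for lambda(t) = mu + sum_i theta*omega*exp(-omega*(t - t_i)) on [0, horizon]."""
    times = sorted(times)
    n = len(times)
    mu, theta, omega = n / (2 * horizon), 0.5, 1.0  # arbitrary starting values
    for _ in range(iters):
        probs = branching_probs(times, mu, theta, omega)
        background = sum(row[-1] for row in probs)
        triggered = n - background
        lag_weighted = sum(row[i] * (times[j] - times[i]) for j, row in enumerate(probs) for i in range(j))
        mu, theta, omega = background / horizon, triggered / n, triggered / lag_weighted
    return mu, theta, omega


def poisson(rng, lam):
    """Knuth's multiplication method for Poisson draws."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_hawkes(mu, theta, omega, horizon, seed=0):
    """Cluster simulation: background events, then generations of offspring."""
    rng = random.Random(seed)
    generation = [rng.uniform(0, horizon) for _ in range(poisson(rng, mu * horizon))]
    events = list(generation)
    while generation:
        children = [p + rng.expovariate(omega) for p in generation for _ in range(poisson(rng, theta))]
        generation = [c for c in children if c < horizon]
        events.extend(generation)
    return sorted(events)


times = simulate_hawkes(mu=0.2, theta=0.5, omega=1.0, horizon=500, seed=3)
mu_hat, theta_hat, omega_hat = fit_hawkes_em(times, 500)
print(f"{len(times)} events; mu={mu_hat:.3f}, theta={theta_hat:.2f}, omega={omega_hat:.2f}")
```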
Mohler, G. O., Short, M. B., Malinowski, S., Johnson, M., Tita, G. E., Bertozzi, A. L., & Brantingham, P. J. (2015). Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110(512), 1399–1411. doi:10.1080/01621459.2015.1077710
Evaluation of a simplified version of the methods above, finding a small but nonzero crime reduction effect when used to direct patrols. Their simplified version removed spatial dependence, however, and just used within-cell histories to make predictions. It’s not clear why they made this choice.
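A within-cell-only ranking of the kind described might look like the following sketch. It assumes that each cell has its own background rate, that the triggering parameters theta and omega are shared across cells, and that only a cell's own past events boost its intensity. The names and numbers are hypothetical, not the trial's actual implementation.

```python
import math


def cell_intensity(event_times, now, mu, theta, omega):
    """Background rate plus exponentially decaying boosts from this cell's own past events."""
    return mu + sum(theta * omega * math.exp(-omega * (now - t)) for t in event_times if t < now)


def top_cells(history, background, now, theta, omega, k):
    """Rank cells by current intensity using only each cell's own history."""
    scores = {
        cell: cell_intensity(history.get(cell, []), now, background[cell], theta, omega)
        for cell in background
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical cells: per-day background rates and past event days.
background = {"A": 0.05, "B": 0.10, "C": 0.02}
history = {"A": [9.5], "C": [2.0]}
print(top_cells(history, background, now=10.0, theta=0.5, omega=0.5, k=2))  # ['A', 'B']
```

Cell A's recent event lifts it above B's higher background, while C's older event has mostly decayed. A neighbouring cell's events have no effect at all, which is exactly the spatial dependence the simplified version gave up.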
Their methods have been adapted by others. (See also the Epidemic/endemic models section of Self-exciting point processes for application to epidemiology.)
Loeffler, C., & Flaxman, S. (2016). Is Gun Violence Contagious? arXiv. http://arxiv.org/abs/1611.06713
Applies the methods of Mohler (2014) (though with fixed background parameters, rather than learned from the data) in a Bayesian framework, with parameters estimated by Stan instead of EM. Finds that detected gunshots in Chicago self-excite over a very short distance and time scale.
There are also modeling approaches that aren’t self-exciting:
Shirota, S., & Gelfand, A. E. (2016). Space and circular time log Gaussian Cox processes with application to crime event data. arXiv. http://arxiv.org/abs/1611.08719
A strange variation, not really a self-exciting model. Based on a log-Gaussian Cox process, which adds a Gaussian process on top of the (log) conditional intensity to give additional spatiotemporal random effects. Time is circular, through 24-hour cycles, instead of linear. They do not include demographic or economic covariates, believing the available resolution to be too low to contribute to the model, but do include landmarks which serve as crime attractors, as well as indicator variables for time of day and day of week. There are no leading indicator crimes or self-excitation.
It’s not clear to me how this model connects to the criminological theory on hotspots. Crime attractors can be connected to hotspots, but the model has no notion of acute hotspots, because it has no notion of linear time. Is it good at catching new bursts? How exactly would one want to use it?
(Also, the authors make the strange decision to combine burglary and robbery events into one type, which makes little sense given the qualitative differences between the two.)
Wang, B., Zhang, D., Zhang, D., Brantingham, P. J., & Bertozzi, A. L. (2017). Deep Learning for Real Time Crime Forecasting. arXiv. https://arxiv.org/abs/1707.03340
Because interpretability is for wimps.
Not compared to other methods for accuracy.
Crime is, naturally, affected by the weather.
A series of papers on how predictive policing interacts with the Fourth Amendment:
First, it’s surprising to see that courts have already recognized an implied Fourth Amendment exception for “high-crime areas”, where presence in such an area can contribute to reasonable suspicion for a stop and search: Ferguson, A. G. (2011). Crime Mapping and the Fourth Amendment: Redrawing "High-Crime Areas". Hastings Law Journal, 63(1), 179–232. http://www.hastingslawjournal.org/2014/04/03/crime-mapping-and-the-fourth-amendment-redrawing-high-crime-areas/
Next, more on the concerns caused by data and predictive policing being used to justify searches:
Ferguson, A. G. (2012). Predictive Policing and Reasonable Suspicion. Emory Law Journal, 62(2), 259–325. http://law.emory.edu/elj/content/volume-62/issue-2/articles/predicting-policing-and-reasonable-suspicion.html
Ferguson, A. G. (2015). Big Data and Predictive Reasonable Suspicion. University of Pennsylvania Law Review, 163(2), 327–410. https://ssrn.com/abstract=2394683
Ferguson, A. G. (2017). Policing Predictive Policing. Washington University Law Review, 94. https://ssrn.com/abstract=2765525
A skeptical review of predictive policing claims, and the gradual evolution from property crime to violent crime to person-based prediction. Discusses the various dangers, such as low-quality or biased data, and criticizes claims (like PredPol’s) of effectiveness made with minimal experimental evidence. I think the skepticism is well-deserved: hotspot policing based on secret algorithms and low-quality data gives me the heebie-jeebies.
Koss, K. K. (2015). Leveraging predictive policing algorithms to restore Fourth Amendment protections in high-crime areas in a post-Wardlow world. Chicago-Kent Law Review, 90(1), 301. http://scholarship.kentlaw.iit.edu/cklawreview/vol90/iss1/12/
Explores the consequences of the Supreme Court’s Illinois v. Wardlow decision that being in a “high-crime area” can be a factor in establishing reasonable suspicion for a Terry stop. Proposes putting the FBI in charge of national standards for crime analysis; I’m not sure the FBI has the statistical expertise to keep up with rapidly evolving methods, or expertise in the ordinary street crime that most police agencies would target. Given the FBI’s role in distributing Stingray surveillance devices, I’m not sure they’re interested in training police on the constitutional concerns of predictive policing either.
I’d like to see an approach like NIST’s Forensic Science Center of Excellence or previous National Academies reports on forensics: convene some outside experts, not FBI agents without quantitative backgrounds.
Brayne, S. (2017). Big data surveillance: The case of policing. American Sociological Review, 82(5), 977–1008. doi:10.1177/0003122417725865
Brayne embedded herself in the LAPD for several years, conducting interviews with officers and analysts to see how they put surveillance into practice. She finds some cases where data analysis simply launders what officers already do; for example, LAPD developed a points-based system for identifying high-risk people, and being stopped by an officer for a field interview counts for points, so bias in who is stopped quickly turns into official risk scores. Teaming with Palantir, LAPD swept in massive amounts of data from varied sources (license plate readers, foreclosures, vehicle registration, warrants, social media…), and automated search tools make it easy to find everyone matching certain criteria. This also encourages “system avoidance”, where those in trouble with the law systematically avoid using medical, financial, or other institutions that keep formal records that might be used to track them.