See also spatiotemporal point processes and Mutually exciting point processes.
Also known as Hawkes processes.
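For reference, the conditional intensity of the basic temporal, linear self-exciting process, in generic notation rather than any one paper's:

```latex
\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{t_i < t} \phi(t - t_i),
\qquad n := \int_0^\infty \phi(s)\,\mathrm{d}s
```

where μ > 0 is the background (immigrant) rate, φ ≥ 0 is the triggering kernel, and n is the branching ratio, the mean number of direct offspring per event; n < 1 is required for a stationary process with finite intensity.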
A review, focusing on temporal processes modeling social media events, is Rizoiu, M.-A., Lee, Y., Mishra, S., & Xie, L. (2017). A tutorial on Hawkes processes for events in social media. https://arxiv.org/abs/1708.06401
Hawkes, A. G., & Oakes, D. (1974). A Cluster Process Representation of a Self-Exciting Process. Journal of Applied Probability, 11(3), 493–503. doi:10.2307/3212693
A stationary self-exciting point process with finite intensity can be represented as a Poisson cluster process (aka Poisson branching process). This can be useful for establishing bounds on the process – for example, Lewis, P. A. W. (1969). Asymptotic properties and equilibrium conditions for branching Poisson processes. Journal of Applied Probability, 6(2), 355–371. doi:10.1017/S0021900200032873
The basic approach comes from Epidemic-Type Aftershock Models, where earthquakes are caused by some constant background process and then induce further aftershocks when they arrive. There’s a whole series of papers by Ogata; some highlights from the field:
Thinning is a common technique, but there are better ways:
Zhuang, J., Ogata, Y., & Vere-Jones, D. (2004). Analyzing earthquake clustering features by using stochastic reconstruction. Journal of Geophysical Research, 109(B5), 1–17. doi:10.1029/2003JB002879
Proposes an algorithm that uses the Poisson cluster process representation to avoid thinning entirely: simulate the background process as a homogeneous Poisson process, draw offspring from each event as an inhomogeneous Poisson process, draw offspring of those offspring, and so on – essentially a generative simulation. Can be implemented very efficiently.
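A minimal sketch of that cluster-style simulation, assuming an exponential triggering kernel φ(t) = αβe^(−βt) with branching ratio α < 1 (my parameterisation, not Zhuang et al.'s, and temporal-only, unlike their space-time setting):

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson(lam) sample via Knuth's product-of-uniforms method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_hawkes_branching(mu, alpha, beta, t_max, seed=0):
    """Simulate a Hawkes process on [0, t_max] via its cluster
    representation: immigrants arrive as a homogeneous Poisson process of
    rate mu; each event independently produces Poisson(alpha) offspring
    at exponential(beta) delays, and those offspring reproduce in turn."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:  # background (immigrant) events
        t += rng.expovariate(mu)
        if t > t_max:
            break
        events.append(t)
    queue = list(events)
    while queue:  # cascade of offspring; processing order does not matter
        parent = queue.pop()
        for _ in range(poisson_draw(alpha, rng)):
            child = parent + rng.expovariate(beta)
            if child <= t_max:
                events.append(child)
                queue.append(child)
    return sorted(events)
```

Note there is a mild edge effect: offspring of events near t_max that would land beyond the window are discarded rather than simulated.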
Harte, D. (2017). Probability distribution of forecasts based on the ETAS model. Geophysical Journal International, 210(1), 90–104. doi:10.1093/gji/ggx146
Derivation for ETAS models showing that the number (and variance of that number) of events in a forecast interval can be approximated with some convoluted probability generating functions or a negative binomial. The “seed” events – those triggered by the background and the events occurring before the forecast interval – are known, and their number of direct offspring is easy to approximate, so the only term needing approximation counts the events excited by those offspring. Perhaps the math works out better with a model other than ETAS?
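One standard branching-process fact that seems relevant here (generic, not specific to Harte's derivation): if each event's direct offspring count is Poisson with mean n < 1, the total cluster size S generated by a single seed has a probability generating function satisfying a Lagrangian fixed-point equation, and S follows the Borel distribution:

```latex
G_S(z) = z\, e^{\,n\,(G_S(z) - 1)},
\qquad \mathbb{E}[S] = \frac{1}{1-n}
```

The forecast count is then a sum of such cluster sizes over the seeds, which hints at why a negative-binomial approximation can work.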
The ETAS has been adapted to epidemiology (see also epidemic models):
Meyer, S., Elias, J., & Höhle, M. (2011). A Space-Time Conditional Intensity Model for Invasive Meningococcal Disease Occurrence. Biometrics, 68(2), 607–616. doi:10.1111/j.1541-0420.2011.01684.x
Related MSc thesis: Meyer, S. (2010). Spatio-temporal infectious disease epidemiology based on point processes (MSc thesis). Ludwig-Maximilians-Universität München. https://epub.ub.uni-muenchen.de/11703/1/MA_Meyer.pdf
Swapping out kernels for spatial influence, with brief mention of scoring rules for one-ahead predictions (sect. 3.3): Meyer, S., & Held, L. (2014). Power-law models for infectious disease spread. Annals of Applied Statistics, 8(3), 1612–1639. doi:10.1214/14-AOAS743
Detecting clustering while accounting for spatial heterogeneity using the model: Meyer, S., Warnke, I., Rössler, W., & Held, L. (2016). Model-based testing for space-time interaction using point processes: An application to psychiatric hospital admissions in an urban area. Spatial and Spatio-Temporal Epidemiology, 17(C), 15–25. doi:10.1016/j.sste.2016.03.002
Chaffee, Park, Harrigan, Krebs, and Schoenberg (2017). A non-parametric Hawkes model of the spread of Ebola in West Africa. http://www.stat.ucla.edu/~frederic/papers/chaffeepark107.pdf
Preprint using a temporal point process to model Ebola transmission in Africa, comparing the results favorably to a traditional compartment model.
A self-exciting point process can be interpreted as a Poisson cluster process, as mentioned above. It could be interesting to decluster it, meaning to remove the events which were “excited” by another, and leave only the background events which occurred spontaneously. (In the earthquake literature, this means removing the aftershocks and keeping only the main shocks.) This procedure is called stochastic declustering.
Zhuang, J., Ogata, Y., & Vere-Jones, D. (2002). Stochastic Declustering of Space-Time Earthquake Occurrences. Journal of the American Statistical Association, 97(458), 369–380. doi:10.1198/016214502760046925
A procedure based on estimating the chance that each event was stimulated (comparing the intensity contribution from every other event to the intensity contribution from the background), then thinning the events based on these probabilities.
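A sketch of the idea for a purely temporal model with exponential kernel, simplifying away the space-time kernels of the actual paper; parameters are assumed to have been estimated already:

```python
import math
import random

def decluster(times, mu, alpha, beta, seed=0):
    """Stochastic declustering for a temporal Hawkes process with
    exponential kernel phi(t) = alpha * beta * exp(-beta * t).

    Each event j is retained as a background event with probability
    mu / lambda(t_j), the background share of the total intensity at its
    occurrence time; otherwise it is removed as a probable offspring."""
    rng = random.Random(seed)
    background = []
    for j, t_j in enumerate(times):
        # Intensity contributed at t_j by every earlier event.
        triggered = sum(alpha * beta * math.exp(-beta * (t_j - times[i]))
                        for i in range(j))
        if rng.random() < mu / (mu + triggered):
            background.append(t_j)
    return background
```

The randomness matters: thinning rather than hard-assigning each event to its most probable parent preserves the distributional structure, which is what the stochastic-reconstruction diagnostics below rely on.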
Zhuang, J., Ogata, Y., & Vere-Jones, D. (2004). Analyzing earthquake clustering features by using stochastic reconstruction. Journal of Geophysical Research, 109(B5), 1–17. doi:10.1029/2003JB002879
Application of this procedure to an earthquake dataset to test hypotheses about the background and clustering processes. By declustering the process, we get the links between background events and their offspring, and can compare the time and distance distributions between them to see if they match the model’s assumptions. However, this uses the model’s assumptions for the declustering process – a bit tautological, and I suspect the declustered process will look very good even if the model does not fit well at all.
Zhuang, J., & Mateu, J. (2018). A semiparametric spatiotemporal Hawkes-type point process model with periodic background for crime data. Journal of the Royal Statistical Society: Series A (Statistics in Society). doi:10.1111/rssa.12429
A method of using declustering to estimate a nonparametric model that accounts for seasonality (time of day or day of week effects) and self-excitation.
How do we evaluate predictions made by a self-exciting point process model?
Vere-Jones, D. (1998). Probabilities and Information Gain for Earthquake Forecasting. Computational Seismology, 30, 248–263.
Basic idea: if you’re predicting whether or not a certain type of event will occur in a certain time interval, run many simulations over that interval, calculate the probability of the event occurring in those simulations, and use a scoring rule to compare to the actual outcome. Repeat over many time intervals.
Makes an interesting point about the ETAS models: they get worse scores for background events than an ordinary Poisson process, since the Poisson process estimates a higher mean event rate to account for the clustering, and the ETAS model has a lower mean background rate and explicit clustering. Since ETAS predicts aftershocks, it’d be more fair to start evaluation periods immediately after a main shock (which does limit the usefulness for predicting main shocks…).
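The evaluation loop might be sketched like this, with `simulate` standing in for a fitted model's simulator (any of the methods above) and the Brier score as the proper scoring rule; both names are placeholders, not anything from the paper:

```python
def brier_forecast_score(simulate, outcomes, n_sims=1000):
    """Monte Carlo probability forecasts scored with the Brier rule.

    simulate(k) -> simulated event count in interval k (one random draw);
    outcomes[k] -> 1 if an event actually occurred in interval k, else 0.
    Returns the mean Brier score over intervals (lower is better)."""
    total = 0.0
    for k, happened in enumerate(outcomes):
        hits = sum(1 for _ in range(n_sims) if simulate(k) > 0)
        p = hits / n_sims  # estimated P(at least one event in interval k)
        total += (p - happened) ** 2
    return total / len(outcomes)
```

Swapping in the log score instead of the squared error recovers the entropy score discussed in the next reference.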
Harte, D., & Vere-Jones, D. (2005). The Entropy Score and its Uses in Earthquake Forecasting. Pure and Applied Geophysics, 162(6), 1229–1253. doi:10.1007/s00024-004-2667-2
Reviews the entropy score (log score) and how it can be used to evaluate predictions from point process models. The log-likelihood turns out to estimate the expected information gain per event, so likelihood ratios (on a separate test set) can be used to compare models. Goodness-of-fit tests can be done by comparing the likelihood on the test set to the likelihood on simulated datasets drawn from the model.
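For concreteness, the log-likelihood that enters such comparisons, for a temporal Hawkes model with exponential kernel (a direct O(n²) evaluation for clarity, not the O(n) recursion used in practice):

```python
import math

def hawkes_loglik(times, mu, alpha, beta, t_max):
    """Log-likelihood on [0, t_max] of a temporal Hawkes process with
    exponential kernel phi(t) = alpha * beta * exp(-beta * t).

    With alpha = 0 this reduces to the homogeneous Poisson likelihood,
    a handy sanity check."""
    loglik = 0.0
    for j, t_j in enumerate(times):
        intensity = mu + sum(alpha * beta * math.exp(-beta * (t_j - times[i]))
                             for i in range(j))
        loglik += math.log(intensity)
    # Compensator: integral of the intensity over the observation window.
    compensator = mu * t_max + sum(
        alpha * (1.0 - math.exp(-beta * (t_max - t_i))) for t_i in times)
    return loglik - compensator
```

Evaluating this on a held-out test period for two fitted models gives the likelihood-ratio comparison; evaluating it on simulations from the fitted model gives the reference distribution for the goodness-of-fit test.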
What if the underlying background process is not Poisson but some other general renewal process? This is much more flexible, e.g. allowing Weibull interevent arrival times and very different clustering behaviors of the background process, but computationally challenging.
Wheatley, S., Filimonov, V., & Sornette, D. (2016). The Hawkes process with renewal immigration & its estimation with an EM algorithm. Computational Statistics & Data Analysis, 94, 120–135. doi:10.1016/j.csda.2015.08.007
Introduces the RHawkes (renewal Hawkes) process and EM estimators for it; note, however, that these estimators are wrong, as shown in the paper below.
Chen, F., & Stindl, T. (2017). Direct likelihood evaluation for the renewal Hawkes process. Journal of Computational and Graphical Statistics, 1–13. doi:10.1080/10618600.2017.1341324
Offers a quadratic-time method for evaluating the likelihood and corrects several errors in the previous paper, including flaws that made the EM algorithms not converge to the true MLE. Demonstrates fits to some real datasets.