See also spatiotemporal point processes and Mutually exciting point processes.
Also known as Hawkes processes.
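For reference, the conditional intensity of the basic temporal, linear self-exciting process, in generic notation rather than any one paper's:

```latex
\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{t_i < t} \phi(t - t_i),
\qquad n := \int_0^\infty \phi(s)\,\mathrm{d}s
```

where μ > 0 is the background (immigrant) rate, φ ≥ 0 is the triggering kernel, and n is the branching ratio, the mean number of direct offspring per event; n < 1 is required for a stationary process with finite intensity.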
A review, focusing on temporal processes modeling social media events, is Rizoiu, M.-A., Lee, Y., Mishra, S., & Xie, L. (2017). A tutorial on Hawkes processes for events in social media. https://arxiv.org/abs/1708.06401
Hawkes, A. G., & Oakes, D. (1974). A Cluster Process Representation of a Self-Exciting Process. Journal of Applied Probability, 11(3), 493–503. doi:10.2307/3212693
A stationary self-exciting point process with finite intensity can be represented as a Poisson cluster process (aka Poisson branching process). This can be useful for establishing bounds on the process – for example, Lewis, P. A. W. (1969). Asymptotic properties and equilibrium conditions for branching Poisson processes. Journal of Applied Probability, 6(2), 355–371. doi:10.1017/S0021900200032873
The basic approach comes from Epidemic-Type Aftershock Models, where earthquakes are caused by some constant background process and then induce further aftershocks when they arrive. There’s a whole series of papers by Ogata; some highlights from the field:
Thinning is a common technique, but there are better ways:
Zhuang, J., Ogata, Y., & Vere-Jones, D. (2004). Analyzing earthquake clustering features by using stochastic reconstruction. Journal of Geophysical Research, 109(B5), 1–17. doi:10.1029/2003JB002879
Proposes an algorithm that uses the Poisson cluster process representation to avoid thinning entirely: simulate the background process as a homogeneous Poisson process, draw offspring from each event as an inhomogeneous Poisson process, draw offspring of those offspring, and so on – essentially a generative simulation. Can be implemented very efficiently.
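A minimal sketch of that cluster-style simulation, assuming an exponential triggering kernel φ(t) = αβe^(−βt) with branching ratio α < 1 (my parameterisation, not Zhuang et al.'s, and temporal-only, unlike their space-time setting):

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson(lam) sample via Knuth's product-of-uniforms method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_hawkes_branching(mu, alpha, beta, t_max, seed=0):
    """Simulate a Hawkes process on [0, t_max] via its cluster
    representation: immigrants arrive as a homogeneous Poisson process of
    rate mu; each event independently produces Poisson(alpha) offspring
    at exponential(beta) delays, and those offspring reproduce in turn."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:  # background (immigrant) events
        t += rng.expovariate(mu)
        if t > t_max:
            break
        events.append(t)
    queue = list(events)
    while queue:  # cascade of offspring; processing order does not matter
        parent = queue.pop()
        for _ in range(poisson_draw(alpha, rng)):
            child = parent + rng.expovariate(beta)
            if child <= t_max:
                events.append(child)
                queue.append(child)
    return sorted(events)
```

Note there is a mild edge effect: offspring of events near t_max that would land beyond the window are discarded rather than simulated.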
Harte, D. (2017). Probability distribution of forecasts based on the ETAS model. Geophysical Journal International, 210(1), 90–104. doi:10.1093/gji/ggx146
Derivation for ETAS models showing that the number (and variance of that number) of events in a forecast interval can be approximated with some convoluted probability generating functions or a negative binomial. The “seed” events – those triggered by the background and the events occurring before the forecast interval – are known, and their number of direct offspring is easy to approximate, so the only term needing approximation counts the events excited by those offspring. Perhaps the math works out better with a model other than ETAS?
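One standard branching-process fact that seems relevant here (generic, not specific to Harte's derivation): if each event's direct offspring count is Poisson with mean n < 1, the total cluster size S generated by a single seed has a probability generating function satisfying a Lagrangian fixed-point equation, and S follows the Borel distribution:

```latex
G_S(z) = z\, e^{\,n\,(G_S(z) - 1)},
\qquad \mathbb{E}[S] = \frac{1}{1-n}
```

The forecast count is then a sum of such cluster sizes over the seeds, which hints at why a negative-binomial approximation can work.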
The ETAS has been adapted to epidemiology (see also epidemic models):
Meyer, S., Elias, J., & Höhle, M. (2011). A Space-Time Conditional Intensity Model for Invasive Meningococcal Disease Occurrence. Biometrics, 68(2), 607–616. doi:10.1111/j.1541-0420.2011.01684.x
Related MSc thesis: Meyer, S. (2010). Spatio-temporal infectious disease epidemiology based on point processes (MSc thesis). Ludwig-Maximilians-Universität München. https://epub.ub.uni-muenchen.de/11703/1/MA_Meyer.pdf
Swapping out kernels for spatial influence, with brief mention of scoring rules for one-ahead predictions (sect. 3.3): Meyer, S., & Held, L. (2014). Power-law models for infectious disease spread. Annals of Applied Statistics, 8(3), 1612–1639. doi:10.1214/14-AOAS743
Detecting clustering while accounting for spatial heterogeneity using the model: Meyer, S., Warnke, I., Rössler, W., & Held, L. (2016). Model-based testing for space-time interaction using point processes: An application to psychiatric hospital admissions in an urban area. Spatial and Spatio-Temporal Epidemiology, 17(C), 15–25. doi:10.1016/j.sste.2016.03.002
Chaffee, Park, Harrigan, Krebs, and Schoenberg (2017). A non-parametric Hawkes model of the spread of Ebola in West Africa. http://www.stat.ucla.edu/~frederic/papers/chaffeepark107.pdf
Preprint using a temporal point process to model Ebola transmission in Africa, comparing the results favorably to a traditional compartment model.
A self-exciting point process can be interpreted as a Poisson cluster process, as mentioned above. It could be interesting to decluster it, meaning to remove the events which were “excited” by another, and leave only the background events which occurred spontaneously. (In the earthquake literature, this means removing the aftershocks and keeping only the main shocks.) This procedure is called stochastic declustering.
Zhuang, J., Ogata, Y., & Vere-Jones, D. (2002). Stochastic Declustering of Space-Time Earthquake Occurrences. Journal of the American Statistical Association, 97(458), 369–380. doi:10.1198/016214502760046925
A procedure based on estimating the chance that each event was stimulated (comparing the intensity contribution from every other event to the intensity contribution from the background), then thinning the events based on these probabilities.
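A sketch of the idea for a purely temporal model with exponential kernel, simplifying away the space-time kernels of the actual paper; parameters are assumed to have been estimated already:

```python
import math
import random

def decluster(times, mu, alpha, beta, seed=0):
    """Stochastic declustering for a temporal Hawkes process with
    exponential kernel phi(t) = alpha * beta * exp(-beta * t).

    Each event j is retained as a background event with probability
    mu / lambda(t_j), the background share of the total intensity at its
    occurrence time; otherwise it is removed as a probable offspring."""
    rng = random.Random(seed)
    background = []
    for j, t_j in enumerate(times):
        # Intensity contributed at t_j by every earlier event.
        triggered = sum(alpha * beta * math.exp(-beta * (t_j - times[i]))
                        for i in range(j))
        if rng.random() < mu / (mu + triggered):
            background.append(t_j)
    return background
```

The randomness matters: thinning rather than hard-assigning each event to its most probable parent preserves the distributional structure, which is what the stochastic-reconstruction diagnostics below rely on.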
Zhuang, J., Ogata, Y., & Vere-Jones, D. (2004). Analyzing earthquake clustering features by using stochastic reconstruction. Journal of Geophysical Research, 109(B5), 1–17. doi:10.1029/2003JB002879
Application of this procedure to an earthquake dataset to test hypotheses about the background and clustering processes. By declustering the process, we get the links between background events and their offspring, and can compare the time and distance distributions between them to see if they match the model’s assumptions. However, this uses the model’s assumptions for the declustering process – a bit tautological, and I suspect the declustered process will look very good even if the model does not fit well at all.
Zhuang, J., & Mateu, J. (2018). A semiparametric spatiotemporal Hawkes-type point process model with periodic background for crime data. Journal of the Royal Statistical Society: Series A (Statistics in Society). doi:10.1111/rssa.12429
A method of using declustering to estimate a nonparametric model that accounts for seasonality (time of day or day of week effects) and self-excitation.
How do we evaluate predictions made by a self-exciting point process model?
Vere-Jones, D. (1998). Probabilities and Information Gain for Earthquake Forecasting. Computational Seismology, 30, 248–263.
Basic idea: if you’re predicting whether or not a certain type of event will occur in a certain time interval, run many simulations over that interval, calculate the probability of the event occurring in those simulations, and use a scoring rule to compare to the actual outcome. Repeat over many time intervals.
Makes an interesting point about the ETAS models: they get worse scores for background events than an ordinary Poisson process, since the Poisson process estimates a higher mean event rate to account for the clustering, and the ETAS model has a lower mean background rate and explicit clustering. Since ETAS predicts aftershocks, it’d be more fair to start evaluation periods immediately after a main shock (which does limit the usefulness for predicting main shocks…).
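The evaluation loop might be sketched like this, with `simulate` standing in for a fitted model's simulator (any of the methods above) and the Brier score as the proper scoring rule; both names are placeholders, not anything from the paper:

```python
def brier_forecast_score(simulate, outcomes, n_sims=1000):
    """Monte Carlo probability forecasts scored with the Brier rule.

    simulate(k) -> simulated event count in interval k (one random draw);
    outcomes[k] -> 1 if an event actually occurred in interval k, else 0.
    Returns the mean Brier score over intervals (lower is better)."""
    total = 0.0
    for k, happened in enumerate(outcomes):
        hits = sum(1 for _ in range(n_sims) if simulate(k) > 0)
        p = hits / n_sims  # estimated P(at least one event in interval k)
        total += (p - happened) ** 2
    return total / len(outcomes)
```

Swapping in the log score instead of the squared error recovers the entropy score discussed in the next reference.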
Harte, D., & Vere-Jones, D. (2005). The Entropy Score and its Uses in Earthquake Forecasting. Pure and Applied Geophysics, 162(6), 1229–1253. doi:10.1007/s00024-004-2667-2
Reviews the entropy score (log score) and how it can be used to evaluate predictions from point process models. The log-likelihood turns out to estimate the expected information gain per event, so likelihood ratios (on a separate test set) can be used to compare models. Goodness-of-fit tests can be done by comparing the likelihood on the test set to the likelihood on simulated datasets drawn from the model.
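For concreteness, the log-likelihood that enters such comparisons, for a temporal Hawkes model with exponential kernel (a direct O(n²) evaluation for clarity, not the O(n) recursion used in practice):

```python
import math

def hawkes_loglik(times, mu, alpha, beta, t_max):
    """Log-likelihood on [0, t_max] of a temporal Hawkes process with
    exponential kernel phi(t) = alpha * beta * exp(-beta * t).

    With alpha = 0 this reduces to the homogeneous Poisson likelihood,
    a handy sanity check."""
    loglik = 0.0
    for j, t_j in enumerate(times):
        intensity = mu + sum(alpha * beta * math.exp(-beta * (t_j - times[i]))
                             for i in range(j))
        loglik += math.log(intensity)
    # Compensator: integral of the intensity over the observation window.
    compensator = mu * t_max + sum(
        alpha * (1.0 - math.exp(-beta * (t_max - t_i))) for t_i in times)
    return loglik - compensator
```

Evaluating this on a held-out test period for two fitted models gives the likelihood-ratio comparison; evaluating it on simulations from the fitted model gives the reference distribution for the goodness-of-fit test.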
What if the underlying background process is not Poisson but some other general renewal process? This is much more flexible, e.g. allowing Weibull interevent arrival times and very different clustering behaviors of the background process, but computationally challenging.
Wheatley, S., Filimonov, V., & Sornette, D. (2016). The Hawkes process with renewal immigration & its estimation with an EM algorithm. Computational Statistics & Data Analysis, 94, 120–135. doi:10.1016/j.csda.2015.08.007
Introduces the RHawkes (renewal Hawkes) process and EM estimators for it; note, however, that these estimators are wrong, as shown in the paper below.
Chen, F., & Stindl, T. (2017). Direct likelihood evaluation for the renewal Hawkes process. Journal of Computational and Graphical Statistics, 1–13. doi:10.1080/10618600.2017.1341324
Offers a quadratic-time method for evaluating the likelihood and corrects several errors in the previous paper, including flaws that made the EM algorithms not converge to the true MLE. Demonstrates fits to some real datasets.