There are several strategies to speed up kernel density estimation when N, the number of data points, is large, and we also want to evaluate the KDE at some large number M of locations; the naive computation costs O(NM) kernel evaluations.
By using k-d trees, we can efficiently handle data in up to roughly 10 dimensions. The basic idea is that, for any query point, most data points will be far away, and a clump of far-away points may as well be treated as a single point instead. By carefully choosing when to make this approximation, we can enforce bounds on the error introduced. These bounds are usually tighter than those attained by Gauss transforms (see below), since they are based on the actual data rather than on worst-case theoretical bounds on accuracy loss.
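To make the pruning idea concrete, here is a minimal sketch of my own, not the algorithm from the references below: each tree node stores a bounding ball of its points, and when the kernel is provably near-constant over that ball, the whole clump is replaced by its centroid. The leaf size and the tolerance rtol are illustrative choices.

```python
import numpy as np

class Node:
    """A k-d-tree-style node that also stores a bounding ball of its points."""
    def __init__(self, points, leaf_size=32):
        self.points = points
        self.center = points.mean(axis=0)
        self.radius = np.linalg.norm(points - self.center, axis=1).max()
        self.left = self.right = None
        if len(points) > leaf_size:
            # Split at the median along the widest dimension.
            dim = np.argmax(points.max(axis=0) - points.min(axis=0))
            order = np.argsort(points[:, dim])
            mid = len(points) // 2
            self.left = Node(points[order[:mid]], leaf_size)
            self.right = Node(points[order[mid:]], leaf_size)

def node_sum(q, node, h, rtol=1e-2):
    """Sum of Gaussian kernel values between query q and all points in node."""
    d = np.linalg.norm(q - node.center)
    # Upper and lower bounds on the kernel value over the node's ball.
    kmax = np.exp(-max(d - node.radius, 0.0) ** 2 / (2 * h * h))
    kmin = np.exp(-(d + node.radius) ** 2 / (2 * h * h))
    if kmax - kmin <= rtol * kmax:
        # Kernel is nearly constant over the clump: treat it as a single
        # point at the centroid; per-point error is at most kmax - kmin.
        return len(node.points) * np.exp(-d * d / (2 * h * h))
    if node.left is None:  # small leaf: just compute exactly
        d2 = ((q - node.points) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * h * h)).sum()
    return node_sum(q, node.left, h, rtol) + node_sum(q, node.right, h, rtol)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
tree = Node(X)
q, h = np.array([0.5, -0.3]), 0.2
print(node_sum(q, tree, h) / (len(X) * 2 * np.pi * h * h))  # KDE estimate at q
```

The real algorithms cited below use tighter bounds, dual-tree recursion over both a query tree and a data tree, and smarter pruning rules, but the flavor is the same.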
k-d trees were introduced by Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517.
Gray, A. G., & Moore, A. W. (2003). Nonparametric density estimation: Toward computational tractability. In Proceedings of the 2003 SIAM International Conference on Data Mining (pp. 203–211). SIAM. doi:10.1137/1.9781611972733.19
High-level overview of the k-d tree method, with rather minimal pseudocode and brief descriptions.
Gray, A. G. (2003, April). Bringing tractability to generalized n-body problems in statistical and scientific computation (PhD thesis). Carnegie Mellon University. http://reports-archive.adm.cs.cmu.edu/anon/2003/CMU-CS-03-222.pdf
Gray’s thesis gives a better exposition of the algorithm.
Lang, D., Klaas, M., & de Freitas, N. (2005). Empirical testing of fast kernel density estimation algorithms (No. TR-2005-03). University of British Columbia. https://www.cs.ubc.ca/cgi-bin/tr/2005/TR-2005-03.pdf
Lang, D. (2004, October). Fast Methods for Inference in Graphical Models (PhD thesis). University of British Columbia. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.9783&rep=rep1&type=pdf
Lang formulates the algorithm differently (Section 3.3.5), in a fashion that lends itself to easier implementation.
Gauss transforms instead use a series expansion of the kernel rather than clever data structures, collapsing the contributions of many data points into a handful of expansion coefficients. There are a zillion variations.
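To make the expansion idea concrete, here is a minimal 1-D sketch, again my own illustration rather than any paper's algorithm: it uses the Hermite generating function e^{-(t-s)^2} = Σ_n (s^n/n!) h_n(t) to collapse all N sources into p moments about an expansion center, after which each query costs O(p) instead of O(N). Real implementations partition space into boxes and pick the truncation order p from error bounds; the names and parameters here are illustrative.

```python
import numpy as np

def hermite_functions(t, p):
    """h_n(t) = exp(-t**2) * H_n(t) for n = 0..p-1 (physicists' Hermite)."""
    h = np.empty((p,) + np.shape(t))
    h[0] = np.exp(-np.asarray(t) ** 2)
    if p > 1:
        h[1] = 2 * t * h[0]
    for n in range(1, p - 1):
        h[n + 1] = 2 * t * h[n] - 2 * n * h[n - 1]
    return h

def fgt_moments(x, q, center, bw, p):
    """Collapse sources into moments A_n = sum_j q_j ((x_j - center)/bw)^n / n!."""
    t = (x - center) / bw
    A = np.empty(p)
    term = np.asarray(q, dtype=float).copy()
    for n in range(p):
        A[n] = term.sum()
        term = term * t / (n + 1)
    return A

def fgt_eval(y, A, center, bw):
    """G(y) ~= sum_n A_n h_n((y - center)/bw), truncated Hermite expansion."""
    return np.tensordot(A, hermite_functions((y - center) / bw, len(A)), axes=1)

# Compare against the exact O(N*M) sum on a small example.
rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=100_000)   # sources clustered near the center
y = np.linspace(-1.0, 1.0, 5)              # query points
q = np.ones_like(x)                        # unit weights
bw = 1.0
A = fgt_moments(x, q, center=0.0, bw=bw, p=12)
approx = fgt_eval(y, A, center=0.0, bw=bw)
exact = np.exp(-((y[:, None] - x) / bw) ** 2).sum(axis=1)
print(np.max(np.abs(approx - exact) / exact))  # small truncation error
```

The expansion only converges quickly when sources sit close to the center relative to the bandwidth, which is why practical fast Gauss transforms tile space into boxes and expand within each one.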
Lee, D., Moore, A. W., & Gray, A. G. (2005). Dual-tree fast Gauss transforms. Advances in Neural Information Processing Systems. http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2005_570.pdf
Points out an error in Greengard et al., and suggests a hybrid approach using k-d trees and Gauss transforms together.