An introduction to the Poisson bootstrap
by AMIR NAJMI The bootstrap is a powerful resampling procedure which makes it easy to compute the distribution of any statistical estimator. However, doing the standard bootstrap on big data (i.e. which won’t fit in the memory of a single computer) can be computationally prohibitive. In this post I describe a simple “statistical fix” to the standard bootstrap procedure allowing us to compute bootstrap estimates of standard error in a single pass or in parallel. At Google, data scientists are just too much in demand. Thus, anytime we can replace data scientist thinking with machine thinking, we consider it a win. Anticipating the ubiquity of cheap computing, Efron introduced the bootstrap back in 1979 [1]. What makes bootstrap so attractive is that it doesn’t require any parametric assumptions about the data, or any math at all, and can be applied generically to a wide variety of statistical estimators. As simple as the bootstrap procedure is, its theory is far from trivial and...