Humans-in-the-loop forecasting: integrating data science and business planning


Thomas leads a team at Google called “Operations Data Science” that helps Google scale its infrastructure capacity optimally. In this post he describes where and how having “humans in the loop” in forecasting makes sense, and reflects on past failures and successes that have led him to this perspective.

Our team does a lot of forecasting. It also owns Google’s internal time series forecasting platform described in an earlier blog post. I am sometimes asked whether there should be any role at all for “humans in the loop” in forecasting. For high stakes, strategic forecasts, my answer is: yes! But this doesn’t have to be an either-or choice, as I explain below.

Forecasting at the “push of a button”?

In conferences and research publications, there is a lot of excitement these days about machine learning methods and forecast automation that can scale across many time series. My team and I are excited by this too (see [1] for reflections on the recent …

Estimating the prevalence of rare events — theory and practice


Importance sampling is used to improve precision in estimating the prevalence of some rare event in a population. In this post, we explain how we use variants of importance sampling to estimate the prevalence of videos that violate community standards on YouTube. We also cover many practical challenges encountered in implementation when the requirement is to produce fresh and regular estimates of prevalence.

Background

Every day, millions of videos are uploaded to YouTube. While most of these videos are safe for everyone to enjoy, some violate YouTube’s community guidelines and should be removed from the platform. There is a wide range of policy violations, from spammy videos, to videos containing nudity, to those with harassing language. We want to estimate the prevalence of violations of each individual policy category (we call them policy verticals) by sampling videos and manually reviewing the sample.

Naturally, we get an unbiased estimate of the …
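The sampling-and-review approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the YouTube implementation: the `score` function (a model score assumed roughly proportional to the chance of violation) and the `review` function (a stand-in for manual review) are hypothetical, and the estimator is a standard importance-sampling, Horvitz-Thompson-style reweighting.

```python
import random

def estimate_prevalence(videos, score, review, n_samples, seed=0):
    """Importance-sampling estimate of the fraction of violating videos.

    score(v)  -- hypothetical model score, higher for likely violations
    review(v) -- stand-in for manual review: 1 if v violates policy, else 0
    """
    rng = random.Random(seed)

    # Sampling probabilities proportional to the score, with a small floor
    # so every video has a nonzero chance of being drawn (needed for the
    # reweighted estimate below to remain unbiased).
    weights = [max(score(v), 1e-6) for v in videos]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Draw review candidates with replacement according to these probabilities.
    sampled = rng.choices(range(len(videos)), weights=probs, k=n_samples)

    # Each reviewed label is reweighted by the inverse of its sampling
    # probability, then averaged over the sample and the population size.
    n = len(videos)
    return sum(review(videos[i]) / (probs[i] * n) for i in sampled) / n_samples
```

When the scores concentrate review effort on likely violators, this estimator has much lower variance for a rare prevalence than uniform sampling at the same review budget, which is the motivation for importance sampling here.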

Misadventures in experiments for growth


Large-scale live experimentation is a big part of online product development. In fact, this blog has published posts on this very topic. With the right experiment methodology, a product can make continuous improvements, as Google and others have done. But what works for established products may not work for a product that is still trying to find its audience. Many of the assumptions on which the “standard” experiment methodology is premised do not hold for such products. This means a small and growing product has to use experimentation differently and very carefully. Indeed, failure to do so may cause experiments to mislead rather than guide. This blog post is about experimentation in this regime.

Established versus fledgling products

For the purpose of this post, “established products” are products that have found viable segments of their target user populations, and have sustained retention among those segments. These established products fill a particular need …