Changing assignment weights with time-based confounders


Ramp-up and multi-armed bandits (MAB) are common strategies in online controlled experiments (OCE). Both involve changing assignment weights during an experiment. However, when time-based confounders are present, changing assignment weights without accounting for them can bias inference in an OCE. For MABs, it can also lead to poor total reward, which defeats their intended purpose. In this post we discuss the problem, a solution, and practical considerations.
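To make the bias concrete, here is a minimal sketch in Python that uses expected counts rather than simulated users. All numbers are hypothetical: two arms with identical reward rates in every period, a weekday/weekend effect as the time-based confounder, and a ramp-up from a 90/10 to a 50/50 split. The stratified comparison shown is one common adjustment and is not necessarily the solution the post develops.

```python
def pooled_rate(users, rates):
    """Naive reward rate: total expected rewards over total users, ignoring the period."""
    total_reward = sum(n * r for n, r in zip(users, rates))
    return total_reward / sum(users)


def stratified_difference(users_a, users_b, rates_a, rates_b):
    """Per-period difference in rates (B minus A), weighted by each period's total traffic."""
    totals = [na + nb for na, nb in zip(users_a, users_b)]
    diffs = [rb - ra for ra, rb in zip(rates_a, rates_b)]
    return sum(t * d for t, d in zip(totals, diffs)) / sum(totals)


# Hypothetical setup: period 0 is a weekday, period 1 is a weekend.
# Assignment weights change from 90/10 to 50/50 between the two periods.
users_a = [900, 500]
users_b = [100, 500]

# Both arms have the same reward rate within each period (no true effect),
# but the weekend rate is higher for everyone: a time-based confounder.
rates_a = [0.10, 0.20]
rates_b = [0.10, 0.20]

naive_diff = pooled_rate(users_b, rates_b) - pooled_rate(users_a, rates_a)
strat_diff = stratified_difference(users_a, users_b, rates_a, rates_b)

print(f"naive pooled difference (B - A): {naive_diff:.4f}")
print(f"stratified difference (B - A):   {strat_diff:.4f}")
```

Arm B receives a larger share of its traffic on the high-reward weekend, so the pooled comparison reports a positive difference even though the arms are identical within each period. Comparing within periods removes it.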

Background

Online controlled experiments

An online controlled experiment (OCE) randomly assigns different versions of a website or app to different users in order to see which version causes more of some desired action. In this post, these “versions” are called arms and the desired action is called the reward (arms are often called “treatments” and reward is often called the “dependent variable” in other contexts). Examples of rew…

Humans-in-the-loop forecasting: integrating data science and business planning


Thomas leads a team at Google called "Operations Data Science" that helps Google scale its infrastructure capacity optimally. In this post he describes where and how having “humans in the loop” in forecasting makes sense, and reflects on past failures and successes that have led him to this perspective.

Our team does a lot of forecasting. It also owns Google’s internal time series forecasting platform described in an earlier blog post. I am sometimes asked whether there should be any role at all for “humans-in-the-loop” in forecasting. For high-stakes, strategic forecasts, my answer is: yes! But this doesn't have to be an either-or choice, as I explain below.

Forecasting at the “push of a button”?

In conferences and research publications, there is a lot of excitement these days about machine learning methods and forecast automation that can scale across many time series. My team and I are excited by this too (see [1] for reflections on the recent …

Estimating the prevalence of rare events — theory and practice


Importance sampling is used to improve precision in estimating the prevalence of some rare event in a population. In this post, we explain how we use variants of importance sampling to estimate the prevalence of videos that violate community standards on YouTube. We also cover many practical challenges encountered in implementation when the requirement is to produce fresh and regular estimates of prevalence.
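As a rough illustration of the basic idea, the sketch below estimates prevalence by sampling items with probability proportional to a weight and reweighting each reviewed item by the inverse of its sampling probability. Everything here is an assumption made for illustration: the population, the weights (standing in for something like a classifier score), and the function name `importance_sampling_prevalence` are hypothetical, and this is the textbook estimator for sampling with replacement, not the specific variant used in the post.

```python
import random


def importance_sampling_prevalence(labels, weights, n, rng):
    """Estimate the population mean of `labels` from n weighted draws.

    Items are drawn with replacement with probability p_i proportional to
    weights[i]; each draw contributes labels[i] / (N * p_i), which makes
    the average an unbiased estimate of the prevalence.
    """
    N = len(labels)
    total_weight = sum(weights)
    probs = [w / total_weight for w in weights]
    sampled = rng.choices(range(N), weights=weights, k=n)
    # In practice labels[i] would only be known after manual review of item i.
    return sum(labels[i] / (N * probs[i]) for i in sampled) / n


# Hypothetical population: 10,000 items, 1% of which are violations.
labels = [1] * 100 + [0] * 9900
# Hypothetical weights: violations tend to receive much higher scores.
weights = [0.9] * 100 + [0.01] * 9900

rng = random.Random(0)  # fixed seed so the run is repeatable
true_prevalence = sum(labels) / len(labels)
estimate = importance_sampling_prevalence(labels, weights, n=2000, rng=rng)

print(f"true prevalence: {true_prevalence:.4f}")
print(f"estimate:        {estimate:.4f}")
```

Because the weights concentrate the sample on likely violations, far more of the reviewed items are positives than under uniform sampling, and the inverse-probability reweighting keeps the estimate unbiased.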

Background

Every day, millions of videos are uploaded to YouTube. While most of these videos are safe for everyone to enjoy, some videos violate the community guidelines of YouTube and should be removed from the platform. There is a wide range of policy violations, from spammy videos, to videos containing nudity, to those with harassing language. We want to estimate the prevalence of violation of each individual policy category (we call them policy verticals) by sampling the videos and manually reviewing those sampled videos.

Naturally, we get an unbiased estimate of the …