Estimating the prevalence of rare events — theory and practice


Importance sampling is used to improve precision in estimating the prevalence of some rare event in a population. In this post, we explain how we use variants of importance sampling to estimate the prevalence of videos that violate community standards on YouTube. We also cover many practical challenges encountered in implementation when the requirement is to produce fresh and regular estimates of prevalence.
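To make the idea concrete, here is a minimal sketch of a textbook importance-sampling prevalence estimate on synthetic data. It is not the method used in the post: the population, the classifier-style `scores`, the `review` function standing in for manual review, and the choice of sampling weights are all assumptions made for illustration.

```python
import random

rng = random.Random(0)

# Synthetic population (all values here are illustrative assumptions).
N = 50_000
# Hypothetical classifier score in [0, 1]; most items score low.
scores = [rng.random() ** 4 for _ in range(N)]
# Hidden ground truth: violations are rare and more likely at high scores.
labels = [1 if rng.random() < 0.05 * s else 0 for s in scores]
true_prevalence = sum(labels) / N

# Sampling probabilities: proportional to score plus a small floor, so that
# every item can be drawn (required for the estimate to be unbiased).
weights = [s + 0.01 for s in scores]
total = sum(weights)
probs = [w / total for w in weights]


def review(i):
    """Stand-in for manual review: reveals the label of item i."""
    return labels[i]


def estimate_prevalence(n):
    """Draw n items with replacement and reweight each reviewed label."""
    drawn = rng.choices(range(N), weights=probs, k=n)
    # Each draw contributes y_i / (N * p_i); the average is unbiased for mean(y).
    return sum(review(i) / (N * probs[i]) for i in drawn) / n


estimate = estimate_prevalence(2_000)
print(f"true prevalence: {true_prevalence:.4f}, estimate: {estimate:.4f}")
```

Dividing each reviewed label by `N * probs[i]` corrects for oversampling high-score items. How much precision this gains over uniform sampling depends on how well the weights track where violations actually occur.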

Background

Every day, millions of videos are uploaded to YouTube. While most of these videos are safe for everyone to enjoy, some videos violate the community guidelines of YouTube and should be removed from the platform. There is a wide range of policy violations, from spammy videos, to videos containing nudity, to those with harassing language. We want to estimate the prevalence of violation of each individual policy category (we call them policy verticals) by sampling the videos and manually reviewing those sampled videos.

Naturally, we get an unbiased estimate of the …

Misadventures in experiments for growth


Large-scale live experimentation is a big part of online product development. In fact, this blog has published posts on this very topic. With the right experiment methodology, a product can make continuous improvements, as Google and others have done. But what works for established products may not work for a product that is still trying to find its audience. Many of the assumptions on which the "standard" experiment methodology is premised are not valid for such a product. This means a small and growing product has to use experimentation differently and very carefully. Indeed, failure to do so may cause experiments to mislead rather than guide. This blog post is about experimentation in this regime.

Established versus fledgling products

For the purpose of this post, "established products" are products that have found viable segments of their target user populations, and have sustained retention among those segments. These established products fill a particular need …

Crawling the internet: data science within a large engineering system

Critical decisions are being made continuously within large software systems. Often such decisions are the responsibility of a separate machine learning (ML) system. But there are instances when having a separate ML system is not ideal. In this blog post we describe one of these instances — Google search deciding when to check if web pages have changed. Through this example, we discuss some of the special considerations impacting a data scientist when designing solutions to improve decision-making deep within software infrastructure.

Data scientists promote principled decision-making under several different arrangements. In some cases, data scientists provide executive-level guidance, reporting insights and trends. Alternatively, guidance and insight may be delivered below the executive level to product managers and engineering leads, directing product feature development via metrics and A/B experiments.

This post focuses on an even lower-level pattern, when data…