Misadventures in experiments for growth


Large-scale live experimentation is a big part of online product development. In fact, this blog has published posts on this very topic. With the right experiment methodology, a product can make continuous improvements, as Google and others have done. But what works for established products may not work for a product that is still trying to find its audience. Many of the assumptions on which the "standard" experiment methodology is premised are not valid in this setting. This means a small and growing product has to use experimentation differently and very carefully. Indeed, failure to do so may cause experiments to mislead rather than guide. This blog post is about experimentation in this regime.

Established versus fledgling products

For the purpose of this post, "established products" are products that have found viable segments of their target user populations, and have sustained retention among those segments. These established products fill a particular need …

Crawling the internet: data science within a large engineering system

Critical decisions are being made continuously within large software systems. Often such decisions are the responsibility of a separate machine learning (ML) system. But there are instances when having a separate ML system is not ideal. In this blog post we describe one of these instances — Google search deciding when to check if web pages have changed. Through this example, we discuss some of the special considerations impacting a data scientist when designing solutions to improve decision-making deep within software infrastructure.
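The post's actual system isn't shown here, but as a minimal sketch of the kind of decision involved, consider a simple Poisson model of page edits (a common textbook assumption of ours for illustration, not necessarily the method used in Google search): if a page is edited at some average rate, the probability it has changed since the last crawl grows as `1 - exp(-rate * t)`, and a crawler can re-check once that probability crosses a freshness target.

```python
import math

def p_changed(change_rate_per_day: float, days_since_crawl: float) -> float:
    # Under a Poisson model of page edits, the chance that a page has
    # changed at least once since the last crawl is 1 - exp(-rate * t).
    return 1.0 - math.exp(-change_rate_per_day * days_since_crawl)

def should_recrawl(change_rate_per_day: float, days_since_crawl: float,
                   threshold: float = 0.5) -> bool:
    # Re-crawl once the estimated probability of staleness crosses
    # a freshness target; both parameters here are illustrative.
    return p_changed(change_rate_per_day, days_since_crawl) >= threshold

# A page edited ~twice a day goes stale within hours,
# while a rarely edited page can safely wait weeks.
news_stale = should_recrawl(change_rate_per_day=2.0, days_since_crawl=1.0)
static_stale = should_recrawl(change_rate_per_day=0.01, days_since_crawl=1.0)
```

In a real system the change rate itself must be estimated from noisy, censored crawl observations, and crawl capacity is shared across billions of pages, which is where the data-science considerations discussed in the post come in.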

Data scientists promote principled decision-making under several different arrangements. In some cases, data scientists provide executive-level guidance, reporting insights and trends. Alternatively, guidance and insight may be delivered below the executive level to product managers and engineering leads, directing product feature development via metrics and A/B experiments.

This post focuses on an even lower-level pattern, when data…

Compliance bias in mobile experiments


Randomized experiments are invaluable in making product decisions, including on mobile apps. But what if users don't immediately take up the new experimental version? What if their uptake rate is not uniform? We'd like to be able to make decisions without having to wait for the long tail of users to experience the treatment to which they have been assigned. This blog post provides details for how we can make inferences without waiting for complete uptake.
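To make the dilution problem concrete, here is a small simulation using a standard instrumental-variables identity for one-sided noncompliance (a textbook estimator, not necessarily the post's full method): the naive assigned-vs-control difference understates the effect when only a fraction of assigned users have updated, but dividing by the observed uptake rate recovers the effect among those who did. All rates and effect sizes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: users randomly assigned to treatment, but only
# some of the assigned have actually updated to the new version so far.
assigned = rng.integers(0, 2, n)              # 1 = assigned to treatment
uptake_rate = 0.6                             # fraction who have updated
took_up = assigned * (rng.random(n) < uptake_rate)

# Outcome: baseline noise plus a true effect of +0.5 for actual uptakers.
true_effect = 0.5
y = rng.normal(0.0, 1.0, n) + true_effect * took_up

# Naive intent-to-treat difference is diluted by non-uptakers...
itt = y[assigned == 1].mean() - y[assigned == 0].mean()

# ...but scaling by the observed uptake rate gives the classic
# one-sided-noncompliance IV estimate of the effect among compliers.
compliance = took_up[assigned == 1].mean()
cace = itt / compliance

print(f"ITT: {itt:.3f}, uptake: {compliance:.3f}, adjusted: {cace:.3f}")
```

With 60% uptake, the intent-to-treat estimate lands near 0.3 while the adjusted estimate recovers roughly the true 0.5; the harder questions the post addresses arise when uptake is not uniform across users, so this simple scaling is no longer sufficient.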

Background

At Google, experimentation is an invaluable tool for making decisions and inferences about new products and features. An experimenter, once their candidate product change is ready for testing, often needs only to write a few lines of configuration code to begin an experiment. Ready-made systems then perform standardized analyses on their work, giving a common and repeatable method of decision making. This process operates well under ideal conditions; in those applications where this process makes op…