Compliance bias in mobile experiments


Randomized experiments are invaluable for making product decisions, including on mobile apps. But what if users don't immediately adopt the new experimental version? What if their rate of adoption is not uniform? We'd like to make decisions without waiting for the long tail of users to experience the treatment to which they have been assigned. This post describes how we can make inferences without waiting for complete uptake.
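To make the problem concrete, here is a small simulation sketch. It is not the post's method; the function name, baseline distribution, uptake rate, and constant effect size are all hypothetical. Users assigned to treatment adopt the new version only with some probability, and control users can never get it. The naive treatment-minus-control difference (the intent-to-treat effect) is diluted by the uptake rate. Dividing it by the observed uptake, a standard instrumental-variables (Wald) adjustment, recovers the effect among users who have adopted so far.

```python
import random
from statistics import mean


def simulate_partial_uptake(n_users=100_000, uptake=0.4, effect=1.0, seed=0):
    """Simulate an A/B test where only some treatment-assigned users adopt.

    Hypothetical setup: outcome = baseline noise, plus `effect` for users who
    actually run the new version. Control users never get the new version.
    """
    rng = random.Random(seed)
    treat_y, ctrl_y, adopted_count = [], [], 0
    for _ in range(n_users):
        baseline = rng.gauss(10.0, 2.0)
        if rng.random() < 0.5:  # assigned to treatment
            adopted = rng.random() < uptake  # has this user picked up the new version yet?
            adopted_count += adopted
            treat_y.append(baseline + (effect if adopted else 0.0))
        else:  # assigned to control
            ctrl_y.append(baseline)

    itt = mean(treat_y) - mean(ctrl_y)  # intent-to-treat: diluted by partial uptake
    observed_uptake = adopted_count / len(treat_y)
    scaled = itt / observed_uptake  # Wald estimate: effect among adopters so far
    return itt, observed_uptake, scaled


itt, observed_uptake, scaled = simulate_partial_uptake()
print(f"ITT={itt:.3f}  uptake={observed_uptake:.3f}  scaled={scaled:.3f}")
```

The ITT estimate lands near `uptake * effect`, while the scaled estimate lands near `effect`. The scaled number describes only the users who have adopted so far. If early adopters respond differently from the long tail, as the questions above suggest they might, it can still differ from the eventual full-population effect.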

Background

At Google, experimentation is an invaluable tool for making decisions and inferences about new products and features. Once a candidate product change is ready for testing, an experimenter often needs to write only a few lines of configuration code to begin an experiment. Ready-made systems then perform standardized analyses on the results, giving a common and repeatable method of decision making. This process works well under ideal conditions; in those applications where this process makes op…

Designing A/B tests in a collaboration network


In this article, we discuss an approach to designing experiments on a network. In particular, we describe a method to prevent potential contamination (inconsistent treatment exposure) of samples due to network effects. We present data from Google Cloud Platform (GCP) as an example of how we use A/B testing when users are connected. Our methodology can be extended to other areas where the network is observed and where avoiding contamination is a primary concern in experiment design. We first describe the unique challenges of designing experiments on developers working on GCP. We then use a simulation, based on the actual GCP user network, to show how the right choice of randomization unit can avoid estimation bias.
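As an illustration of why the randomization unit matters, the sketch below compares randomizing individual users against randomizing whole connected components of a network. It is a toy example under stated assumptions, not the GCP network or the post's procedure. The random graph, its size, and the helper names `connected_components` and `cross_arm_fraction` are hypothetical. The fraction of edges that cross between arms is used as a simple proxy for contamination risk.

```python
import random


def connected_components(n, edges):
    """Label each node 0..n-1 with a component id using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return [find(x) for x in range(n)]


def cross_arm_fraction(edges, arm):
    """Fraction of edges whose two endpoints were assigned to different arms."""
    return sum(arm[a] != arm[b] for a, b in edges) / len(edges)


rng = random.Random(1)
n = 2_000
# Hypothetical sparse collaboration network: random pairs of users who share work.
edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(1_500)]

# Option 1: randomize each user independently.
user_level = [rng.random() < 0.5 for _ in range(n)]

# Option 2: randomize whole connected components, so collaborators share an arm.
comp = connected_components(n, edges)
comp_arm = {c: rng.random() < 0.5 for c in sorted(set(comp))}
component_level = [comp_arm[c] for c in comp]

print("user-level cross-arm edges:     ", cross_arm_fraction(edges, user_level))
print("component-level cross-arm edges:", cross_arm_fraction(edges, component_level))
```

User-level randomization puts roughly half of all edges across arms, while component-level randomization puts none across by construction. The trade-off is balance and power. A graph like this one has a single large component, and whichever arm that component lands in absorbs a large share of users. This is one reason the choice of randomization unit deserves study rather than a default.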

Experimentation on networks

A/B testing is a standard method of measuring the effect of changes by randomizing samples into different treatment groups. Randomization is essential to A/B testing because it removes selection bias as …

Unintentional data


A large part of the data we data scientists are asked to analyze was not collected with any specific analysis in mind, or perhaps for any particular purpose at all. This post describes the analytical issues that arise in such a setting, and what the data scientist can do about them.

A landscape of promise and peril

The data scientist working today lives in what Brad Efron has termed the "era of scientific mass production," of which he remarks, "But now the flood of data is accompanied by a deluge of questions, perhaps thousands of estimates or hypothesis tests that the statistician is charged with answering together; not at all what the classical masters had in mind." [1]

Statistics, as a discipline, was largely developed in a small data world. Data was expensive to gather, and therefore decisions to collect data were generally well-considered. Implicitly, there was a prior belief about some interesting causal mechanism or an underlying hypothesis…