Mind Your Units

By JEAN STEINER

Randomized A/B experiments are the gold standard for estimating causal effects. The analysis can be straightforward, especially when it's safe to assume that individual observations of an outcome measure are independent. However, this is not always the case. When observations are not independent, an analysis that assumes independence can lead us to believe that effects are significant when they actually aren't. This blog post explores how this problem arises in applications at Google, and how we can 'mind our units' through analysis and logging.

A tale of two units

Imagine you are interested in the effectiveness of a new mathematics curriculum. You might run an experiment in which some students are exposed to the new curriculum while others are taught from the existing curriculum. At the end of the semester, you administer a proficiency test to the students and compare test performance across the groups. For example, you might compare the average test scores of the two groups.
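To see how assuming independence goes wrong here, consider a simulation in the spirit of this example (a minimal sketch, not from the post; the classroom counts and variance components below are illustrative assumptions). Treatment is randomized at the classroom level, but students in the same classroom share a common shock, so their scores are not independent. Even with no true effect, a per-student t-test rejects far more often than its nominal 5% level, while a test on classroom means, the actual unit of randomization, stays calibrated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_classes, n_students, n_sims = 40, 25, 2000
reject_naive = reject_agg = 0

for _ in range(n_sims):
    # Assign treatment at the classroom level: half the classrooms each way.
    treat = rng.permutation(np.repeat([0, 1], n_classes // 2))
    class_effect = rng.normal(0, 2.0, n_classes)   # shared shock within a class
    scores = (70 + class_effect[:, None]           # no true treatment effect
              + rng.normal(0, 5.0, (n_classes, n_students)))

    # Naive analysis: treat every student as an independent observation.
    _, p = stats.ttest_ind(scores[treat == 1].ravel(), scores[treat == 0].ravel())
    reject_naive += p < 0.05

    # Unit-aware analysis: aggregate to the unit of randomization (class means).
    means = scores.mean(axis=1)
    _, p = stats.ttest_ind(means[treat == 1], means[treat == 0])
    reject_agg += p < 0.05

print(f"false positive rate, per-student test: {reject_naive / n_sims:.2%}")  # far above 5%
print(f"false positive rate, per-class test:   {reject_agg / n_sims:.2%}")    # near 5%
```

The naive test is anti-conservative because the within-class correlation shrinks the effective sample size well below the number of students; aggregating to classroom means restores valid inference at the cost of some power.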

To Balance or Not to Balance?

By IVAN DIAZ & JOSEPH KELLY

Determining the causal effects of an action—which we call treatment—on an outcome of interest is at the heart of many data analysis efforts. In an ideal world, experimentation through randomization of the treatment assignment allows the identification and consistent estimation of causal effects. In observational studies, treatment is assigned by nature, so its mechanism is unknown and must be estimated. This can be done by estimating a quantity known as the propensity score, defined as the probability of receiving treatment within strata of the observed covariates. There are two types of estimation methods for propensity scores. The first tries to predict treatment as accurately as possible. The second tries to balance the distribution of predictors evenly between the treatment and control groups. The two approaches are related, because different predictor values among treated and control units could be used to better predict treatment.
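To make the two criteria concrete, here is a minimal sketch (simulated data; the covariates and coefficients are illustrative assumptions, not from the post). It fits one logistic propensity model and then evaluates it both ways: by how accurately it predicts treatment (AUC), and by how well inverse-propensity weights balance the covariates (standardized mean differences).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                        # observed covariates
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
T = rng.binomial(1, p_true)                        # treatment "assigned by nature"

# Criterion 1: predict treatment as accurately as possible.
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
print(f"AUC of propensity model: {roc_auc_score(T, ps):.3f}")

# Criterion 2: judge the same scores by the balance they induce.
# Weight each unit by inverse propensity, then compare weighted covariate
# means between groups via standardized mean differences (SMDs).
w = np.where(T == 1, 1 / ps, 1 / (1 - ps))

def smd(x, t, w):
    """Weighted standardized mean difference of covariate x between groups."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    return (m1 - m0) / x.std()

for j in range(X.shape[1]):
    raw = smd(X[:, j], T, np.ones(n))              # imbalance before weighting
    adj = smd(X[:, j], T, w)                       # imbalance after weighting
    print(f"covariate {j}: SMD raw={raw:+.3f}, weighted={adj:+.3f}")
```

In this simulation the same fitted scores score well on both criteria, which reflects the connection noted above: covariates that predict treatment are exactly the ones that are imbalanced before weighting, and a good propensity model removes that imbalance.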