Showing posts from January, 2016

Variance and significance in large-scale online services


Running live experiments on large-scale online services (LSOS) is an important aspect of data science. Unlike experimentation in some other areas, LSOS experiments present a surprising challenge to statisticians — even though we operate in the realm of “big data”, the statistical uncertainty in our experiments can be substantial. Because individual observations have so little information, statistical significance remains important to assess. We must therefore maintain statistical rigor in quantifying experimental uncertainty. In this post we explore how and why we can be “data-rich but information-poor”.

There are many reasons for the recent explosion of data and the resulting rise of data science. One big factor in putting data science on the map has been what we might call Large Scale Online Services (LSOS). These are sites and services which rely both on ubiquitous user access to the internet as well as advances in technology to scale to millions of simultaneous users.…