Showing posts from September, 2015

Causal attribution in an era of big time-series data

by KAY BRODERSEN For the first time in the history of statistics, recent innovations in big data might allow us to estimate fine-grained causal effects, automatically and at scale. But the analytical challenges are substantial. Every idea at Google begins with a simple question. How can we predict the benefits the idea's realization would create for users, publishers, developers, or advertisers? How can we establish that there is a causal link between our idea and the outcome metric we care about? Revealing one causal law can be more powerful than describing a thousand correlations — since only a causal relationship enables us to understand the true consequences of our actions. It's why estimating causal effects has been at the heart of data science. Analyses might begin by exploring, visualizing, correlating. But ultimately, we'll often want to identify the drivers that determine why things are the way they are. The golden path to estimating a causal effect is throu

On procedural and declarative programming in MapReduce

by SEAN GERRISH and AMIR NAJMI To deliver the services our users have come to rely upon, Googlers have to process a lot of data — often at web-scale. For doing analyses quickly, it helps to abstract away as much of the repeated work as possible. In this post, we’ll describe some things we have learned about mixing declarative and procedural programing paradigms to simplify MapReduce as used by data scientists. One goal of a data scientist’s software stack is to eliminate as much routine work as possible so she can spend more time on her comparative advantage: reasoning about data. Because a data scientist’s work is more abstract than a typical software engineer’s, the languages she uses often include declarative patterns — constructs by which the analyst specifies what she wants rather than how to get it, with the framework doing magic under the hood to get her the results. There are many examples of declarative programming constructs out there for data gathering, SQL be