Posts

Uncertainties: Statistical, Representational, Interventional

by AMIR NAJMI & MUKUND SUNDARARAJAN

Data science is about decision making under uncertainty. Some of that uncertainty is the result of statistical inference, i.e., using a finite sample of observations for estimation. But there are other kinds of uncertainty, at least as important, that are not statistical in nature. This blog post introduces the notions of representational uncertainty and interventional uncertainty to paint a fuller picture of what the practicing data scientist is up against.

Data science and uncertainty

Data Science (DS) deals with data-driven decision making under uncertainty. The decisions themselves may range from "how much data center capacity should we build for two years hence?" and "does this product change benefit users?" to the very granular "what content should we recommend to this user at this moment?" This kind of decision making must address particular kinds of uncertainty. Wrestling with uncertainty characterizes the…
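To make the statistical flavor of uncertainty concrete, here is a minimal bootstrap sketch (an illustration of ours, not code from the post; the metric and sample size are hypothetical). With only a finite sample, even a simple mean carries sampling uncertainty, which resampling makes visible:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite sample of some metric, e.g. per-user revenue.
sample = rng.lognormal(mean=1.0, sigma=0.8, size=500)

# Bootstrap resampling: the spread of the resampled means quantifies
# the statistical uncertainty that comes from having only 500 observations.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])

low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

The post's point is that this interval captures only the statistical component; representational and interventional uncertainty would remain even with infinite data.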

Why model calibration matters and how to achieve it

by LEE RICHARDSON & TAYLOR POSPISIL

Calibrated models make probabilistic predictions that match real-world probabilities. This post explains why calibration matters and how to achieve it. It discusses practical issues that calibrated predictions solve and presents a flexible framework to calibrate any classifier. Calibration arises in many applications, and hence the practicing data scientist must understand this useful tool.

What is calibration?

At Google we make predictions for a large number of binary events such as "will a user click this ad" or "is this email spam". In addition to the raw classification of $Y = 0$ ('NotSpam') or $Y = 1$ ('Spam'), we are also interested in predicting the probability of the binary event, $\Pr(Y = 1 \mid X)$, for some covariates $X$. One useful property of these predictions is calibration. To explain, let's borrow a quote from Nate Silver's The Signal and the Noise: One of the most important tests of a forecast — I would argue that i…
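The post presents its own flexible calibration framework; as a generic illustration only (a scikit-learn sketch of ours, not the post's code), here is one standard way to calibrate a classifier with isotonic regression and then check reliability by binning predictions:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An uncalibrated classifier: naive Bayes scores are often too extreme.
raw = GaussianNB().fit(X_train, y_train)

# Wrap it with isotonic calibration learned on held-out folds.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

# Reliability check: within each bin of predicted probability, the
# observed event rate should match the mean predicted probability.
for name, model in [("raw", raw), ("calibrated", calibrated)]:
    prob = model.predict_proba(X_test)[:, 1]
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    gap = np.abs(frac_pos - mean_pred).mean()
    print(f"{name}: mean |observed - predicted| across bins = {gap:.3f}")
```

A perfectly calibrated model would show the observed fraction of positives equal to the mean predicted probability in every bin.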

Adding common sense to machine learning with TensorFlow Lattice

by TAMAN NARAYAN & SEN ZHAO

A data scientist is often in possession of domain knowledge that she cannot easily encode in the structure of her model. On the one hand, basic statistical models (e.g., linear regression, trees) can be too rigid in their functional forms. On the other hand, sophisticated machine learning models are flexible in their form but not easy to control. This blog post motivates this problem more fully, and discusses monotonic splines and lattices as a solution. While the discussion is about methods and applications, the post also contains pointers to research papers and to the TensorFlow Lattice package, which provides an implementation of these solutions. The authors of this post are part of the team at Google that builds TensorFlow Lattice.

Introduction

Machine learning models often behave unpredictably, as data scientists would be the first to tell you. For example, consider the following simple example — fitting a two-dimensional function to predict if someone wi…
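For a taste of the package the post points to, here is a minimal sketch assuming the TensorFlow Lattice 2.x Keras layers (tfl.layers.PWLCalibration, tfl.layers.ParallelCombination, tfl.layers.Lattice); the toy data, feature ranges, and keypoints are placeholders of ours:

```python
import numpy as np
import tensorflow as tf
import tensorflow_lattice as tfl

# Toy data: the label is (noisily) increasing in both features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2)).astype("float32")
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 1000) > 1.0).astype("float32")

calibrators = tfl.layers.ParallelCombination([
    # One piecewise-linear calibrator per feature, each constrained to be
    # monotonically increasing: the encoded "common sense".
    tfl.layers.PWLCalibration(
        input_keypoints=np.linspace(0.0, 1.0, num=10),
        output_min=0.0, output_max=1.0, monotonicity="increasing")
    for _ in range(2)
])

model = tf.keras.Sequential([
    calibrators,
    # A 2x2 lattice interpolates learned corner values, also monotonic
    # in both inputs, so the end-to-end model respects the constraint.
    tfl.layers.Lattice(lattice_sizes=[2, 2],
                       monotonicities=["increasing", "increasing"],
                       output_min=0.0, output_max=1.0),
])
model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(0.01))
model.fit(X, y, epochs=20, batch_size=64, verbose=0)
```

Unlike an unconstrained model, this one is guaranteed monotonic in both inputs no matter how noisy the training data is.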