Why model calibration matters and how to achieve it

by LEE RICHARDSON & TAYLOR POSPISIL Calibrated models make probabilistic predictions that match real world probabilities. This post explains why calibration matters, and how to achieve it. It discusses practical issues that calibrated predictions solve and presents a flexible framework to calibrate any classifier. Calibration applies in many applications, and hence the practicing data scientist must understand this useful tool. What is calibration? At Google we make predictions for a large number of binary events such as “will a user click this ad” or “is this email spam”. In addition to the raw classification of $Y = 0$/'NotSpam' or $Y = 1$/'Spam' we are also interested in predicting the probability of the binary event $\Pr(Y = 1 | X)$ for some covariates $X$. One useful property of these predictions is calibration. To explain, let’s borrow a quote from Nate Silver’s The Signal and the Noise : One of the most important tests of a forecast — I would argue that i

Adding common sense to machine learning with TensorFlow Lattice

by TAMAN NARAYAN & SEN ZHAO A data scientist is often in possession of domain knowledge which she cannot easily apply to the structure of the model. On the one hand, basic statistical models (e.g. linear regression, trees) can be too rigid in their functional forms. On the other hand, sophisticated machine learning models are flexible in their form but not easy to control. This blog post motivates this problem more fully, and discusses monotonic splines and lattices as a solution. While the discussion is about methods and applications, the blog also contains pointers to research papers and to the TensorFlow Lattice package that provides an implementation of these solutions.  Authors of this post are part of the team at Google that builds TensorFlow Lattice. Introduction Machine learning models often behave unpredictably, as data scientists would be the first to tell you. For example, consider the following simple example — fitting a two-dimensional function to predict if someone wi

Changing assignment weights with time-based confounders

by ALEXANDER WAKIM Ramp-up and multi-armed bandits (MAB) are common strategies in online controlled experiments (OCE). These strategies involve changing assignment weights during an experiment. However, if one changes assignment weights when there are time-based confounders, then ignoring this complexity can lead to biased inference in an OCE. In the case of MABs, ignoring this complexity can also lead to poor total reward,  making it counterproductive towards its intended purpose. In this post we discuss the problem, a solution, and practical considerations. Background Online controlled experiments An online controlled experiment (OCE) randomly assigns different versions of a website or app to different users in order to see which version causes more of some desired action. In this post, these “versions” are called arms and the desired action is called the reward (arms are often called “treatments” and reward is often called the “dependent variable” in other contexts). Exampl