Scoring rules

Decomposing general scores as simple expectations

Previously, we decomposed the Brier score into simple expectations. We saw that, while the posterior decomposition gives the standard bias-variance decomposition for mean squared error, the prior decomposition splits into three terms of a similar form. For a binary (Bernoulli) outcome $Y$, data $X$, and scoring rule $S(p, y) := (p - y)^2$, where $p(X)$ is the predicted probability that $Y = 1$, we found that $$\begin{align*} \mathbb{E}(S(p, Y)) &= \underbrace{\mathbb{E}(\mathrm{Var}(Y|X))}_{\textrm{refinement}} + \underbrace{\mathbb{E}((p(X) - \mathbb{E}(Y|X))^2)}_{\textrm{calibration}\ldots}\\ &= \underbrace{\mathrm{Var}(Y)}_\textrm{uncertainty} - \underbrace{\mathrm{Var}(\mathbb{E}(Y|X))}_{\textrm{resolution}} + \underbrace{\mathbb{E}((p(X) - \mathbb{E}(Y|X))^2)}_{\ldots\textrm{ AKA reliability}}. \end{align*}$$ The second line follows from the first by the law of total variance, $\mathrm{Var}(Y) = \mathbb{E}(\mathrm{Var}(Y|X)) + \mathrm{Var}(\mathbb{E}(Y|X))$.
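For a quick sanity check of both identities, here's a minimal Monte Carlo sketch. The toy model is my own illustrative choice, not part of the derivation: $X \sim \mathrm{Uniform}(0, 1)$, $Y \mid X \sim \mathrm{Bernoulli}(X)$, and a deliberately miscalibrated predictor $p(X) = 0.2 + 0.6 X$.

```python
# Monte Carlo check: the Brier score should match both decompositions above.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.uniform(size=n)        # data X
q = x                          # true conditional mean E(Y | X) = X
y = rng.binomial(1, q)         # binary outcome Y | X ~ Bernoulli(X)
p = 0.2 + 0.6 * x              # miscalibrated predicted probability p(X)

brier = np.mean((p - y) ** 2)          # E(S(p, Y))
refinement = np.mean(q * (1 - q))      # E(Var(Y | X))
calibration = np.mean((p - q) ** 2)    # E((p(X) - E(Y | X))^2)
uncertainty = np.var(y)                # Var(Y)
resolution = np.var(q)                 # Var(E(Y | X))

print(brier)                                   # ~0.18
print(refinement + calibration)                # ~0.18
print(uncertainty - resolution + calibration)  # ~0.18
```

All three printed values agree up to Monte Carlo error, as the decomposition promises.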

Decomposing the Brier score as simple expectations

There are plenty of articles out there on decomposing the Brier score, but they usually use notation that's non-standard for probability and statistics. I give the decomposition in more standard notation, for the simple case of predicting a binary outcome. I use expectations, instead of the empirical means that would be calculated in practice, since this simplifies the notation; converting them to empirical means is straightforward (a sketch is given below).

Posterior / conditional decomposition

Suppose we are predicting the value of some variable $Y$ that can take the values 0 (failure) and 1 (success).
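As promised, here's a sketch of the conversion to empirical means for this binary setting. In practice $\mathbb{E}(Y|X)$ is unknown, so the classical empirical decomposition (due to Murphy) bins forecasts by predicted probability and uses the within-bin mean outcome in its place. The bin count and toy data below are my own illustrative choices, and the identity holds only up to within-bin discretization error.

```python
import numpy as np

def brier_decomposition(p, y, n_bins=10):
    """Empirical uncertainty, resolution, and reliability for binary y."""
    y_bar = y.mean()
    uncertainty = y_bar * (1 - y_bar)        # empirical Var(Y)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    resolution = reliability = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        w = mask.mean()                        # fraction of cases in bin b
        p_b, y_b = p[mask].mean(), y[mask].mean()
        resolution += w * (y_b - y_bar) ** 2   # spread of binned E(Y | X)
        reliability += w * (p_b - y_b) ** 2    # calibration gap in bin b
    return uncertainty, resolution, reliability

rng = np.random.default_rng(0)
x = rng.uniform(size=100_000)
y = rng.binomial(1, x)                       # Y | X ~ Bernoulli(X)
p = 0.2 + 0.6 * x                            # same toy predictor as above
u, res, rel = brier_decomposition(p, y)
print(u - res + rel)                         # ≈ np.mean((p - y) ** 2)
```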