Calibration of predicted treatment probabilities
Calibration denotes the consistency between predicted probabilities
and the actual frequencies observed on a test dataset.
Calibration data points (displayed as single dots) are built by computing:
- the average prediction (x-axis)
- the frequency of a predicted class (y-axis)
for predictions within a range of probabilities, e.g.
[0, 0.1),
[0.1, 0.2), etc. up to
[0.9, 1].
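The binning described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation; the function name and bin handling are assumptions (predictions equal to 1 are placed in the last bin).

```python
import numpy as np

def calibration_points(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width probability bins and return,
    for each non-empty bin, the average prediction (x-axis), the observed
    frequency of the positive class (y-axis), and the bin count."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Map each probability to a bin index; clip so 1.0 falls in the last bin.
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    xs, ys, counts = [], [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            xs.append(y_prob[mask].mean())   # average prediction (x-axis)
            ys.append(y_true[mask].mean())   # observed frequency (y-axis)
            counts.append(int(mask.sum()))
    return np.array(xs), np.array(ys), np.array(counts)
```

Empty bins produce no data point, which is why a calibration plot may show fewer than ten dots.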
A perfectly calibrated model should have calibration data points that
are on the diagonal line.
In practice the calibration data points rarely fall exactly on the diagonal,
and their average distance from it quantifies the quality of the calibration: the calibration loss.
The calibration loss is computed as the absolute difference between each calibration data point
and the diagonal, averaged over all bins and weighted by the number
of predictions used to compute each point.
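The weighted average just described can be written as a short function. This is a sketch under the stated definition, assuming the calibration points and per-bin counts from the plot are available as arrays:

```python
import numpy as np

def calibration_loss(xs, ys, counts):
    """Weighted mean absolute distance between the calibration data points
    (xs, ys) and the diagonal y = x, weighted by the number of predictions
    in each bin."""
    xs, ys, counts = (np.asarray(a, dtype=float) for a in (xs, ys, counts))
    return float(np.sum(counts * np.abs(ys - xs)) / np.sum(counts))
```

A perfectly calibrated model, whose points all lie on the diagonal, has a loss of 0.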
A calibration curve (displayed as a solid line) is computed as a smoothed version of the calibration data points, taking into account the
number of predictions behind each data point, with the diagonal line as a prior.
The calibration loss is

    loss = sum_b n_b * |y_b - x_b| / sum_b n_b

where the sum runs over the probability bins b, n_b is the number of predictions in bin b,
x_b is the average prediction in that bin, and y_b is the observed frequency.