.. _sphx_calibration:

=================================
Calibrating classifier posteriors
=================================

Probabilistic quantifiers such as :class:`~mlquantify.likelihood.EMQ` assume
the classifier's posterior probabilities are **well calibrated**: among the
samples it predicts with 70% confidence, about 70% really are positive. Many
classifiers are not: Gaussian Naive Bayes, for instance, is famously
over-confident when features are correlated.
:class:`~mlquantify.calibration.ClassifierCalibrator` fixes this *post hoc* by
rescaling the logits on a held-out split.

The example below fits a deliberately over-confident Naive Bayes model, applies
**Bias-Corrected Temperature Scaling** (``'bcts'``), and compares the
reliability diagram and the confidence histogram before and after.

.. plot::

   import numpy as np
   import matplotlib.pyplot as plt
   from sklearn.datasets import make_classification
   from sklearn.naive_bayes import GaussianNB
   from sklearn.model_selection import train_test_split
   from sklearn.calibration import calibration_curve
   from mlquantify.calibration import ClassifierCalibrator

   # Gaussian Naive Bayes is over-confident when features are correlated,
   # which makes it a good (mis)calibration demo.
   X, y = make_classification(
       n_samples=9000, n_features=20, n_informative=6, n_redundant=10,
       random_state=0,
   )
   X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
   X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

   clf = GaussianNB().fit(X_tr, y_tr)
   p_te = clf.predict_proba(X_te)

   # Fit the calibrator on a held-out split, never on the training data.
   cal = ClassifierCalibrator(method="bcts").fit(y_cal, clf.predict_proba(X_cal))
   p_te_cal = cal.predict(p_te)

   def ece(y_true, proba, n_bins=10):
       """Expected Calibration Error of the top-class predictions."""
       conf = proba.max(axis=1)
       correct = (proba.argmax(axis=1) == y_true).astype(float)
       bins = np.linspace(0.0, 1.0, n_bins + 1)
       score = 0.0
       for lo, hi in zip(bins[:-1], bins[1:]):
           m = (conf > lo) & (conf <= hi)
           if m.any():
               score += m.mean() * abs(correct[m].mean() - conf[m].mean())
       return score

   raw, fixed = "#e76f51", "#2a9d8f"
   fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))

   # Reliability diagram (positive class).
   ax = axes[0]
   ax.plot([0, 1], [0, 1], "k--", lw=1, label="perfectly calibrated")
   for proba, name, color in [
       (p_te, f"GaussianNB (ECE={ece(y_te, p_te):.3f})", raw),
       (p_te_cal, f"+ BCTS (ECE={ece(y_te, p_te_cal):.3f})", fixed),
   ]:
       frac_pos, mean_pred = calibration_curve(
           y_te, proba[:, 1], n_bins=10, strategy="quantile"
       )
       ax.plot(mean_pred, frac_pos, "o-", color=color, label=name)
   ax.set_xlabel("Mean predicted probability (positive class)")
   ax.set_ylabel("Observed frequency")
   ax.set_title("Reliability diagram")
   ax.legend(loc="upper left", fontsize=9)

   # Confidence histogram.
   ax = axes[1]
   ax.hist(p_te.max(axis=1), bins=20, range=(0.5, 1.0), alpha=0.6,
           color=raw, label="GaussianNB")
   ax.hist(p_te_cal.max(axis=1), bins=20, range=(0.5, 1.0), alpha=0.6,
           color=fixed, label="+ BCTS")
   ax.set_xlabel("Predicted confidence (top class)")
   ax.set_ylabel("Count")
   ax.set_title("BCTS softens over-confident scores")
   ax.legend(loc="upper center", fontsize=9)

   fig.suptitle("Post-hoc calibration with ClassifierCalibrator")
   fig.tight_layout()

Naive Bayes pushes most of its predictions to the extremes: the orange
reliability curve sits well below the diagonal (it claims more confidence than
it earns), and its confidences pile up near 1.0. Bias-Corrected Temperature
Scaling, fit on the calibration split, pulls the curve back onto the diagonal
and spreads the confidences out; here it cuts the Expected Calibration Error
roughly five-fold.

Because better-calibrated posteriors directly improve EMQ, passing
``calib_function='bcts'`` to :class:`~mlquantify.likelihood.EMQ` applies
exactly this step inside ``predict``.

.. seealso::

   - :ref:`calibration`: the Calibration user guide, with all four scaling
     methods (``'ts'`` / ``'bcts'`` / ``'vs'`` / ``'nbvs'``).
   - :class:`~mlquantify.calibration.ClassifierCalibrator`: the API reference.
   - :ref:`sphx_emq_convergence`: EMQ, the main consumer of calibrated
     posteriors.
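
For intuition about what the ``'bcts'`` step in the example above does to each row of posteriors, the transformation can be sketched in plain NumPy. This is not mlquantify's implementation: the helper name ``bcts_transform`` is hypothetical, log-probabilities are assumed to stand in for logits, and the temperature and bias values are picked by hand for illustration rather than fit on a calibration split.

```python
import numpy as np


def bcts_transform(proba, temperature, bias):
    """Apply softmax(log(p) / T + b) row-wise to a matrix of posteriors.

    Hypothetical helper for illustration only; not part of mlquantify.
    """
    proba = np.asarray(proba, dtype=float)
    # Clip before taking the log so exact zeros do not produce -inf.
    logits = np.log(np.clip(proba, 1e-12, 1.0))
    scaled = logits / temperature + np.asarray(bias, dtype=float)
    # Subtract the row maximum for a numerically stable softmax.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    expd = np.exp(scaled)
    return expd / expd.sum(axis=1, keepdims=True)


# Hand-picked values: a temperature above 1 softens confident rows,
# and a zero bias leaves the class balance untouched.
p = np.array([[0.99, 0.01], [0.20, 0.80]])
softened = bcts_transform(p, temperature=2.0, bias=[0.0, 0.0])
print(softened.round(3))
```

With a zero bias and a temperature of 1 the function returns its input unchanged. A temperature above 1 lowers the top-class confidence without changing which class wins, and the per-class bias is what lets the method shift probability mass between classes.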