.. _sphx_quant_intro: ============================== Introduction to quantification ============================== **Quantification** (a.k.a. *class-prevalence estimation*) asks a different question from classification. A classifier predicts the label of each individual instance; a quantifier predicts the *proportion* of each class in a whole sample — and it can be accurate even when the underlying classifier is not. This first example fits the simplest possible quantifier, :class:`~mlquantify.counting.CC` (Classify & Count), predicts the class prevalence of a test sample, and compares the estimate against the ground truth. .. plot:: import numpy as np import matplotlib.pyplot as plt from sklearn.datasets import make_classification from sklearn.linear_model import LogisticRegression from sklearn.model_selection import train_test_split from mlquantify.counting import CC from mlquantify.metrics import MAE # A binary problem whose test split is intentionally more imbalanced # than training, so counting alone has to work for its estimate. X, y = make_classification( n_samples=4000, n_features=20, weights=[0.5, 0.5], random_state=0, ) X_tr, X_te, y_tr, y_te = train_test_split( X, y, test_size=0.5, stratify=y, random_state=0, ) quantifier = CC(LogisticRegression(max_iter=1000)) quantifier.fit(X_tr, y_tr) pred = quantifier.predict(X_te) # dict {class: prevalence} pred = np.array([pred[c] for c in quantifier.classes_]) true = np.array([(y_te == c).mean() for c in quantifier.classes_]) # Side-by-side bars: true vs. predicted prevalence per class. fig, ax = plt.subplots(figsize=(6, 4)) x = np.arange(len(quantifier.classes_)) ax.bar(x - 0.2, true, width=0.4, label="true", color="#264653") ax.bar(x + 0.2, pred, width=0.4, label="predicted (CC)", color="#2a9d8f") ax.set_xticks(x) ax.set_xticklabels([f"class {c}" for c in quantifier.classes_]) ax.set_ylabel("Prevalence") ax.set_ylim(0, 1) ax.set_title(f"CC prevalence estimate (mean absolute error = {MAE(true, pred):.3f})") ax.legend() fig.tight_layout() The two bars almost coincide: when the test distribution resembles training, even plain counting does well. The next example shows what happens when it does *not* — and why dedicated quantifiers exist. .. seealso:: - :ref:`sphx_cc_under_shift` — the limits of plain counting. - :ref:`getting_started` — the same workflow in narrative form.