.. _sphx_multiclass:

==========================
Multiclass quantification
==========================

Every quantifier in ``mlquantify`` handles more than two classes. Native
multiclass methods (:class:`~mlquantify.counting.CC`,
:class:`~mlquantify.counting.PCC`, :class:`~mlquantify.likelihood.EMQ`, the
KDEy family, the generalized matching methods) work on the full simplex
directly. **Binary methods** — :class:`~mlquantify.counting.ACC`,
:class:`~mlquantify.matching.HDy`, :class:`~mlquantify.matching.DyS`,
:class:`~mlquantify.matching.SORD` and the threshold-selection counters — are
decomposed into a set of binary sub-problems automatically, using
**One-vs-Rest**. The diagnostics carry over either way.

Native multiclass methods
--------------------------

We start with :class:`~mlquantify.likelihood.EMQ`, evaluate it across many
prevalence vectors with a :class:`~mlquantify.model_selection.UPP`, and show a
per-class diagonal plot.

.. plot::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    from mlquantify.likelihood import EMQ
    from mlquantify.model_selection import apply_protocol
    from mlquantify.visualization import DiagonalDisplay

    X, y = make_classification(
        n_samples=4500, n_features=20, n_informative=6,
        n_classes=3, n_clusters_per_class=1, random_state=0,
    )

    q = EMQ(LogisticRegression(max_iter=1000))
    results = apply_protocol(
        q, X, y, protocol="upp", n_prevalences=400, batch_size=120,
        random_state=0,
    )

    # DiagonalDisplay colour-codes the three classes on one axes for multiclass.
    disp = DiagonalDisplay.from_predictions(
        results["true_prevalences"], results["predicted_prevalences"],
        alpha=0.4, s=16,
    )
    disp.ax_.set_title("EMQ — 3-class diagonal (one colour per class)")
    disp.figure_.set_size_inches(6, 6)
    disp.figure_.tight_layout()

Each class gets its own colour; all three clouds hug the diagonal, confirming
EMQ recovers the full prevalence vector and not just one class.

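
To put a number on how closely each cloud hugs the diagonal, one option is the
mean absolute error per class. The snippet below is a minimal sketch in plain
Python: ``per_class_mae`` is a hypothetical helper (not part of
``mlquantify``), and the prevalence vectors are made-up values standing in for
the protocol output, assumed here to be sequences of per-batch vectors with
one entry per class.

.. code-block:: python

    def per_class_mae(true_prevalences, predicted_prevalences):
        """Mean absolute error per class over a list of prevalence vectors."""
        n_batches = len(true_prevalences)
        n_classes = len(true_prevalences[0])
        totals = [0.0] * n_classes
        for true_vec, pred_vec in zip(true_prevalences, predicted_prevalences):
            for k in range(n_classes):
                totals[k] += abs(true_vec[k] - pred_vec[k])
        return [total / n_batches for total in totals]

    # Made-up values for three batches of a 3-class problem (illustration only).
    true_prev = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3], [0.1, 0.8, 0.1]]
    pred_prev = [[0.25, 0.3, 0.45], [0.55, 0.15, 0.3], [0.1, 0.75, 0.15]]

    # One error value per class; smaller means the cloud sits closer to the diagonal.
    mae = per_class_mae(true_prev, pred_prev)

A class whose error is clearly larger than the others would show up in the
plot as a cloud that drifts away from the diagonal.
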
Binary methods via One-vs-Rest
------------------------------

A binary quantifier cannot, by itself, estimate three prevalences. Under
**One-vs-Rest** the problem is split into "class :math:`k` vs. the rest" for
every class; each sub-quantifier estimates the prevalence of its own class, and
the results are normalised to sum to one.

``mlquantify`` does this transparently: **you just fit and predict the binary
method exactly as in the binary case** — One-vs-Rest is applied automatically,
since it is the default decomposition. No manual class loop, no extra
configuration. The grid below runs four binary methods on the same three-class
problem.

.. plot::

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    from mlquantify.counting import ACC
    from mlquantify.matching import HDy, DyS, SORD
    from mlquantify.model_selection import apply_protocol
    from mlquantify.visualization import DiagonalDisplay

    X, y = make_classification(
        n_samples=4500, n_features=20, n_informative=6,
        n_classes=3, n_clusters_per_class=1, random_state=0,
    )

    # All four are binary quantifiers — they only know "positive vs. rest".
    methods = {
        "ACC": ACC(LogisticRegression(max_iter=1000)),
        "HDy": HDy(LogisticRegression(max_iter=1000)),
        "DyS": DyS(LogisticRegression(max_iter=1000)),
        "SORD": SORD(LogisticRegression(max_iter=1000)),
    }

    fig, axes = plt.subplots(2, 2, figsize=(9, 9))
    for (name, q), ax in zip(methods.items(), axes.ravel()):
        # No manual decomposition: the binary method handles 3 classes via OvR.
        results = apply_protocol(
            q, X, y, protocol="upp", n_prevalences=200, batch_size=120,
            random_state=0,
        )
        DiagonalDisplay.from_predictions(
            results["true_prevalences"], results["predicted_prevalences"],
            ax=ax, alpha=0.4, s=14,
        )
        ax.set_title(f"{name} (One-vs-Rest)")

    fig.suptitle("Binary quantifiers on a 3-class problem via OvR", y=0.99)
    fig.tight_layout()

Despite being binary at heart, all four methods track the diagonal across the
whole simplex: One-vs-Rest extends them to three classes with no change to your
code. Each sub-quantifier estimates the prevalence of its own class, and
``mlquantify`` normalises the per-class estimates so the prediction stays a
valid prevalence vector.

.. seealso::

    - :ref:`sphx_confidence_regions` — a ternary confidence ellipse for these
      simplex-valued predictions.
    - :ref:`sphx_method_comparison` — the same methods on a binary problem.
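
The One-vs-Rest normalisation step described on this page can be sketched in a
few lines of plain Python. This illustrates the idea only and is not
``mlquantify``'s internal code: ``normalise_ovr`` is a hypothetical helper, the
raw estimates are made up, and both the clipping of negative values and the
uniform fallback for an all-zero vector are assumptions.

.. code-block:: python

    def normalise_ovr(raw_estimates):
        """Rescale per-class One-vs-Rest estimates so they sum to one."""
        # Assumption: clip negatives in case a sub-quantifier undershoots.
        clipped = [max(0.0, p) for p in raw_estimates]
        total = sum(clipped)
        if total == 0.0:
            # Assumed fallback: no sub-quantifier detected its class.
            return [1.0 / len(clipped)] * len(clipped)
        return [p / total for p in clipped]

    # Made-up outputs of three "class k vs. rest" sub-quantifiers.
    # They sum to 1.10, so they are not yet a valid prevalence vector.
    raw = [0.30, 0.45, 0.35]
    prevalences = normalise_ovr(raw)

Because each sub-quantifier is fitted independently, the raw estimates rarely
sum to exactly one, which is why a rescaling step of this kind is needed.
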