Multiclass quantification#

Every quantifier in mlquantify handles more than two classes. Native multiclass methods (CC, PCC, EMQ, the KDEy family, the generalized matching methods) work on the full simplex directly. Binary methodsACC, HDy, DyS, SORD and the threshold-selection counters — are decomposed into a set of binary sub-problems automatically, using One-vs-Rest. The diagnostics carry over either way.

Native multiclass methods#

We start with EMQ, evaluate it across many prevalence vectors with a UPP, and show a per-class diagonal plot.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

from mlquantify.likelihood import EMQ
from mlquantify.model_selection import apply_protocol
from mlquantify.visualization import DiagonalDisplay

X, y = make_classification(
    n_samples=4500, n_features=20, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

q = EMQ(LogisticRegression(max_iter=1000))
results = apply_protocol(
    q, X, y, protocol="upp",
    n_prevalences=400, batch_size=120, random_state=0,
)

# DiagonalDisplay colour-codes the three classes on one axes for multiclass.
disp = DiagonalDisplay.from_predictions(
    results["true_prevalences"], results["predicted_prevalences"],
    alpha=0.4, s=16,
)
disp.ax_.set_title("EMQ — 3-class diagonal (one colour per class)")
disp.figure_.set_size_inches(6, 6)
disp.figure_.tight_layout()
../_images/plot_multiclass-1.png

Each class gets its own colour; all three clouds hug the diagonal, confirming EMQ recovers the full prevalence vector and not just one class.

Binary methods via One-vs-Rest#

A binary quantifier cannot, by itself, estimate three prevalences. Under One-vs-Rest the problem is split into “class \(k\) vs. the rest” for every class; each sub-quantifier estimates the prevalence of its own class, and the results are normalised to sum to one. mlquantify does this transparently: you just fit and predict the binary method exactly as in the binary case — One-vs-Rest is applied automatically, since it is the default decomposition. No manual class loop, no extra configuration.

The grid below runs four binary methods on the same three-class problem.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

from mlquantify.counting import ACC
from mlquantify.matching import HDy, DyS, SORD
from mlquantify.model_selection import apply_protocol
from mlquantify.visualization import DiagonalDisplay

X, y = make_classification(
    n_samples=4500, n_features=20, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# All four are binary quantifiers — they only know "positive vs. rest".
methods = {
    "ACC": ACC(LogisticRegression(max_iter=1000)),
    "HDy": HDy(LogisticRegression(max_iter=1000)),
    "DyS": DyS(LogisticRegression(max_iter=1000)),
    "SORD": SORD(LogisticRegression(max_iter=1000)),
}

fig, axes = plt.subplots(2, 2, figsize=(9, 9))
for (name, q), ax in zip(methods.items(), axes.ravel()):
    # No manual decomposition: the binary method handles 3 classes via OvR.
    results = apply_protocol(
        q, X, y, protocol="upp",
        n_prevalences=200, batch_size=120, random_state=0,
    )
    DiagonalDisplay.from_predictions(
        results["true_prevalences"], results["predicted_prevalences"],
        ax=ax, alpha=0.4, s=14,
    )
    ax.set_title(f"{name}  (One-vs-Rest)")
fig.suptitle("Binary quantifiers on a 3-class problem via OvR", y=0.99)
fig.tight_layout()
../_images/plot_multiclass-2.png

Despite being binary at heart, all four methods track the diagonal across the whole simplex: One-vs-Rest extends them to three classes with no change to your code. Each sub-quantifier estimates the prevalence of its own class, and mlquantify normalises the per-class estimates so the prediction stays a valid prevalence vector.

See also