13.2.1. Multiclass Quantification

Several quantifiers are binary by design: they estimate the prevalence of a positive class against everything else. Examples are the threshold-adjustment methods ACC and TAC and the mixture models DyS, HDy, and SORD. To apply them to a problem with more than two classes, mlquantify automatically decomposes the task into binary sub-problems, runs the quantifier on each, and recombines the results into a single prevalence vector over all classes.

This page covers the two built-in decomposition strategies (One-vs-Rest and One-vs-One), how to choose between them, how to make your own quantifier binary, and how to register a brand-new strategy.


13.2.1.1. Setting the strategy on a binary method

Binary methods expose a strategy parameter. Pass it to the constructor, or set it afterwards as an attribute — both are equivalent:

from mlquantify.matching import DyS
from sklearn.linear_model import LogisticRegression

# via the constructor
q = DyS(LogisticRegression(), strategy="ovo")

# or as an attribute
q = DyS(LogisticRegression())
q.strategy = "ovo"

q.fit(X_train, y_train)        # 4-class data
q.predict(X_test)
# {0: 0.26, 1: 0.24, 2: 0.25, 3: 0.25}

If the data has only two classes, the decomposition is skipped entirely and the method runs natively; the strategy is consulted only when there are more than two classes.

Note

Native-multiclass methods (e.g. GACC, KDEyML, EDy, EMQ) do not decompose and ignore strategy; they model all classes jointly.


13.2.1.2. One-vs-Rest vs One-vs-One

| Strategy | One-vs-Rest ('ovr', default) | One-vs-One ('ovo') |
|---|---|---|
| Sub-problems | One per class: class c vs the rest (K binary quantifiers for K classes). | One per class pair (K(K-1)/2 binary quantifiers). |
| Recombination | Each binary quantifier gives the prevalence of its class; the vector is normalised to sum to 1. | Each pair gives the prevalence of one class against the other; per-class estimates are averaged over all pairs the class appears in. |
| Cost | Linear in K. | Quadratic in K. |
| Use when | Default. Scales well; good general choice. | Few classes, or when one-vs-rest sub-problems are too imbalanced. |

Both strategies run the sub-problems in parallel — pass n_jobs (when the method supports it) to use multiple cores.
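The two recombination rules described above can be sketched in plain Python. This is an illustrative stand-alone sketch, not mlquantify's implementation: the helper names (ovr_labels, ovo_pairs, recombine_ovr, recombine_ovo) are hypothetical, and the uniform fallback and the final normalisation in the One-vs-One case are assumptions made so the example returns a proper distribution.

```python
from itertools import combinations


def ovr_labels(y, positive):
    """Relabel y for the sub-problem 'positive vs the rest' (1 = positive)."""
    return [1 if label == positive else 0 for label in y]


def ovo_pairs(classes):
    """All unordered class pairs; each pair is one binary sub-problem."""
    return list(combinations(classes, 2))


def recombine_ovr(raw):
    """Normalise per-class positive prevalences so they sum to 1."""
    total = sum(raw.values())
    if total == 0:
        # Assumption: fall back to a uniform vector when every estimate is 0.
        return {c: 1.0 / len(raw) for c in raw}
    return {c: p / total for c, p in raw.items()}


def recombine_ovo(pair_estimates, classes):
    """Average each class over the pairs it appears in, then normalise.

    pair_estimates maps (a, b) -> (prevalence of a, prevalence of b).
    """
    sums = {c: 0.0 for c in classes}
    counts = {c: 0 for c in classes}
    for (a, b), (p_a, p_b) in pair_estimates.items():
        sums[a] += p_a
        counts[a] += 1
        sums[b] += p_b
        counts[b] += 1
    averaged = {c: sums[c] / counts[c] for c in classes}
    return recombine_ovr(averaged)


classes = [0, 1, 2, 3]
# K sub-problems for One-vs-Rest, K(K-1)/2 for One-vs-One
print(len(classes), len(ovo_pairs(classes)))  # 4 6
```

With four classes the gap between 4 and 6 sub-problems is small, but at K = 20 it is 20 against 190, which is why One-vs-Rest is the more scalable choice.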


13.2.1.3. Making your own quantifier binary

Decorate a quantifier with binary_quantifier to give it automatic OvR/OvO handling. The decorator reads the strategy from the attribute named by strategy_attr ("strategy" by default) and wraps fit / predict / aggregate with the decomposition logic, exposing the original implementations as _original_fit / _original_predict / _original_aggregate:

import numpy as np
from mlquantify.base import BaseQuantifier
from mlquantify.multiclass import binary_quantifier

@binary_quantifier(strategy_attr="strategy")
class MyBinaryQuantifier(BaseQuantifier):
    def __init__(self, estimator=None, strategy="ovr"):
        self.estimator = estimator
        self.strategy = strategy

    def fit(self, X, y):                 # always sees a *binary* y
        self.classes_ = np.unique(y)
        # ... fit on the binary problem ...
        return self

    def predict(self, X):                # returns a 2-element prevalence
        # ... estimate [neg, pos] ...
        return np.array([0.5, 0.5])

Your fit/predict only ever deal with the binary case; the decorator takes care of the multiclass decomposition and recombination.
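To make the wrapping idea concrete, here is a minimal, dependency-free sketch of how a class decorator of this kind could work. It is not mlquantify's binary_quantifier: toy_binary_quantifier and TrainingPrior are hypothetical names, only One-vs-Rest is handled, and the sub-quantifiers are shallow copies of the instance, which is an assumption of this sketch.

```python
import copy


def toy_binary_quantifier(cls):
    """Hypothetical decorator: adds One-vs-Rest handling to a binary quantifier."""
    original_fit, original_predict = cls.fit, cls.predict

    def fit(self, X, y):
        classes = sorted(set(y))
        models = None
        if len(classes) > 2:
            models = {}
            for c in classes:
                binary_y = [1 if label == c else 0 for label in y]
                # Fit a shallow copy so each sub-problem keeps its own state.
                models[c] = original_fit(copy.copy(self), X, binary_y)
        else:
            original_fit(self, X, y)  # two classes: no decomposition
        self.classes_ = classes
        self._models = models
        return self

    def predict(self, X):
        if self._models is None:  # two classes: run natively
            return original_predict(self, X)
        # Keep the positive-class prevalence of each sub-problem, then normalise.
        raw = {c: original_predict(m, X)[1] for c, m in self._models.items()}
        total = sum(raw.values())
        return {c: p / total for c, p in raw.items()}

    cls.fit, cls.predict = fit, predict
    cls._original_fit, cls._original_predict = original_fit, original_predict
    return cls


@toy_binary_quantifier
class TrainingPrior:
    """Toy binary quantifier: predicts the training prevalence, ignoring X."""

    def fit(self, X, y):  # y holds 0/1 labels
        self.pos_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [1.0 - self.pos_, self.pos_]  # [neg, pos]


q = TrainingPrior().fit([[0], [1], [2], [3]], [0, 0, 1, 2])
print(q.predict([[5]]))  # {0: 0.5, 1: 0.25, 2: 0.25}
```

The toy class never sees more than two labels in its own fit, which mirrors the contract described above: the binary logic stays simple and the multiclass bookkeeping lives in the wrapper.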


13.2.1.4. Adding a new strategy

Decomposition strategies live in a small registry, so a new one (error-correcting output codes, hierarchical, nested dichotomies, …) is one class plus one decorator — no change to the dispatch. Subclass MulticlassStrategy and register it with register_strategy:

from mlquantify.matching import DyS
from mlquantify.multiclass import MulticlassStrategy, register_strategy
from sklearn.linear_model import LogisticRegression

@register_strategy("ecoc")
class ECOCStrategy(MulticlassStrategy):
    def fit(self, q, X, y, n_jobs=None, fit_args=None, fit_kwargs=None):
        # return {key: fitted_binary_quantifier}
        ...

    def predict(self, q, X, n_jobs=None):
        # return per-class prevalences (dict or array)
        ...

    def aggregate(self, q, classes, args_dict, n_jobs=None):
        # return per-class prevalences from pre-computed predictions
        ...

    def fit_predict(self, q, X, y, X_test, classes, n_jobs=None):
        # fit on (X, y) and return per-class prevalences for X_test
        ...

# now usable on any binary method:
q = DyS(LogisticRegression(), strategy="ecoc")

predict, aggregate, and fit_predict return prevalences before the shared normalisation that BinaryQuantifier applies (see Prevalence Normalization), while fit returns the fitted binary quantifiers keyed by sub-problem. Inspect the available strategies with available_strategies:

from mlquantify.multiclass import available_strategies
available_strategies()
# ['ecoc', 'ovo', 'ovr']
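For readers unfamiliar with the registry pattern, the following stand-alone sketch shows one common way such a name-to-class registry can be built. It is an illustration under assumptions, not mlquantify's code: toy_register_strategy, toy_available_strategies, toy_get_strategy, and the three placeholder classes are hypothetical, and the duplicate-name check is a design choice of this sketch.

```python
_TOY_STRATEGIES = {}  # maps strategy name -> strategy class


def toy_register_strategy(name):
    """Class decorator: store the class under `name` and return it unchanged."""
    def decorator(cls):
        if name in _TOY_STRATEGIES:
            raise ValueError(f"strategy {name!r} is already registered")
        _TOY_STRATEGIES[name] = cls
        return cls
    return decorator


def toy_available_strategies():
    """Registered names in sorted order."""
    return sorted(_TOY_STRATEGIES)


def toy_get_strategy(name):
    """Instantiate the strategy registered under `name`."""
    if name not in _TOY_STRATEGIES:
        raise ValueError(
            f"unknown strategy {name!r}; choose from {toy_available_strategies()}"
        )
    return _TOY_STRATEGIES[name]()


@toy_register_strategy("ovr")
class ToyOneVsRest:
    """Placeholder for a One-vs-Rest strategy."""


@toy_register_strategy("ovo")
class ToyOneVsOne:
    """Placeholder for a One-vs-One strategy."""


@toy_register_strategy("ecoc")
class ToyECOC:
    """Placeholder for a user-defined strategy."""


print(toy_available_strategies())  # ['ecoc', 'ovo', 'ovr']
```

Because the dispatch only looks names up in the dictionary, adding a strategy never requires touching the code that selects one, which is the property the registry is meant to provide.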

See also

Prevalence Normalization for how the recombined prevalences are normalised, and Building a Quantifier for writing quantifiers.