13.2.1. Multiclass Quantification
Several quantifiers are binary by design: they estimate the prevalence of a
positive class against everything else (e.g. the threshold-adjustment methods
ACC/TAC and the mixture models DyS, HDy and SORD). To apply them to a problem
with more than two classes, mlquantify automatically decomposes the task into
binary sub-problems, runs the quantifier on each, and recombines the results
into a single prevalence vector over all classes.
This page covers the two built-in decomposition strategies (One-vs-Rest and One-vs-One), how to choose between them, how to make your own quantifier binary, and how to register a brand-new strategy.
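To make the decomposition concrete, the sketch below builds the binary sub-problems for both strategies from a plain label list. It is an illustration only, written with the standard library; the helper names `ovr_subproblems` and `ovo_subproblems` are hypothetical and are not part of mlquantify.

```python
from itertools import combinations


def ovr_subproblems(y):
    """Map each class to binary labels: 1 for that class, 0 for the rest."""
    classes = sorted(set(y))
    return {c: [1 if label == c else 0 for label in y] for c in classes}


def ovo_subproblems(y):
    """Map each class pair to the indices of the samples in either class."""
    classes = sorted(set(y))
    return {
        (a, b): [i for i, label in enumerate(y) if label in (a, b)]
        for a, b in combinations(classes, 2)
    }


y = [0, 1, 2, 2, 1, 0, 3]
ovr = ovr_subproblems(y)
ovo = ovo_subproblems(y)
print(len(ovr), len(ovo))  # 4 6
```

With four classes, One-vs-Rest yields four sub-problems that each see every sample, while One-vs-One yields six sub-problems that each see only the samples of their two classes.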
13.2.1.1. Setting the strategy on a binary method
Binary methods expose a strategy parameter. Pass it to the constructor, or
set it afterwards as an attribute — both are equivalent:
from mlquantify.matching import DyS
from sklearn.linear_model import LogisticRegression
# via the constructor
q = DyS(LogisticRegression(), strategy="ovo")
# or as an attribute
q = DyS(LogisticRegression())
q.strategy = "ovo"
q.fit(X_train, y_train) # 4-class data
q.predict(X_test)
# {0: 0.26, 1: 0.24, 2: 0.25, 3: 0.25}
If the data has only two classes the decomposition is skipped entirely and the
method runs natively — the strategy is only consulted for > 2 classes.
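The rule above amounts to a simple check on the number of distinct training labels. A minimal sketch, assuming the decision depends only on that count (the function `choose_path` is hypothetical, not an mlquantify function):

```python
def choose_path(y, strategy="ovr"):
    """Return "native" for at most two classes, else the strategy name."""
    n_classes = len(set(y))
    return "native" if n_classes <= 2 else strategy


print(choose_path([0, 1, 1, 0], strategy="ovo"))  # native
print(choose_path([0, 1, 2, 3], strategy="ovo"))  # ovo
```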
13.2.1.2. One-vs-Rest vs One-vs-One
| Strategy | One-vs-Rest ("ovr") | One-vs-One ("ovo") |
|---|---|---|
| Sub-problems | One per class: class c vs the rest (K sub-problems for K classes) | One per class pair (K(K-1)/2 sub-problems for K classes) |
| Recombination | Each binary quantifier gives the prevalence of its class; the vector is normalised to sum to 1. | Each pair gives the prevalence of one class against the other; per-class estimates are averaged over all pairs the class appears in. |
| Cost | Linear in the number of classes | Quadratic in the number of classes |
| Use when | Default. Scales well; good general choice. | Few classes, or when one-vs-rest sub-problems are too imbalanced. |
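The two recombination rules in the table can be sketched in a few lines. This is an illustration of the arithmetic under the descriptions above, not mlquantify's implementation; `recombine_ovr` and `recombine_ovo` are hypothetical names, and the input dictionaries are made-up numbers.

```python
def recombine_ovr(positive_prev):
    """positive_prev maps class -> estimated prevalence of class vs the rest."""
    total = sum(positive_prev.values())
    if total == 0:
        raise ValueError("all one-vs-rest estimates are zero")
    # Normalise so the vector sums to 1.
    return {c: p / total for c, p in positive_prev.items()}


def recombine_ovo(pair_prev):
    """pair_prev maps (a, b) -> estimated prevalence of a among samples of a or b."""
    sums, counts = {}, {}
    for (a, b), p_a in pair_prev.items():
        # Each pair contributes one estimate for a and the complement for b.
        for c, p in ((a, p_a), (b, 1.0 - p_a)):
            sums[c] = sums.get(c, 0.0) + p
            counts[c] = counts.get(c, 0) + 1
    # Average each class over the pairs it appears in.
    return {c: sums[c] / counts[c] for c in sums}


ovr_result = recombine_ovr({"a": 0.2, "b": 0.3, "c": 0.7})
ovo_result = recombine_ovo({("a", "b"): 0.5, ("a", "c"): 0.25, ("b", "c"): 0.25})
print(ovo_result)  # {'a': 0.375, 'b': 0.375, 'c': 0.75}
```

The averaged One-vs-One estimates need not sum to 1 (here they sum to 1.5), which is why a normalisation step still follows the recombination.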
Both strategies treat their sub-problems as independent, so they can run in
parallel: pass n_jobs (when the method supports it) to use multiple cores.
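Because the sub-problems do not depend on one another, they map naturally onto a worker pool. The sketch below shows the idea with the standard library's `ThreadPoolExecutor`; the parallel backend that mlquantify actually uses is not described on this page, and `run_subproblems` and `positive_rate` are hypothetical helpers. Unlike some libraries, this sketch does not accept `n_jobs=-1`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subproblems(subproblems, solve, n_jobs=None):
    """Apply solve to every sub-problem, serially or with n_jobs workers."""
    keys = list(subproblems)
    if n_jobs is None or n_jobs == 1:
        results = [solve(subproblems[k]) for k in keys]
    else:
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            # pool.map returns results in the order of the inputs.
            results = list(pool.map(solve, (subproblems[k] for k in keys)))
    return dict(zip(keys, results))


def positive_rate(labels):
    """Stand-in for a binary quantifier: fraction of positive labels."""
    return sum(labels) / len(labels)


subproblems = {"a": [1, 0, 0, 0], "b": [0, 1, 1, 0], "c": [0, 0, 0, 1]}
serial = run_subproblems(subproblems, positive_rate)
parallel = run_subproblems(subproblems, positive_rate, n_jobs=2)
print(serial == parallel)  # True
```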
13.2.1.3. Making your own quantifier binary
Decorate a quantifier with binary_quantifier to give it automatic
OvR/OvO handling. The decorator reads the strategy from the attribute named by
strategy_attr ("strategy" by default) and wraps fit / predict /
aggregate with the decomposition logic, exposing the original
implementations as _original_fit / _original_predict / _original_aggregate:
import numpy as np

from mlquantify.base import BaseQuantifier
from mlquantify.multiclass import binary_quantifier


@binary_quantifier(strategy_attr="strategy")
class MyBinaryQuantifier(BaseQuantifier):
    def __init__(self, estimator=None, strategy="ovr"):
        self.estimator = estimator
        self.strategy = strategy

    def fit(self, X, y):  # always sees a *binary* y
        self.classes_ = np.unique(y)
        # ... fit on the binary problem ...
        return self

    def predict(self, X):  # returns a 2-element prevalence
        # ... estimate [neg, pos] ...
        return np.array([0.5, 0.5])
Your fit/predict only ever deal with the binary case; the decorator takes
care of the multiclass decomposition and recombination.
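For intuition about what such a class decorator does, here is a self-contained toy version that supports only one-vs-rest. It is a sketch of the wrapping pattern, not the code behind binary_quantifier: the names `binary_only` and `PositiveRate` are hypothetical, and the sketch assumes the wrapped fit returns self and the wrapped predict returns [neg, pos].

```python
import copy


def binary_only(cls):
    """Toy class decorator: wrap fit/predict with one-vs-rest handling."""
    cls._original_fit = cls.fit
    cls._original_predict = cls.predict

    def fit(self, X, y):
        classes = sorted(set(y))
        if len(classes) <= 2:
            self._models = None  # binary data: run the method natively
            return cls._original_fit(self, X, y)
        models = {}
        for c in classes:
            binary_y = [1 if label == c else 0 for label in y]
            # Fit an independent copy on each class-vs-rest problem.
            models[c] = cls._original_fit(copy.deepcopy(self), X, binary_y)
        self._models = models
        self.classes_ = classes
        return self

    def predict(self, X):
        if self._models is None:
            return cls._original_predict(self, X)
        # Keep the positive-class prevalence of every sub-problem.
        raw = {c: cls._original_predict(m, X)[1] for c, m in self._models.items()}
        total = sum(raw.values())
        return {c: p / total for c, p in raw.items()}

    cls.fit = fit
    cls.predict = predict
    return cls


@binary_only
class PositiveRate:
    """Toy binary 'quantifier': predicts the positive rate seen during fit."""

    def fit(self, X, y):
        self.rate_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [1.0 - self.rate_, self.rate_]


q = PositiveRate().fit(None, ["a", "a", "b", "c"])
print(q.predict(None))  # {'a': 0.5, 'b': 0.25, 'c': 0.25}
```

The wrapped class never learns that it was given three classes: each copy is fitted on 0/1 labels, exactly as the comment on fit in the example above promises.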
13.2.1.4. Adding a new strategy
Decomposition strategies live in a small registry, so a new one (error-correcting
output codes, hierarchical, nested dichotomies, …) is one class plus one
decorator — no change to the dispatch. Subclass MulticlassStrategy and
register it with register_strategy:
from mlquantify.multiclass import MulticlassStrategy, register_strategy


@register_strategy("ecoc")
class ECOCStrategy(MulticlassStrategy):
    def fit(self, q, X, y, n_jobs=None, fit_args=None, fit_kwargs=None):
        # return {key: fitted_binary_quantifier}
        ...

    def predict(self, q, X, n_jobs=None):
        # return per-class prevalences (dict or array)
        ...

    def aggregate(self, q, classes, args_dict, n_jobs=None):
        # return per-class prevalences from pre-computed predictions
        ...

    def fit_predict(self, q, X, y, X_test, classes, n_jobs=None):
        # fit on (X, y) and return per-class prevalences for X_test
        ...


# now usable on any binary method
# (DyS and LogisticRegression imported as in the first example):
q = DyS(LogisticRegression(), strategy="ecoc")
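The three prevalence-returning methods (predict, aggregate and fit_predict)
return prevalences before the shared normalisation that BinaryQuantifier
applies (see Prevalence Normalization); fit returns the fitted binary
quantifiers instead.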
Inspect the available strategies with available_strategies:
from mlquantify.multiclass import available_strategies
available_strategies()
# ['ecoc', 'ovo', 'ovr']
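The registry pattern itself is small. The sketch below is a generic, self-contained version that shows why adding a strategy needs no change to the dispatch code; it is not mlquantify's registry, and every name in it (`_REGISTRY`, `register`, `available`, `get`, the placeholder classes) is hypothetical.

```python
_REGISTRY = {}


def register(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in _REGISTRY:
            raise ValueError(f"strategy {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def available():
    """List the registered strategy names in sorted order."""
    return sorted(_REGISTRY)


def get(name):
    """Instantiate the strategy registered under `name`."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown strategy {name!r}; choose from {available()}")
    return _REGISTRY[name]()


@register("ovr")
class OneVsRest:
    pass


@register("ovo")
class OneVsOne:
    pass


@register("ecoc")
class ECOC:
    pass


print(available())  # ['ecoc', 'ovo', 'ovr']
```

Dispatch goes through a single lookup (`get` here), so a newly registered name becomes valid wherever a strategy string is accepted.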
See also
Prevalence Normalization for how the recombined prevalences are normalised, and Building a Quantifier for writing quantifiers.