.. _multiclass:

.. currentmodule:: mlquantify.multiclass

==========================
Multiclass Quantification
==========================

Several quantifiers are **binary by design**: they estimate the prevalence of a
*positive* class against everything else (e.g. the threshold-adjustment methods
:class:`~mlquantify.counting.ACC`/:class:`~mlquantify.counting.TAC`, and the
mixture models :class:`~mlquantify.matching.DyS`,
:class:`~mlquantify.matching.HDy`, :class:`~mlquantify.matching.SORD`). To apply
them to a problem with more than two classes, ``mlquantify`` automatically
**decomposes** the task into binary sub-problems, runs the quantifier on each,
and **recombines** the results into a single prevalence vector over all classes.

This page covers the two built-in decomposition strategies (**One-vs-Rest** and
**One-vs-One**), how to choose between them, how to make your own quantifier
binary, and how to register a brand-new strategy.

.. contents:: Contents
   :local:
   :depth: 2

----

Setting the strategy on a binary method
=======================================

Binary methods expose a ``strategy`` parameter. Pass it to the constructor, or
set it afterwards as an attribute; both are equivalent:

.. code-block:: python

    from mlquantify.matching import DyS
    from sklearn.linear_model import LogisticRegression

    # via the constructor
    q = DyS(LogisticRegression(), strategy="ovo")

    # or as an attribute
    q = DyS(LogisticRegression())
    q.strategy = "ovo"

    q.fit(X_train, y_train)   # 4-class data
    q.predict(X_test)         # {0: 0.26, 1: 0.24, 2: 0.25, 3: 0.25}

If the data has only two classes the decomposition is skipped entirely and the
method runs natively; the ``strategy`` is only consulted for ``> 2`` classes.

.. note::

   Native-multiclass methods (e.g. :class:`~mlquantify.counting.GACC`,
   :class:`~mlquantify.matching.KDEyML`, :class:`~mlquantify.matching.EDy`,
   :class:`~mlquantify.likelihood.EMQ`) do **not** decompose and ignore
   ``strategy``; they model all classes jointly.

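
The decompose-and-recombine idea introduced at the top of this page can be
illustrated without the library. The sketch below is **not** how ``mlquantify``
implements its strategies: ``binary_cc``, ``normalise``, ``ovr_prevalence`` and
``ovo_prevalence`` are hypothetical helper names, the binary quantifier is a toy
1-nearest-neighbour classify-and-count on one-dimensional data, and the
One-vs-One averaging assumes each pair is applied to the whole test sample. It
only shows the shape of the computation: one binary estimate per class or per
class pair, followed by a normalisation so the result sums to 1.

.. code-block:: python

    from itertools import combinations


    def binary_cc(train_x, train_y, test_x):
        """Toy binary quantifier: 1-nearest-neighbour classify-and-count.

        Returns the estimated prevalence of the positive label (1) in test_x.
        """
        def predict(x):
            nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
            return train_y[nearest]

        return sum(predict(x) for x in test_x) / len(test_x)


    def normalise(raw):
        """Rescale per-class estimates to sum to 1 (uniform if all are zero)."""
        total = sum(raw.values())
        if total == 0:
            return {c: 1 / len(raw) for c in raw}
        return {c: v / total for c, v in raw.items()}


    def ovr_prevalence(train_x, train_y, test_x):
        """One-vs-Rest: one binary problem per class, then normalise."""
        classes = sorted(set(train_y))
        raw = {}
        for c in classes:
            binary_y = [1 if y == c else 0 for y in train_y]
            raw[c] = binary_cc(train_x, binary_y, test_x)
        return normalise(raw)


    def ovo_prevalence(train_x, train_y, test_x):
        """One-vs-One: one binary problem per class pair, averaged per class."""
        classes = sorted(set(train_y))
        sums = {c: 0.0 for c in classes}
        for a, b in combinations(classes, 2):
            # Train only on the two classes of the pair; b is the positive label.
            pair = [(x, 1 if y == b else 0)
                    for x, y in zip(train_x, train_y) if y in (a, b)]
            pair_x = [x for x, _ in pair]
            pair_y = [y for _, y in pair]
            p_b = binary_cc(pair_x, pair_y, test_x)
            sums[b] += p_b
            sums[a] += 1.0 - p_b
        # Every class appears in K - 1 pairs (requires at least two classes).
        averaged = {c: s / (len(classes) - 1) for c, s in sums.items()}
        return normalise(averaged)


    # Three well-separated classes on a line; the test sample is 2/3/1 of each.
    train_x = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]
    train_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    test_x = [0.1, 0.3, 5.1, 5.15, 5.3, 10.1]

    ovr = ovr_prevalence(train_x, train_y, test_x)
    ovo = ovo_prevalence(train_x, train_y, test_x)
    print(ovr)
    print(ovo)

With a real binary quantifier the raw per-class estimates generally do not sum
to 1, which is why the final normalisation step matters.
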

----

One-vs-Rest vs One-vs-One
=========================

.. list-table::
   :widths: 14 43 43
   :header-rows: 1

   * - Strategy
     - One-vs-Rest (``'ovr'``, default)
     - One-vs-One (``'ovo'``)
   * - Sub-problems
     - One per class: *class c* vs the rest (``K`` binary quantifiers).
     - One per class pair (``K(K-1)/2`` binary quantifiers).
   * - Recombination
     - Each binary quantifier gives the prevalence of its class; the vector is
       normalised to sum to 1.
     - Each pair gives the prevalence of one class against the other; per-class
       estimates are averaged over all pairs the class appears in.
   * - Cost
     - Linear in ``K``.
     - Quadratic in ``K``.
   * - Use when
     - **Default.** Scales well; good general choice.
     - Few classes, or when one-vs-rest sub-problems are too imbalanced.

Both strategies run the sub-problems in parallel; pass ``n_jobs`` (when the
method supports it) to use multiple cores.

----

Making your own quantifier binary
=================================

Decorate a quantifier with :func:`binary_quantifier` to give it automatic
OvR/OvO handling. The decorator reads the strategy from the attribute named by
``strategy_attr`` (``"strategy"`` by default) and wraps ``fit`` / ``predict`` /
``aggregate`` with the decomposition logic, exposing the original
implementations as ``_original_fit`` / ``_original_predict`` /
``_original_aggregate``:

.. code-block:: python

    import numpy as np
    from mlquantify.base import BaseQuantifier
    from mlquantify.multiclass import binary_quantifier

    @binary_quantifier(strategy_attr="strategy")
    class MyBinaryQuantifier(BaseQuantifier):
        def __init__(self, estimator=None, strategy="ovr"):
            self.estimator = estimator
            self.strategy = strategy

        def fit(self, X, y):            # always sees a *binary* y
            self.classes_ = np.unique(y)
            # ... fit on the binary problem ...
            return self

        def predict(self, X):           # returns a 2-element prevalence
            # ... estimate [neg, pos] ...
            return np.array([0.5, 0.5])

Your ``fit``/``predict`` only ever deal with the binary case; the decorator
takes care of the multiclass decomposition and recombination.

----

Adding a new strategy
=====================

Decomposition strategies live in a small registry, so a new one
(error-correcting output codes, hierarchical, nested dichotomies, …) is **one
class plus one decorator, with no change to the dispatch**. Subclass
:class:`MulticlassStrategy` and register it with :func:`register_strategy`:

.. code-block:: python

    from mlquantify.matching import DyS
    from mlquantify.multiclass import MulticlassStrategy, register_strategy
    from sklearn.linear_model import LogisticRegression

    @register_strategy("ecoc")
    class ECOCStrategy(MulticlassStrategy):
        def fit(self, q, X, y, n_jobs=None, fit_args=None, fit_kwargs=None):
            # return {key: fitted_binary_quantifier}
            ...

        def predict(self, q, X, n_jobs=None):
            # return per-class prevalences (dict or array)
            ...

        def aggregate(self, q, classes, args_dict, n_jobs=None):
            # return per-class prevalences from pre-computed predictions
            ...

        def fit_predict(self, q, X, y, X_test, classes, n_jobs=None):
            # fit on (X, y) and return per-class prevalences for X_test
            ...

    # now usable on any binary method:
    q = DyS(LogisticRegression(), strategy="ecoc")

``fit`` returns the fitted binary quantifiers; ``predict``, ``aggregate`` and
``fit_predict`` return prevalences *before* the shared normalisation that
:class:`BinaryQuantifier` applies (see :ref:`prevalence_normalization`).

Inspect the available strategies with :func:`available_strategies`:

.. code-block:: python

    from mlquantify.multiclass import available_strategies

    available_strategies()   # ['ecoc', 'ovo', 'ovr']

.. seealso::

   :ref:`prevalence_normalization` for how the recombined prevalences are
   normalised, and :ref:`building_a_quantifier` for writing quantifiers.

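
As a closing illustration of why a registry keeps "Adding a new strategy" down
to one class plus one decorator, here is a generic, library-independent sketch
of the pattern. It is an assumption about the general technique, not a copy of
``mlquantify``'s internals: ``_STRATEGIES``, ``register``, ``available`` and
``resolve`` are hypothetical names chosen for this example.

.. code-block:: python

    _STRATEGIES = {}  # hypothetical registry: name -> strategy class


    def register(name):
        """Return a class decorator that files the class under ``name``."""
        def decorator(cls):
            _STRATEGIES[name] = cls
            return cls
        return decorator


    def available():
        """Sorted names of every registered strategy."""
        return sorted(_STRATEGIES)


    def resolve(name):
        """Dispatch: look the name up and instantiate the matching class."""
        if name not in _STRATEGIES:
            raise ValueError(
                f"unknown strategy {name!r}; choose from {available()}"
            )
        return _STRATEGIES[name]()


    @register("ovr")
    class OneVsRest:
        pass


    @register("ecoc")  # adding a strategy never touches resolve()
    class ECOC:
        pass


    print(available())
    print(type(resolve("ecoc")).__name__)

Because ``resolve`` only consults the dictionary, registering another class is
enough to make its name valid wherever a strategy string is accepted.
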