.. _prevalence_normalization:

.. currentmodule:: mlquantify

========================
Prevalence Normalization
========================

Every quantifier returns its estimate through a single normalization step, so
that with the default settings the output is a valid prevalence vector
(non-negative and summing to 1) in a consistent format. Two settings control
this step: the **return type** (array or dict) and the **normalization
strategy** (how raw estimates are turned into probabilities and how several
estimates are aggregated).

.. contents:: Contents
   :local:
   :depth: 2

----

The ``normalize_prevalence`` helper
===================================

:func:`~mlquantify.utils.normalize_prevalence` is the low-level helper that
turns a raw vector (or dict) of class scores into a normalized prevalence
summing to 1, aligned to a list of classes:

.. code-block:: python

    from mlquantify.utils import normalize_prevalence

    normalize_prevalence([2.0, 3.0, 5.0], classes=[0, 1, 2])
    # {0: 0.2, 1: 0.3, 2: 0.5}

    normalize_prevalence({0: 0.1, 1: 0.1, 2: 0.3}, classes=[0, 1, 2])
    # {0: 0.2, 1: 0.2, 2: 0.6}

Parameters
----------

.. list-table::
   :widths: 22 78
   :header-rows: 1

   * - Parameter
     - Meaning
   * - ``prevalences``
     - The raw estimate to normalize: a 1-D array, or a ``{class: value}``
       dict. Values need not sum to 1 (they are rescaled).
   * - ``classes``
     - The class labels, used to order the output and to fill in any class
       missing from a dict input with ``0``.

Quantifiers rarely call this directly; they go through
``validate_prevalences``, which additionally applies the **configurable**
return type and normalization strategy described below.

----

Configuring normalization
=========================

Two global options drive the final formatting of every prevalence estimate.
Read them with :func:`get_config`, change them with :func:`set_config`
(global) or :func:`config_context` (temporary, scoped):

``prevalence_return_type``: output format
-----------------------------------------

.. list-table::
   :widths: 16 84
   :header-rows: 1

   * - Value
     - Behaviour
   * - ``'array'``
     - Return a :class:`numpy.ndarray` ordered by class. **Global default.**
   * - ``'dict'``
     - Return a ``{class_label: prevalence}`` dictionary.

``prevalence_normalization``: normalization / aggregation strategy
------------------------------------------------------------------

.. list-table::
   :widths: 18 82
   :header-rows: 1

   * - Value
     - Behaviour
   * - ``'sum'`` / ``'l1'``
     - Rescale so the values sum to 1 (the standard prevalence constraint).
       **Global default.**
   * - ``'softmax'``
     - Apply the softmax function; useful when the raw estimates are logits
       or unbounded scores rather than proportions.
   * - ``'mean'``
     - Average several estimates (rows of a 2-D input) into one prevalence;
       used when many estimates are produced, e.g. by ensembles or bootstrap.
   * - ``'median'``
     - Take the per-class median across several estimates; more robust to
       outlier estimates than ``'mean'``.
   * - ``None``
     - No normalization or aggregation: return the raw values unchanged.

When the input is a 2-D array of several estimates, ``'sum'``/``'l1'``
normalize each row and then average them, while ``'mean'``/``'median'``
aggregate directly; for a single 1-D estimate the aggregation options reduce
to that estimate.

----

Examples
========

Set a global default for the whole session:

.. code-block:: python

    from mlquantify import set_config, get_config

    set_config(prevalence_return_type="dict", prevalence_normalization="sum")
    get_config()["prevalence_return_type"]   # 'dict'

Change it only temporarily with the context manager (recommended, since it
restores the previous configuration on exit):

.. code-block:: python

    from mlquantify import config_context
    from mlquantify.matching import DyS
    from sklearn.linear_model import LogisticRegression

    # X_train, y_train and X_test are assumed to be defined already
    q = DyS(LogisticRegression()).fit(X_train, y_train)

    # default: array output, sum-normalized
    q.predict(X_test)            # array([0.49, 0.51])

    with config_context(prevalence_return_type="dict"):
        q.predict(X_test)        # {0: 0.49, 1: 0.51}

    with config_context(prevalence_normalization="median"):
        # aggregate many estimates by their per-class median
        ...

.. note::

   Per-class **median** does not in general sum to 1; if you need both robust
   aggregation *and* the simplex constraint, aggregate with ``'median'`` and
   then re-normalize with ``'sum'``.

.. seealso::

   :ref:`multiclass` for how One-vs-Rest / One-vs-One recombine binary
   estimates before this normalization is applied.
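
To make the strategies and the median caveat concrete, the arithmetic can be
reproduced in plain NumPy. This is only an illustrative sketch under stated
assumptions: the helpers ``l1_normalize`` and ``softmax`` and the sample
``estimates`` array are hypothetical names and data introduced here, not part
of mlquantify, and the library's internal implementation may differ.

.. code-block:: python

    import numpy as np


    def l1_normalize(values):
        """Rescale along the last axis so the values sum to 1 (hypothetical helper)."""
        values = np.asarray(values, dtype=float)
        return values / values.sum(axis=-1, keepdims=True)


    def softmax(values):
        """Numerically stable softmax along the last axis (hypothetical helper)."""
        values = np.asarray(values, dtype=float)
        shifted = values - values.max(axis=-1, keepdims=True)
        exp = np.exp(shifted)
        return exp / exp.sum(axis=-1, keepdims=True)


    # Made-up data: three estimates (rows) over three classes (columns).
    estimates = np.array([
        [0.2, 0.3, 0.5],
        [0.1, 0.6, 0.3],
        [0.4, 0.3, 0.3],
    ])

    # 'sum' / 'l1' on a 2-D input: normalize each row, then average the rows.
    sum_agg = l1_normalize(estimates).mean(axis=0)

    # 'mean' and 'median': aggregate the rows directly, class by class.
    mean_agg = estimates.mean(axis=0)
    median_agg = np.median(estimates, axis=0)

    # The per-class median is not guaranteed to lie on the simplex,
    # so re-normalize it when a valid prevalence vector is required.
    median_renorm = l1_normalize(median_agg)

    print("sum/l1        :", sum_agg, sum_agg.sum())
    print("mean          :", mean_agg, mean_agg.sum())
    print("median        :", median_agg, median_agg.sum())
    print("median + sum  :", median_renorm, median_renorm.sum())
    print("softmax(logit):", softmax([1.0, 1.0]))

With this sample data the per-class medians add up to less than 1, which is
why the note above suggests a final ``'sum'`` re-normalization after
``'median'`` aggregation.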