13.2.2. Prevalence Normalization

Every quantifier returns its estimate through a single normalization step, so the output always comes back in a consistent format and, under the default settings, is a valid prevalence vector: non-negative entries summing to 1. Two settings control this step: the return type (array or dict) and the normalization strategy (how raw estimates are turned into proportions and how several estimates are aggregated).


13.2.2.1. The normalize_prevalence helper

normalize_prevalence is the low-level helper that turns a raw vector (or dict) of class scores into a normalized prevalence summing to 1, aligned to a list of classes:

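```python
from mlquantify.utils import normalize_prevalence

# Array input: each value is divided by the total (2 + 3 + 5 = 10)
normalize_prevalence([2.0, 3.0, 5.0], classes=[0, 1, 2])
# {0: 0.2, 1: 0.3, 2: 0.5}

# Dict input: each value is divided by the total (0.1 + 0.1 + 0.3 = 0.5)
normalize_prevalence({0: 0.1, 1: 0.1, 2: 0.3}, classes=[0, 1, 2])
# {0: 0.2, 1: 0.2, 2: 0.6}
```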

13.2.2.1.1. Parameters

| Parameter | Meaning |
|---|---|
| prevalences | The raw estimate to normalize: a 1-D array, or a {class: value} dict. Values need not sum to 1; they are rescaled. |
| classes | The class labels, used to order the output and to fill in any class missing from a dict input with 0. |
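To make the two parameters concrete, here is a minimal NumPy re-implementation of the behaviour described above. normalize_prevalence_sketch is a hypothetical stand-in, not mlquantify's code; it assumes the helper divides by the plain total and returns a dict, as the examples suggest, and it leaves an all-zero input unscaled, a case the original text does not cover.

```python
import numpy as np

def normalize_prevalence_sketch(prevalences, classes):
    """Hypothetical stand-in for mlquantify.utils.normalize_prevalence."""
    if isinstance(prevalences, dict):
        # Align dict input to `classes`; any class missing from the dict gets 0.
        values = np.array([prevalences.get(c, 0.0) for c in classes], dtype=float)
    else:
        # Array input is assumed to already be ordered like `classes`.
        values = np.asarray(prevalences, dtype=float)
    total = values.sum()
    if total > 0:
        values = values / total  # rescale so the entries sum to 1
    return {c: float(v) for c, v in zip(classes, values)}

normalize_prevalence_sketch({0: 1.0, 2: 3.0}, classes=[0, 1, 2])
# {0: 0.25, 1: 0.0, 2: 0.75}
```

Class 1 is absent from the dict, so the sketch fills it with 0 before dividing by the total of 4.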

Quantifiers rarely call this helper directly. They go through validate_prevalences, which additionally applies the configurable return type and normalization strategy described below.


13.2.2.2. Configuring normalization

Two global options drive the final formatting of every prevalence estimate. Read them with get_config, and change them with set_config (globally, for the rest of the session) or config_context (temporarily, inside a with block).

13.2.2.2.1. prevalence_return_type: output format

| Value | Behaviour |
|---|---|
| 'array' | Return a numpy.ndarray ordered by class. Global default. |
| 'dict' | Return a {class_label: prevalence} dictionary. |
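The two return types differ only in the container. The sketch below uses a hypothetical format_prevalence function (not part of mlquantify) to show how one class-ordered vector maps onto each format.

```python
import numpy as np

def format_prevalence(values, classes, return_type="array"):
    """Hypothetical helper: emit a class-ordered prevalence vector in the requested format."""
    values = np.asarray(values, dtype=float)
    if return_type == "array":
        return values  # position i corresponds to classes[i]
    if return_type == "dict":
        # Pair each class label with its prevalence.
        return {c: float(v) for c, v in zip(classes, values)}
    raise ValueError(f"unknown return type: {return_type!r}")

format_prevalence([0.25, 0.75], classes=["neg", "pos"], return_type="dict")
# {'neg': 0.25, 'pos': 0.75}
```

The dict form is self-describing, which helps when class labels are strings; the array form is convenient for vectorized metrics.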

13.2.2.2.2. prevalence_normalization: normalization / aggregation strategy

| Value | Behaviour |
|---|---|
| 'sum' / 'l1' | Rescale so the values sum to 1 (the standard prevalence constraint). Global default. |
| 'softmax' | Apply the softmax function; useful when the raw estimates are logits or unbounded scores rather than proportions. |
| 'mean' | Average several estimates (rows of a 2-D input) into one prevalence; used when many estimates are produced, e.g. by ensembles or bootstrap. |
| 'median' | Take the per-class median across several estimates; more robust to outlier estimates than 'mean'. |
| None | No normalization or aggregation; return the raw values unchanged. |
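For a single 1-D estimate, the strategies in the table can be sketched in a few lines of NumPy. apply_normalization is a hypothetical illustration, not mlquantify's implementation; it assumes 'sum' and 'l1' both divide by the total of non-negative inputs, and it uses the usual max-shift trick to keep softmax numerically stable.

```python
import numpy as np

def apply_normalization(raw, strategy="sum"):
    """Hypothetical sketch of each strategy applied to a single 1-D estimate."""
    raw = np.asarray(raw, dtype=float)
    if strategy is None:
        return raw  # no normalization: raw values pass through unchanged
    if strategy in ("sum", "l1"):
        # Assumes non-negative entries with a positive total.
        return raw / raw.sum()
    if strategy == "softmax":
        exp = np.exp(raw - raw.max())  # subtract the max to avoid overflow
        return exp / exp.sum()
    if strategy in ("mean", "median"):
        return raw  # aggregating a single estimate yields that estimate
    raise ValueError(f"unknown strategy: {strategy!r}")

scores = [2.0, 1.0, 0.0]
apply_normalization(scores, "sum")      # ratios kept: about [0.67, 0.33, 0.0]
apply_normalization(scores, "softmax")  # exponentiated first: about [0.67, 0.24, 0.09]
```

'sum' preserves the ratios between classes, whereas 'softmax' exponentiates first, so a class with a raw score of 0 still receives some mass.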

When the input is a 2-D array of several estimates, 'sum'/'l1' normalize each row and then average them, while 'mean'/'median' aggregate directly; for a single 1-D estimate the aggregation options reduce to that estimate.
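The 2-D behaviour can be sketched the same way, with each row holding one estimate (for example, one bootstrap replicate). aggregate_estimates is hypothetical and simply follows the description above; it is not mlquantify's code.

```python
import numpy as np

def aggregate_estimates(estimates, strategy="sum"):
    """Hypothetical sketch: collapse a stack of estimates (one per row) into one vector."""
    estimates = np.atleast_2d(np.asarray(estimates, dtype=float))
    if strategy in ("sum", "l1"):
        # Normalize each row to sum to 1, then average the rows.
        rows = estimates / estimates.sum(axis=1, keepdims=True)
        return rows.mean(axis=0)
    if strategy == "mean":
        return estimates.mean(axis=0)  # aggregate directly, no per-row rescaling
    if strategy == "median":
        return np.median(estimates, axis=0)  # per-class median across rows
    raise ValueError(f"unsupported strategy: {strategy!r}")

boot = np.array([[0.4, 0.6],
                 [0.5, 0.5],
                 [0.9, 0.1]])  # the last row plays the outlier
aggregate_estimates(boot, "mean")    # roughly [0.6, 0.4]: pulled toward the outlier
aggregate_estimates(boot, "median")  # array([0.5, 0.5])
```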


13.2.2.3. Examples

Set a global default for the whole session:

```python
from mlquantify import set_config, get_config

set_config(prevalence_return_type="dict", prevalence_normalization="sum")
get_config()["prevalence_return_type"]
# 'dict'
```

To change a setting only temporarily, use the config_context context manager. This is the recommended approach because it restores the previous configuration when the with block exits:

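```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from mlquantify import config_context, set_config
from mlquantify.matching import DyS

# Toy binary data so the example runs end to end
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Undo the global change made in the previous example
set_config(prevalence_return_type="array", prevalence_normalization="sum")

q = DyS(LogisticRegression()).fit(X_train, y_train)

# Default: array output, sum-normalized (values shown are illustrative)
q.predict(X_test)                       # e.g. array([0.49, 0.51])

with config_context(prevalence_return_type="dict"):
    q.predict(X_test)                   # e.g. {0: 0.49, 1: 0.51}

with config_context(prevalence_normalization="median"):
    # aggregate many estimates by their per-class median
    ...
```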
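Restoring the previous settings on exit is a standard pattern; scikit-learn, for instance, exposes the same trio of names (get_config, set_config, config_context). The sketch below is a generic, single-threaded illustration built on contextlib.contextmanager. The *_sketch functions and the _CONFIG dict are hypothetical and do not describe how mlquantify implements its configuration.

```python
from contextlib import contextmanager

# Hypothetical global options, mirroring the two documented settings.
_CONFIG = {"prevalence_return_type": "array", "prevalence_normalization": "sum"}

def get_config_sketch():
    return dict(_CONFIG)  # return a copy so callers cannot mutate the globals

def set_config_sketch(**options):
    unknown = set(options) - set(_CONFIG)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    _CONFIG.update(options)

@contextmanager
def config_context_sketch(**options):
    previous = get_config_sketch()  # snapshot before overriding
    set_config_sketch(**options)
    try:
        yield
    finally:
        # Restore the snapshot even if the with block raised.
        _CONFIG.clear()
        _CONFIG.update(previous)

with config_context_sketch(prevalence_return_type="dict"):
    inside = get_config_sketch()["prevalence_return_type"]  # 'dict'
after = get_config_sketch()["prevalence_return_type"]       # 'array'
```

The try/finally is what makes the scoped form safer than calling set_config twice: an exception inside the block cannot leave the override in place.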

Note

Per-class median does not in general sum to 1; if you need both robust aggregation and the simplex constraint, aggregate with 'median' and then re-normalize with 'sum'.
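A small NumPy illustration of the note: every row below is a valid prevalence vector, yet their per-class median is not. Dividing by its total, the same rescaling 'sum' performs, puts it back on the simplex. The numbers are made up for illustration.

```python
import numpy as np

# Three made-up estimates; each row sums to 1.
estimates = np.array([[0.1, 0.3, 0.6],
                      [0.2, 0.2, 0.6],
                      [0.5, 0.4, 0.1]])

med = np.median(estimates, axis=0)  # per-class median: [0.2, 0.3, 0.6]
total = med.sum()                   # about 1.1, so off the simplex
renormalized = med / total          # rescale as 'sum' does; now sums to 1
```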

See also

Multiclass Quantification for how One-vs-Rest / One-vs-One recombine binary estimates before this normalization is applied.