13.2.2. Prevalence Normalization

Every quantifier passes its estimate through a single normalization step. With the default settings, the output is therefore a valid prevalence vector (non-negative and summing to 1) in a consistent format. Two settings control this step: the return type (array or dict) and the normalization strategy (how raw estimates are turned into probabilities and how several estimates are aggregated). Some strategies deliberately relax the sum-to-1 guarantee: None returns the raw values unchanged, and 'median' aggregation does not in general sum to 1.

13.2.2.1. The `normalize_prevalence` helper

normalize_prevalence is the low-level helper that turns a raw vector (or dict) of class scores into a normalized prevalence summing to 1, aligned to a list of classes:

from mlquantify.utils import normalize_prevalence

normalize_prevalence([2.0, 3.0, 5.0], classes=[0, 1, 2])
# {0: 0.2, 1: 0.3, 2: 0.5}

normalize_prevalence({0: 0.1, 1: 0.1, 2: 0.3}, classes=[0, 1, 2])
# {0: 0.2, 1: 0.2, 2: 0.6}

13.2.2.1.1. Parameters

Parameter	Meaning
`prevalences`	The raw estimate to normalize: a 1-D array, or a `{class: value}` dict. Values need not sum to 1 (they are rescaled).
`classes`	The class labels, used to order the output and to fill in any class missing from a dict input with `0`.

Quantifiers rarely call this directly — they go through validate_prevalences, which additionally applies the configurable return type and normalization strategy described below.
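The behaviour in the parameter table can be mirrored in a few lines of plain Python. The helper rescales to sum 1, orders by `classes` and fills classes missing from a dict with 0. This is a minimal sketch for illustration, not mlquantify's implementation. `normalize_sketch` is a hypothetical name. Returning a dict mirrors the examples above, and a 1-D input is assumed to already be in `classes` order.

```python
def normalize_sketch(prevalences, classes):
    """Hypothetical stand-in for normalize_prevalence (illustration only)."""
    if isinstance(prevalences, dict):
        # align a dict to `classes`, filling any missing class with 0
        values = [float(prevalences.get(c, 0.0)) for c in classes]
    else:
        # a 1-D sequence is assumed to already be in `classes` order
        values = [float(v) for v in prevalences]
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize an all-zero estimate")
    # rescale so the values sum to 1 and key them by class label
    return {c: v / total for c, v in zip(classes, values)}


# class 1 is missing from the input, so it is filled with 0
print(normalize_sketch({0: 1, 2: 3}, classes=[0, 1, 2]))
# {0: 0.25, 1: 0.0, 2: 0.75}
```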

13.2.2.2. Configuring normalization

Two global options control the final form of every prevalence estimate: its output type, and how it is normalized or aggregated. Read them with get_config, and change them with set_config (global) or config_context (temporary, scoped to a with block):

13.2.2.2.1. `prevalence_return_type`: output format

Value	Behaviour
`'array'`	Return a `numpy.ndarray` ordered by class. Global default.
`'dict'`	Return a `{class_label: prevalence}` dictionary.
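Both return types carry the same numbers; only the container differs. The following minimal sketch assumes the estimate is already ordered like `classes`. `to_return_type` is a hypothetical helper. A plain list stands in for `numpy.ndarray` so that the example needs only the standard library.

```python
def to_return_type(prevalence, classes, return_type="array"):
    """Hypothetical helper: format a class-ordered prevalence vector."""
    if return_type == "array":
        # stand-in for numpy.ndarray: values stay in class order
        return list(prevalence)
    if return_type == "dict":
        # pair each class label with its prevalence, in class order
        return dict(zip(classes, prevalence))
    raise ValueError(f"unknown return type: {return_type!r}")


print(to_return_type([0.49, 0.51], classes=[0, 1], return_type="dict"))
# {0: 0.49, 1: 0.51}
```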

13.2.2.2.2. `prevalence_normalization`: normalization / aggregation strategy

Value	Behaviour
`'sum'` / `'l1'`	Rescale so the values sum to 1 (the standard prevalence constraint). Global default.
`'softmax'`	Apply the softmax function — useful when the raw estimates are logits or unbounded scores rather than proportions.
`'mean'`	Average several estimates (rows of a 2-D input) into one prevalence — used when many estimates are produced, e.g. by ensembles or bootstrap.
`'median'`	Take the per-class median across several estimates; more robust to outlier estimates than `'mean'`.
`None`	No normalization or aggregation — return the raw values unchanged.
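For a single 1-D estimate, the difference between 'sum' and 'softmax' is easiest to see side by side. Below is a pure-Python sketch of the two formulas in the table. `sum_normalize` and `softmax` are illustrative helpers, not mlquantify functions.

```python
import math


def sum_normalize(values):
    # 'sum' / 'l1': divide each value by the total so they sum to 1
    total = sum(values)
    return [v / total for v in values]


def softmax(values):
    # 'softmax': exponentiate, then normalize; subtracting the max
    # first avoids overflow without changing the result
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 3.0, 5.0]
print(sum_normalize(raw))  # [0.2, 0.3, 0.5]
print(softmax(raw))        # approx. [0.042, 0.114, 0.844]
```

Softmax shifts mass toward the largest score and also accepts negative or unbounded inputs. That is why it suits logits, while 'sum' suits values that are already proportional to prevalence.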

When the input is a 2-D array of several estimates, 'sum'/'l1' normalize each row and then average them, while 'mean'/'median' aggregate directly; for a single 1-D estimate the aggregation options reduce to that estimate.
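The 2-D behaviour in the paragraph above can be sketched in plain Python, assuming each row is one estimate over the same classes. `aggregate` is a hypothetical helper that follows the description; it is not mlquantify's code.

```python
from statistics import mean, median


def aggregate(rows, strategy="sum"):
    """Hypothetical helper: combine several estimates (rows) into one."""
    if strategy in ("sum", "l1"):
        # normalize each row so it sums to 1, then average per class
        normed = []
        for row in rows:
            total = sum(row)
            normed.append([v / total for v in row])
        return [mean(col) for col in zip(*normed)]
    if strategy == "mean":
        # per-class average of the raw rows, no row normalization
        return [mean(col) for col in zip(*rows)]
    if strategy == "median":
        # per-class median of the raw rows; robust to an outlier row
        return [median(col) for col in zip(*rows)]
    if strategy is None:
        return rows  # no normalization or aggregation
    raise ValueError(f"unknown strategy: {strategy!r}")


rows = [[1.0, 3.0], [2.0, 2.0]]  # two raw estimates, not yet on the simplex
print(aggregate(rows, "sum"))    # [0.375, 0.625]
print(aggregate(rows, "mean"))   # [1.5, 2.5]
```

The 'mean' result here does not sum to 1 because the raw rows did not. Aggregating directly preserves whatever scale the rows already have.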

13.2.2.3. Examples

Set a global default for the whole session:

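from mlquantify import set_config, get_config

set_config(prevalence_return_type="dict", prevalence_normalization="sum")
get_config()["prevalence_return_type"]
# 'dict'

# restore the global defaults so later code starts from 'array' / 'sum' again
set_config(prevalence_return_type="array", prevalence_normalization="sum")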

Change it only temporarily with the context manager (recommended — it restores the previous configuration on exit):

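from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from mlquantify import config_context
from mlquantify.matching import DyS

# synthetic binary data so the example runs end to end
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

q = DyS(LogisticRegression()).fit(X_train, y_train)

# default: array output, sum-normalized (values shown are illustrative)
q.predict(X_test)                       # e.g. array([0.49, 0.51])

with config_context(prevalence_return_type="dict"):
    q.predict(X_test)                   # e.g. {0: 0.49, 1: 0.51}

with config_context(prevalence_normalization="median"):
    # 'median' aggregates several estimates (e.g. from ensembles or
    # bootstrap); a single estimate like DyS's reduces to itself
    q.predict(X_test)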

Note

Per-class median does not in general sum to 1; if you need both robust aggregation and the simplex constraint, aggregate with 'median' and then re-normalize with 'sum'.
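The following worked example illustrates the note. It is a plain-Python sketch with illustrative numbers, using `statistics.median` rather than the library's aggregation code. Each of the three estimates sums to 1, yet their per-class medians do not.

```python
from statistics import median

# three estimates over three classes; each row sums to 1
estimates = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

# step 1: robust aggregation, per-class median across the estimates
med = [median(col) for col in zip(*estimates)]
print(med, sum(med))  # [0.25, 0.25, 0.25] 0.75

# step 2: 'sum' re-normalization back onto the simplex
total = sum(med)
renorm = [v / total for v in med]
print([round(v, 4) for v in renorm])  # [0.3333, 0.3333, 0.3333]
```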
