13.2.2. Prevalence Normalization
Every quantifier returns its estimate through a single normalization step, so the output is always a valid prevalence vector — non-negative and summing to 1 — in a consistent format. Two settings control this step: the return type (array or dict) and the normalization strategy (how raw estimates are turned into probabilities and how several estimates are aggregated).
13.2.2.1. The normalize_prevalence helper
normalize_prevalence is the low-level helper that turns a raw vector (or dict) of class scores into a normalized prevalence summing to 1, aligned to a list of classes:
```python
from mlquantify.utils import normalize_prevalence

# A sequence is matched to `classes` by position.
normalize_prevalence([2.0, 3.0, 5.0], classes=[0, 1, 2])
# {0: 0.2, 1: 0.3, 2: 0.5}

# A dict is matched to `classes` by key.
normalize_prevalence({0: 0.1, 1: 0.1, 2: 0.3}, classes=[0, 1, 2])
# {0: 0.2, 1: 0.2, 2: 0.6}
```
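To make the arithmetic explicit, here is a minimal pure-Python sketch of what the helper computes. normalize_prevalence_sketch is a hypothetical name, not part of mlquantify, and filling a class that is absent from a dict with 0 is an assumption about the library's behaviour.

```python
def normalize_prevalence_sketch(raw, classes):
    """Hypothetical stand-in for normalize_prevalence (not the library code)."""
    if isinstance(raw, dict):
        # Assumption: classes absent from the dict are filled with 0.
        values = [float(raw.get(c, 0.0)) for c in classes]
    else:
        # Sequence input: position i corresponds to classes[i].
        values = [float(v) for v in raw]
        if len(values) != len(classes):
            raise ValueError("raw and classes must have the same length")
    total = sum(values)
    if total <= 0:
        raise ValueError("cannot normalize a vector with a non-positive total")
    # Divide by the total so the prevalences sum to 1.
    return {c: v / total for c, v in zip(classes, values)}


print(normalize_prevalence_sketch([2.0, 3.0, 5.0], classes=[0, 1, 2]))
print(normalize_prevalence_sketch({0: 0.1, 2: 0.3}, classes=[0, 1, 2]))
```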
13.2.2.1.1. Parameters
| Parameter | Meaning |
|---|---|
| first positional argument | The raw estimate to normalize: a 1-D array, or a dict mapping class labels to raw values. |
| classes | The class labels, used to order the output and to fill in any class missing from a dict input with … |
Quantifiers rarely call this helper directly. They go through validate_prevalences, which additionally applies the configurable return type and normalization strategy described below.
13.2.2.2. Configuring normalization
Two global options control the final formatting of every prevalence estimate. Read them with get_config, change them globally with set_config, or change them temporarily within a scope with config_context.
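The difference between the global and the scoped setter is easiest to see in a toy configuration store. The sketch below defines its own get_config, set_config and config_context with the same names as mlquantify's, but it is a hypothetical stand-in, not the library's implementation; the default values ('array' output, 'sum' normalization) follow the defaults stated in this section.

```python
from contextlib import contextmanager

# Toy stand-in for a global configuration store (hypothetical, not mlquantify).
# Defaults follow this section: array output, sum normalization.
_CONFIG = {"prevalence_return_type": "array", "prevalence_normalization": "sum"}


def get_config():
    # Return a copy so callers cannot mutate the store by accident.
    return dict(_CONFIG)


def set_config(**options):
    # Global change: validate every key first, then apply all at once.
    unknown = set(options) - set(_CONFIG)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    _CONFIG.update(options)


@contextmanager
def config_context(**options):
    # Scoped change: snapshot, apply, and restore on exit (even on error).
    previous = get_config()
    set_config(**options)
    try:
        yield
    finally:
        _CONFIG.clear()
        _CONFIG.update(previous)


with config_context(prevalence_return_type="dict"):
    print(get_config()["prevalence_return_type"])  # dict
print(get_config()["prevalence_return_type"])  # array
```

The try/finally in config_context is what guarantees that the previous configuration comes back even if the body raises, which is why the scoped form is safer than calling set_config twice by hand.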
13.2.2.2.1. prevalence_return_type: output format
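| Value | Behaviour |
|---|---|
| 'array' | Return a NumPy array of class prevalences. Default. |
| 'dict' | Return a dict mapping each class label to its prevalence. |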
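As a rough illustration of the two output formats, the hypothetical helper below (format_prevalence is not an mlquantify function) turns an already-normalized vector into either an array or a class-keyed dict.

```python
import numpy as np


def format_prevalence(prevalence, classes, return_type="array"):
    """Hypothetical helper: emit a prevalence vector in the requested format."""
    values = np.asarray(prevalence, dtype=float)
    if len(values) != len(classes):
        raise ValueError("need exactly one prevalence per class")
    if return_type == "array":
        # Position i holds the prevalence of classes[i].
        return values
    if return_type == "dict":
        # Keys are the class labels, in the order given by `classes`.
        return {c: float(v) for c, v in zip(classes, values)}
    raise ValueError(f"unknown return type: {return_type!r}")


p = [0.49, 0.51]
print(format_prevalence(p, classes=[0, 1]))                      # [0.49 0.51]
print(format_prevalence(p, classes=[0, 1], return_type="dict"))  # {0: 0.49, 1: 0.51}
```

The array form relies on the caller knowing the class order, while the dict form carries the labels with the values, which is convenient when classes are strings.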
13.2.2.2.2. prevalence_normalization: normalization / aggregation strategy
| Value | Behaviour |
|---|---|
| 'sum' / 'l1' | Rescale so the values sum to 1 (the standard prevalence constraint). Global default. |
| … | Apply the softmax function; useful when the raw estimates are logits or unbounded scores rather than proportions. |
| 'mean' | Average several estimates (rows of a 2-D input) into one prevalence; used when many estimates are produced, e.g. by ensembles or bootstrap. |
| 'median' | Take the per-class median across several estimates; more robust to outlier estimates than 'mean'. |
| … | No normalization or aggregation: return the raw values unchanged. |
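Sum rescaling and softmax both land on the simplex, but they can give very different answers from the same input. A NumPy sketch of the two transforms (rescale_sum and softmax are illustrative names, not mlquantify functions):

```python
import numpy as np


def rescale_sum(x):
    # Sum-style rescaling: divide by the total (assumes non-negative
    # entries with a positive total).
    x = np.asarray(x, dtype=float)
    return x / x.sum()


def softmax(x):
    # exp(x_i) / sum_j exp(x_j); subtracting the max first avoids overflow
    # and does not change the result.
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()


raw = np.array([2.0, 3.0, 5.0])
print(rescale_sum(raw))  # [0.2 0.3 0.5]
print(softmax(raw))      # approximately [0.042 0.114 0.844]
```

Applied to values that are already proportions, softmax does not return them unchanged (softmax of [0.2, 0.3, 0.5] is roughly [0.29, 0.32, 0.39], much flatter than the input), which is why it suits logits or unbounded scores rather than proportions.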
When the input is a 2-D array of several estimates, 'sum'/'l1' normalize each row and then average the rows, while 'mean'/'median' aggregate the raw rows directly. For a single 1-D estimate, the aggregation options reduce to that estimate.
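The 2-D rule can be sketched in NumPy as follows. aggregate is a hypothetical function written for this illustration, not mlquantify's code, and the three rows stand in for, say, bootstrap estimates (illustrative numbers only).

```python
import numpy as np


def aggregate(estimates, strategy="sum"):
    """Hypothetical sketch of the 2-D rule (not mlquantify's code)."""
    # A single 1-D estimate becomes a one-row matrix.
    est = np.atleast_2d(np.asarray(estimates, dtype=float))
    if strategy == "sum":
        # Put each row on the simplex (assumes positive row totals), then average.
        rows = est / est.sum(axis=1, keepdims=True)
        return rows.mean(axis=0)
    if strategy == "mean":
        # Average the raw rows directly.
        return est.mean(axis=0)
    if strategy == "median":
        # Per-class median of the raw rows; need not sum to 1.
        return np.median(est, axis=0)
    raise ValueError(f"unsupported strategy: {strategy!r}")


# Three estimates of a binary prevalence; the second row is not normalized.
boot = np.array([
    [0.4, 0.6],
    [2.0, 2.0],
    [0.3, 0.7],
])
print(aggregate(boot, "sum"))
print(aggregate(boot, "mean"))
```

Because the second row is not on the simplex, 'mean' lets it dominate and returns roughly [0.9, 1.1], which does not sum to 1, whereas 'sum' rescales every row first and returns [0.4, 0.6] up to floating-point rounding.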
13.2.2.3. Examples
Set a global default for the whole session:
```python
from mlquantify import set_config, get_config

set_config(prevalence_return_type="dict", prevalence_normalization="sum")
get_config()["prevalence_return_type"]
# 'dict'
```
To change a setting only temporarily, use the config_context context manager (recommended, because it restores the previous configuration on exit):
```python
from mlquantify import config_context
from mlquantify.matching import DyS
from sklearn.linear_model import LogisticRegression

# X_train, y_train and X_test stand for your own training and test data.
q = DyS(LogisticRegression()).fit(X_train, y_train)

# default: array output, sum-normalized (printed values are illustrative)
q.predict(X_test)  # array([0.49, 0.51])

with config_context(prevalence_return_type="dict"):
    q.predict(X_test)  # {0: 0.49, 1: 0.51}

with config_context(prevalence_normalization="median"):
    # aggregate many estimates by their per-class median
    ...
```
Note
The per-class median does not in general sum to 1. If you need both robust aggregation and the simplex constraint, aggregate with 'median' and then re-normalize with 'sum'.
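The recipe in the note can be followed with plain NumPy. The numbers below are illustrative, and the two steps mirror the 'median' and 'sum' strategies rather than calling mlquantify.

```python
import numpy as np

# Three estimates of a 3-class prevalence, each already summing to 1
# (illustrative numbers, not results).
estimates = np.array([
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])

# Step 1: robust aggregation with the per-class median.
med = np.median(estimates, axis=0)  # [0.2, 0.3, 0.4], which sums to about 0.9

# Step 2: re-normalize onto the simplex by dividing by the total.
prevalence = med / med.sum()        # approximately [0.222, 0.333, 0.444]
print(prevalence, prevalence.sum())
```

Each input row lies on the simplex, yet the column-wise medians come from different rows, so their total drifts away from 1 until the second step restores it.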
See also
Multiclass Quantification for how One-vs-Rest / One-vs-One recombine binary estimates before this normalization is applied.