4.4. Evaluation Metrics#

Evaluation metrics for quantification assess the accuracy of estimated class prevalences against true prevalences. These metrics are crucial for understanding how well a quantifier performs, especially under distributional shifts.

The library includes several widely used evaluation metrics:

Metrics#

| Metric | Description |
|--------|-------------|
| NMD | Normalized Match Distance |
| RNOD | Relative Normalized Overall Deviation |
| VSE | Variance Shift Error |
| CvM_L1 | Cramér-von Mises L1 Distance |
| AE | Absolute Error |
| SE | Squared Error |
| MAE | Mean Absolute Error |
| MSE | Mean Squared Error |
| KLD | Kullback-Leibler Divergence |
| RAE | Relative Absolute Error |
| NAE | Normalized Absolute Error |
| NRAE | Normalized Relative Absolute Error |
| NKLD | Normalized Kullback-Leibler Divergence |

4.4.1. Single Label Quantification (SLQ) Metrics#

4.4.1.1. AE (Absolute Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence (distribution of classes).

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

AE sums the absolute differences between true and estimated prevalences across classes:

\[\text{AE}(p, \hat{p}) = \sum_{c} |p(c) - \hat{p}(c)|\]

Its primary strength is transparency and ease of interpretation.
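For concreteness, a minimal NumPy sketch of this computation (the function name `ae` is illustrative, not necessarily the library's API):

```python
import numpy as np

def ae(p, p_hat):
    # Sum of absolute per-class differences between true and estimated prevalences.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.abs(p - p_hat).sum())

print(ae([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # ~0.2
```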

4.4.1.2. SE (Squared Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

SE is the sum of squared differences:

\[\text{SE}(p, \hat{p}) = \sum_{c} (p(c) - \hat{p}(c))^2\]

This penalizes larger errors more heavily, making outlier mistakes more obvious.
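A corresponding sketch, again with an illustrative name:

```python
import numpy as np

def se(p, p_hat):
    # Sum of squared per-class differences; large deviations dominate the score.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(((p - p_hat) ** 2).sum())
```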

4.4.1.3. MAE (Mean Absolute Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

MAE averages the absolute errors over all classes:

\[\text{MAE}(p, \hat{p}) = \frac{1}{K} \sum_{c} |p(c) - \hat{p}(c)|\]

Here \(K\) denotes the number of classes. Averaging gives a normalized score that is useful for comparing performance across datasets.
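One way this could look in NumPy (sketch only):

```python
import numpy as np

def mae(p, p_hat):
    # Mean absolute per-class difference; np.mean divides by K implicitly.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.abs(p - p_hat).mean())
```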

4.4.1.4. MSE (Mean Squared Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

MSE averages the squared errors:

\[\text{MSE}(p, \hat{p}) = \frac{1}{K} \sum_{c} (p(c) - \hat{p}(c))^2\]

Like SE, it highlights large deviations in prevalence estimation, while the averaging keeps scores comparable across different numbers of classes.
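Sketched the same way:

```python
import numpy as np

def mse(p, p_hat):
    # Mean squared per-class difference.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(((p - p_hat) ** 2).mean())
```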

4.4.1.5. KLD (Kullback-Leibler Divergence)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

KLD measures the information loss between distributions:

\[\text{KLD}(p, \hat{p}) = \sum_{c} p(c) \log \frac{p(c)}{\hat{p}(c)}\]

Its key advantage is sensitivity to errors on classes whose true prevalence is high, since each term is weighted by \(p(c)\); note that the divergence is unbounded when \(\hat{p}(c)\) approaches zero for a class with \(p(c) > 0\).
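A minimal sketch that follows the usual convention \(0 \log 0 = 0\) (the function name is illustrative):

```python
import numpy as np

def kld(p, p_hat):
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    mask = p > 0  # classes with p(c) = 0 contribute 0 by convention
    # Diverges (inf) if p_hat(c) == 0 for some class with p(c) > 0.
    return float(np.sum(p[mask] * np.log(p[mask] / p_hat[mask])))
```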

4.4.1.6. RAE (Relative Absolute Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

  • \(\epsilon\): float, optional (default=1e-12) Small constant to ensure numerical stability.

RAE scales the absolute error by true prevalence:

\[\text{RAE}(p, \hat{p}) = \sum_{c} \frac{|p(c) - \hat{p}(c)|}{p(c) + \epsilon}\]

Scaling each class's error by its true prevalence makes deviations on rare classes count proportionally more, which is useful in imbalanced scenarios.
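In NumPy this might read:

```python
import numpy as np

def rae(p, p_hat, eps=1e-12):
    # Absolute error per class, scaled by that class's true prevalence.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.sum(np.abs(p - p_hat) / (p + eps)))
```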

4.4.1.7. NAE (Normalized Absolute Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

NAE normalizes the absolute error:

\[\text{NAE}(p, \hat{p}) = \frac{1}{K} \sum_{c} \frac{|p(c) - \hat{p}(c)|}{\max\{p(c), \hat{p}(c)\}}\]

Normalizing by the larger of the two prevalences bounds each per-class term in \([0, 1]\), making the score invariant to the overall scale of the errors.
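A sketch that also guards the degenerate case where both prevalences are zero:

```python
import numpy as np

def nae(p, p_hat):
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    denom = np.maximum(p, p_hat)
    denom[denom == 0] = 1.0  # classes absent from both distributions contribute 0
    return float(np.mean(np.abs(p - p_hat) / denom))
```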

4.4.1.8. NRAE (Normalized Relative Absolute Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

  • \(\epsilon\): float, optional (default=1e-12) Small constant for numerical stability.

NRAE further normalizes relative errors:

\[\text{NRAE}(p, \hat{p}) = \frac{1}{K} \sum_{c} \frac{|p(c) - \hat{p}(c)|}{p(c) + \hat{p}(c) + \epsilon}\]

Normalizing by the sum of the two prevalences bounds each per-class term and treats over- and under-estimation symmetrically.
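A possible NumPy rendering:

```python
import numpy as np

def nrae(p, p_hat, eps=1e-12):
    # Each term is bounded, since |p(c) - p_hat(c)| <= p(c) + p_hat(c).
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.mean(np.abs(p - p_hat) / (p + p_hat + eps)))
```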

4.4.1.9. NKLD (Normalized Kullback-Leibler Divergence)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

  • \(\epsilon\): float, optional (default=1e-12) Small constant for numerical stability.

NKLD outputs a normalized form of KLD:

\[\text{NKLD}(p, \hat{p}) = \frac{1}{K} \sum_{c} p(c) \log \frac{p(c)}{\hat{p}(c) + \epsilon}\]

The \(\epsilon\) smoothing keeps the score finite when some \(\hat{p}(c) = 0\), and the \(1/K\) averaging makes it more comparable across problems with different numbers of classes.
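A sketch that keeps the \(1/K\) factor over all classes while skipping zero-prevalence terms:

```python
import numpy as np

def nkld(p, p_hat, eps=1e-12):
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    mask = p > 0  # zero-prevalence classes contribute 0 but still count toward K
    return float(np.sum(p[mask] * np.log(p[mask] / (p_hat[mask] + eps))) / p.size)
```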

4.4.2. Regression-Based Quantification (RQ) Metrics#

4.4.2.1. VSE (Variance Shift Error)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

The Variance Shift Error quantifies the discrepancy between the variance of true and estimated distributions:

\[\text{VSE}(p, \hat{p}) = |\text{Var}(p) - \text{Var}(\hat{p})|\]

This metric emphasizes differences in dispersion: a quantifier that systematically flattens or sharpens the estimated distribution will show a large VSE even when its per-class errors are small.
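A minimal sketch (illustrative name):

```python
import numpy as np

def vse(p, p_hat):
    # Compares the spread of the two prevalence vectors, not their per-class values.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(abs(np.var(p) - np.var(p_hat)))
```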

4.4.2.2. CvM_L1 (Cramér-von Mises L1 Distance)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

CvM_L1 compares cumulative distributions using the L1 norm:

\[\text{CvM\_L1}(p, \hat{p}) = \sum_{c} |F_p(c) - F_{\hat{p}}(c)|\]

where \(F_p(c) = \sum_{j \le c} p(j)\) is the cumulative prevalence up to class \(c\). Its advantage lies in capturing distributional differences beyond pointwise errors, since it is sensitive to where misallocated mass sits along the class order.
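A sketch using `np.cumsum` for the cumulative prevalences (classes assumed to be listed in their natural order):

```python
import numpy as np

def cvm_l1(p, p_hat):
    # L1 distance between the two cumulative prevalence curves.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.abs(np.cumsum(p) - np.cumsum(p_hat)).sum())
```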

4.4.3. Ordinal Quantification (OQ) Metrics#

4.4.3.1. NMD (Normalized Match Distance)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

The NMD metric is the match distance (the earth mover's distance between distributions over ordered classes), normalized by the number of classes:

\[\text{NMD}(p, \hat{p}) = \frac{1}{K - 1} \sum_{c} \left|F_p(c) - F_{\hat{p}}(c)\right|\]

where \(F_p(c)\) and \(F_{\hat{p}}(c)\) are the cumulative true and estimated prevalences up to class \(c\), and \(K\) is the number of classes. Because it compares cumulative prevalences, NMD penalizes mass misplaced between distant classes more heavily than mass misplaced between adjacent ones; its normalization to \([0, 1]\) and straightforward interpretability make it well suited to comparing ordinal quantification methods.
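A matching sketch (illustrative name, not necessarily the library's API):

```python
import numpy as np

def nmd(p, p_hat):
    # Match distance between cumulative prevalences, scaled by 1/(K - 1)
    # so the result lies in [0, 1].
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.abs(np.cumsum(p) - np.cumsum(p_hat)).sum() / (p.size - 1))
```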

4.4.3.2. RNOD (Relative Normalized Overall Deviation)#

Parameters:

  • \(p\): array-like, shape (n_classes,) True prevalence.

  • \(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.

  • \(\epsilon\): float, optional (default=1e-12) Small constant to ensure numerical stability.

RNOD measures the proportional deviation between the true and estimated prevalence, particularly highlighting errors in rare classes:

\[\text{RNOD}(p, \hat{p}) = \frac{1}{K} \sum_{c} \frac{|p(c) - \hat{p}(c)|}{p(c) + \epsilon}\]

Because each class's deviation is scaled by its true prevalence, errors on rare classes carry proportionally more weight, which makes RNOD well suited to imbalanced distributions.
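A closing sketch in the same style:

```python
import numpy as np

def rnod(p, p_hat, eps=1e-12):
    # Per-class absolute error relative to the true prevalence, averaged over K.
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.mean(np.abs(p - p_hat) / (p + eps)))
```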