4.4. Evaluation Metrics#
Evaluation metrics for quantification assess the accuracy of estimated class prevalences against true prevalences. These metrics are crucial for understanding how well a quantifier performs, especially under distributional shifts.
The library includes several widely used evaluation metrics:
| Metric | Description |
|---|---|
| Normalized Match Distance | Normalized difference between the true and estimated prevalence distributions over ordered classes. |
| Relative Normalized Overall Deviation | Proportional deviation between true and estimated prevalences, highlighting errors in rare classes. |
| Variance Shift Error | Discrepancy between the variance of the true and the estimated distribution. |
| Cramér-von Mises L1 Distance | L1 distance between the cumulative true and estimated distributions. |
| Absolute Error | Sum of the absolute differences between estimated and true prevalences. |
| Squared Error | Sum of the squared differences between estimated and true prevalences. |
| Mean Absolute Error | Absolute error averaged over all classes. |
| Mean Squared Error | Squared error averaged over all classes. |
| Kullback-Leibler Divergence | Information loss incurred when the estimated distribution is used in place of the true one. |
| Relative Absolute Error | Absolute error scaled by the true prevalence of each class. |
| Normalized Absolute Error | Absolute error normalized to a common scale. |
| Normalized Relative Absolute Error | Relative absolute error normalized to a common scale. |
| Normalized Kullback-Leibler Divergence | KLD rescaled to a bounded range. |
4.4.1. Single Label Quantification (SLQ) Metrics#
4.4.1.1. AE (Absolute Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence (distribution of classes).
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
AE calculates the simple absolute error across classes:
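A standard formulation, shown here for reference and consistent with the description above (the library's exact definition may differ in normalization), with \(\mathcal{C}\) denoting the set of classes:
\[
\mathrm{AE}(p, \hat{p}) = \sum_{c \in \mathcal{C}} \left| \hat{p}(c) - p(c) \right|
\]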
Its primary strength is transparency and ease of interpretation.
4.4.1.2. SE (Squared Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
SE is the sum of squared differences:
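For reference, the usual form is:
\[
\mathrm{SE}(p, \hat{p}) = \sum_{c \in \mathcal{C}} \left( \hat{p}(c) - p(c) \right)^2
\]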
This penalizes larger errors more heavily, making outlier mistakes more obvious.
4.4.1.3. MAE (Mean Absolute Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
MAE averages the absolute errors over all classes:
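For reference, with \(|\mathcal{C}|\) the number of classes:
\[
\mathrm{MAE}(p, \hat{p}) = \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \left| \hat{p}(c) - p(c) \right|
\]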
It offers a normalized perspective, useful for comparing performances across datasets.
4.4.1.4. MSE (Mean Squared Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
MSE averages the squared errors:
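Correspondingly, for reference:
\[
\mathrm{MSE}(p, \hat{p}) = \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \left( \hat{p}(c) - p(c) \right)^2
\]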
Ideal for highlighting large deviations in prevalence estimation.
4.4.1.5. KLD (Kullback-Leibler Divergence)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
KLD measures the information loss between distributions:
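The standard definition is given below; in practice both distributions are typically smoothed so that no class has zero probability:
\[
\mathrm{KLD}(p, \hat{p}) = \sum_{c \in \mathcal{C}} p(c) \log \frac{p(c)}{\hat{p}(c)}
\]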
Its key advantage is sensitivity to wrong predictions where the true prevalence is high.
4.4.1.6. RAE (Relative Absolute Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
\(\epsilon\): float, optional (default=1e-12) Small constant to ensure numerical stability.
RAE scales the absolute error by true prevalence:
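A common formulation is shown below; the \(\epsilon\) parameter is typically used to smooth \(p(c)\) and avoid division by zero, though the exact smoothing scheme may differ:
\[
\mathrm{RAE}(p, \hat{p}) = \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \frac{\left| \hat{p}(c) - p(c) \right|}{p(c)}
\]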
This makes errors on rare classes weigh proportionally more, which is useful in imbalanced scenarios.
4.4.1.7. NAE (Normalized Absolute Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
NAE normalizes the absolute error:
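One widely used normalization divides the absolute error by its maximum attainable value (given here as a reference formulation; the library may normalize differently):
\[
\mathrm{NAE}(p, \hat{p}) = \frac{\sum_{c \in \mathcal{C}} \left| \hat{p}(c) - p(c) \right|}{2 \left( 1 - \min_{c \in \mathcal{C}} p(c) \right)}
\]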
Best used when error scores must be reported on a normalized scale that is comparable across problems.
4.4.1.8. NRAE (Normalized Relative Absolute Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
\(\epsilon\): float, optional (default=1e-12) Small constant for numerical stability.
NRAE further normalizes relative errors:
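Analogously, a reference normalization divides the relative absolute error by its maximum attainable value (\(\epsilon\) again smooths the true prevalences; exact details may vary):
\[
\mathrm{NRAE}(p, \hat{p}) = \frac{\sum_{c \in \mathcal{C}} \frac{\left| \hat{p}(c) - p(c) \right|}{p(c)}}{|\mathcal{C}| - 1 + \frac{1 - \min_{c} p(c)}{\min_{c} p(c)}}
\]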
This keeps the relative error on a bounded, normalized scale, so scores remain comparable even when some true prevalences are very small.
4.4.1.9. NKLD (Normalized Kullback-Leibler Divergence)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
\(\epsilon\): float, optional (default=1e-12) Small constant for numerical stability.
NKLD outputs a normalized form of KLD:
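A common normalization passes the KLD through a logistic transform, bounding the result in \([0, 1)\) (reference formulation):
\[
\mathrm{NKLD}(p, \hat{p}) = 2 \, \frac{e^{\mathrm{KLD}(p, \hat{p})}}{e^{\mathrm{KLD}(p, \hat{p})} + 1} - 1
\]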
Bounding the otherwise unbounded KLD keeps scores comparable across experiments and sample sizes.
4.4.2. Regression-Based Quantification (RQ) Metrics#
4.4.2.1. VSE (Variance Shift Error)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
The Variance Shift Error quantifies the discrepancy between the variance of true and estimated distributions:
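One plausible formulation, assuming the variance is taken over numeric class values weighted by prevalence (this is an assumption made for illustration; consult the API reference for the exact definition used):
\[
\mathrm{VSE}(p, \hat{p}) = \left( \sigma^2_{\hat{p}} - \sigma^2_{p} \right)^2,
\qquad
\sigma^2_{p} = \sum_{c} p(c) \left( c - \mu_{p} \right)^2,
\quad
\mu_{p} = \sum_{c} c \, p(c)
\]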
This metric emphasizes changes in dispersion, which is useful for detecting model bias towards certain classes.
4.4.2.2. CvM_L1 (Cramér-von Mises L1 Distance)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
CvM_L1 compares cumulative distributions using the L1 norm:
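One common form, written in terms of cumulative prevalences (shown as a reference formulation; the library may additionally average over classes):
\[
\mathrm{CvM}_{L1}(p, \hat{p}) = \sum_{c \in \mathcal{C}} \left| F_{\hat{p}}(c) - F_{p}(c) \right|,
\qquad
F_{p}(c) = \sum_{c' \leq c} p(c')
\]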
where \(F_p(c)\) is the cumulative distribution. Its advantage lies in capturing distributional differences beyond pointwise errors.
4.4.3. Ordinal Quantification (OQ) Metrics#
4.4.3.1. NMD (Normalized Match Distance)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
The NMD metric quantifies the normalized difference between two prevalence distributions:
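A standard formulation for ordered classes normalizes the match distance (the earth mover's distance with unit ground distance) by its maximum value \(|\mathcal{C}| - 1\); this is given as a reference and may differ from the library's exact implementation:
\[
\mathrm{NMD}(p, \hat{p}) = \frac{1}{|\mathcal{C}| - 1} \sum_{i=1}^{|\mathcal{C}| - 1} \left| \sum_{j \leq i} \left( \hat{p}(c_j) - p(c_j) \right) \right|
\]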
where \(p(c)\) is the true prevalence and \(\hat{p}(c)\) is the estimated prevalence. The advantage of NMD is its straightforward interpretability and normalization, making it ideal for comparing different quantification methods.
4.4.3.2. RNOD (Relative Normalized Overall Deviation)#
Parameters:
\(p\): array-like, shape (n_classes,) True prevalence.
\(\hat{p}\): array-like, shape (n_classes,) Estimated prevalence.
\(\epsilon\): float, optional (default=1e-12) Small constant to ensure numerical stability.
RNOD measures the proportional deviation between the true and estimated prevalence, particularly highlighting errors in rare classes:
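A minimal sketch of one formulation consistent with this description is given below; this is an assumption made for illustration, not the library's confirmed definition, with \(\epsilon\) keeping the denominator non-zero:
\[
\mathrm{RNOD}(p, \hat{p}) = \sqrt{ \frac{1}{|\mathcal{C}| - 1} \sum_{c \in \mathcal{C}} \frac{\left( \hat{p}(c) - p(c) \right)^2}{p(c) + \epsilon} }
\]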
Its benefit is in handling imbalanced distributions by reducing the influence of dominant classes.
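As a closing illustration, here is a minimal NumPy sketch of two of the metrics above (MAE and NMD). It is a reference implementation written for this guide, not the library's own API; the function names, signatures, and exact normalizations are assumptions.

```python
import numpy as np

# Illustrative reference implementations (not the library's API).
# Both assume the prevalence vectors are non-negative and sum to 1.

def mae(p_true, p_pred):
    """Mean Absolute Error between two prevalence vectors."""
    p_true, p_pred = np.asarray(p_true, float), np.asarray(p_pred, float)
    return np.abs(p_pred - p_true).mean()

def nmd(p_true, p_pred):
    """Normalized Match Distance for ordered classes: L1 distance between
    cumulative prevalences, divided by (n_classes - 1)."""
    p_true, p_pred = np.asarray(p_true, float), np.asarray(p_pred, float)
    cdf_diff = np.cumsum(p_pred - p_true)[:-1]  # last entry is ~0 when both sum to 1
    return np.abs(cdf_diff).sum() / (len(p_true) - 1)

p_true = np.array([0.5, 0.3, 0.2])  # true class prevalences
p_pred = np.array([0.4, 0.4, 0.2])  # estimated class prevalences
print(mae(p_true, p_pred))  # ~0.0667
print(nmd(p_true, p_pred))  # (|-0.1| + |0.0|) / 2 = 0.05
```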