.. _evaluation_metrics:

Evaluation Metrics
------------------

.. currentmodule:: mlquantify.metrics

Evaluation metrics for quantification assess the accuracy of estimated class prevalences against the true prevalences. These metrics are crucial for understanding how well a quantifier performs, especially under distributional shift.

The library includes several widely used evaluation metrics:

.. list-table:: Metrics
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - :class:`NMD`
     - Normalized Match Distance
   * - :class:`RNOD`
     - Relative Normalized Overall Deviation
   * - :class:`VSE`
     - Variance Shift Error
   * - :class:`CvM_L1`
     - Cramér-von Mises L1 Distance
   * - :class:`AE`
     - Absolute Error
   * - :class:`SE`
     - Squared Error
   * - :class:`MAE`
     - Mean Absolute Error
   * - :class:`MSE`
     - Mean Squared Error
   * - :class:`KLD`
     - Kullback-Leibler Divergence
   * - :class:`RAE`
     - Relative Absolute Error
   * - :class:`NAE`
     - Normalized Absolute Error
   * - :class:`NRAE`
     - Normalized Relative Absolute Error
   * - :class:`NKLD`
     - Normalized Kullback-Leibler Divergence

=========================================
Single Label Quantification (SLQ) Metrics
=========================================

AE (Absolute Error)
===================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence (distribution of classes).
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

AE is the total absolute error across classes:

.. math::

    \text{AE}(p, \hat{p}) = \sum_{c} |p(c) - \hat{p}(c)|

Its primary strength is transparency and ease of interpretation.

SE (Squared Error)
==================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

SE is the sum of squared differences:

.. math::

    \text{SE}(p, \hat{p}) = \sum_{c} (p(c) - \hat{p}(c))^2

This penalizes larger errors more heavily, making outlier mistakes more visible.

MAE (Mean Absolute Error)
=========================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

MAE averages the absolute errors over the :math:`K` classes:

.. math::

    \text{MAE}(p, \hat{p}) = \frac{1}{K} \sum_{c} |p(c) - \hat{p}(c)|

It offers a normalized perspective, useful for comparing performance across datasets.

MSE (Mean Squared Error)
========================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

MSE averages the squared errors:

.. math::

    \text{MSE}(p, \hat{p}) = \frac{1}{K} \sum_{c} (p(c) - \hat{p}(c))^2

It is well suited to highlighting large deviations in prevalence estimation.

KLD (Kullback-Leibler Divergence)
=================================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

KLD measures the information loss incurred when the estimated distribution is used in place of the true one:

.. math::

    \text{KLD}(p, \hat{p}) = \sum_{c} p(c) \log \frac{p(c)}{\hat{p}(c)}

Its key advantage is sensitivity to wrong predictions for classes whose true prevalence is high.
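
As a concrete point of reference, the five metrics above can be written in a few lines of NumPy. The snippet below is a minimal sketch of the formulas as stated, not the implementation behind :class:`AE`, :class:`SE`, :class:`MAE`, :class:`MSE`, or :class:`KLD`; it assumes ``p`` and ``p_hat`` are prevalence vectors of the same length, and the ``eps`` added inside the KLD logarithm is an assumption made here to keep the result finite when an estimated prevalence is zero.

.. code-block:: python

    import numpy as np

    def ae(p, p_hat):
        """Absolute Error: total absolute deviation across classes."""
        return np.sum(np.abs(p - p_hat))

    def se(p, p_hat):
        """Squared Error: total squared deviation across classes."""
        return np.sum((p - p_hat) ** 2)

    def mae(p, p_hat):
        """Mean Absolute Error: absolute deviation averaged over K classes."""
        return np.mean(np.abs(p - p_hat))

    def mse(p, p_hat):
        """Mean Squared Error: squared deviation averaged over K classes."""
        return np.mean((p - p_hat) ** 2)

    def kld(p, p_hat, eps=1e-12):
        """Kullback-Leibler Divergence of p from p_hat (eps avoids log(0))."""
        return np.sum(p * np.log((p + eps) / (p_hat + eps)))

    p = np.array([0.5, 0.3, 0.2])      # true prevalences
    p_hat = np.array([0.4, 0.4, 0.2])  # estimated prevalences
    print(mae(p, p_hat), kld(p, p_hat))
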
RAE (Relative Absolute Error)
=============================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.
- :math:`\epsilon`: float, optional (default=1e-12)
  Small constant to ensure numerical stability.

RAE scales the absolute error by the true prevalence:

.. math::

    \text{RAE}(p, \hat{p}) = \sum_{c} \frac{|p(c) - \hat{p}(c)|}{p(c) + \epsilon}

This is beneficial for identifying the relative impact of errors in imbalanced scenarios.

NAE (Normalized Absolute Error)
===============================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

NAE normalizes the absolute error:

.. math::

    \text{NAE}(p, \hat{p}) = \frac{1}{K} \sum_{c} \frac{|p(c) - \hat{p}(c)|}{\max\{p(c), \hat{p}(c)\}}

It is best used when error scale invariance is required.

NRAE (Normalized Relative Absolute Error)
=========================================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.
- :math:`\epsilon`: float, optional (default=1e-12)
  Small constant for numerical stability.

NRAE further normalizes the relative errors:

.. math::

    \text{NRAE}(p, \hat{p}) = \frac{1}{K} \sum_{c} \frac{|p(c) - \hat{p}(c)|}{p(c) + \hat{p}(c) + \epsilon}

This balances the error measurement between the true and estimated values.

NKLD (Normalized Kullback-Leibler Divergence)
=============================================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.
- :math:`\epsilon`: float, optional (default=1e-12)
  Small constant for numerical stability.

NKLD is a normalized form of KLD:

.. math::

    \text{NKLD}(p, \hat{p}) = \frac{1}{K} \sum_{c} p(c) \log \frac{p(c)}{\hat{p}(c) + \epsilon}

This makes it more robust when comparing results across distinct sample sizes.

============================================
Regression-Based Quantification (RQ) Metrics
============================================

VSE (Variance Shift Error)
==========================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

The Variance Shift Error quantifies the discrepancy between the variances of the true and estimated distributions:

.. math::

    \text{VSE}(p, \hat{p}) = |\text{Var}(p) - \text{Var}(\hat{p})|

This metric emphasizes changes in dispersion, which is useful for detecting model bias towards certain classes.

CvM_L1 (Cramér-von Mises L1 Distance)
=====================================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

CvM_L1 compares cumulative distributions using the L1 norm:

.. math::

    \text{CvM\_L1}(p, \hat{p}) = \sum_{c} |F_p(c) - F_{\hat{p}}(c)|

where :math:`F_p(c)` is the cumulative distribution of :math:`p`. Its advantage lies in capturing distributional differences beyond pointwise errors.

===================================
Ordinal Quantification (OQ) Metrics
===================================

NMD (Normalized Match Distance)
===============================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.

The NMD metric quantifies the normalized difference between two prevalence distributions:

.. math::

    \text{NMD}(p, \hat{p}) = \frac{1}{2} \sum_{c} |p(c) - \hat{p}(c)|

where :math:`p(c)` is the true prevalence and :math:`\hat{p}(c)` the estimated prevalence. The advantage of NMD is its straightforward interpretability and normalization, which makes it well suited for comparing different quantification methods.
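
The distribution-level metrics defined above admit an equally compact sketch. The snippet below illustrates the stated formulas for VSE, CvM_L1, and NMD only and is not the code behind :class:`VSE`, :class:`CvM_L1`, or :class:`NMD`; it assumes ``p`` and ``p_hat`` are prevalence vectors over the same ordered set of classes.

.. code-block:: python

    import numpy as np

    def vse(p, p_hat):
        """Variance Shift Error: |Var(p) - Var(p_hat)|."""
        return np.abs(np.var(p) - np.var(p_hat))

    def cvm_l1(p, p_hat):
        """Cramér-von Mises L1: L1 distance between cumulative distributions."""
        return np.sum(np.abs(np.cumsum(p) - np.cumsum(p_hat)))

    def nmd(p, p_hat):
        """Normalized Match Distance as stated above: half the total absolute deviation."""
        return 0.5 * np.sum(np.abs(p - p_hat))

    p = np.array([0.6, 0.3, 0.1])
    p_hat = np.array([0.5, 0.3, 0.2])
    print(vse(p, p_hat), cvm_l1(p, p_hat), nmd(p, p_hat))
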
RNOD (Relative Normalized Overall Deviation)
============================================

**Parameters:**

- :math:`p`: array-like, shape (n_classes,)
  True prevalence.
- :math:`\hat{p}`: array-like, shape (n_classes,)
  Estimated prevalence.
- :math:`\epsilon`: float, optional (default=1e-12)
  Small constant to ensure numerical stability.

RNOD measures the proportional deviation between the true and estimated prevalences, particularly highlighting errors in rare classes:

.. math::

    \text{RNOD}(p, \hat{p}) = \frac{1}{K} \sum_{c} \frac{|p(c) - \hat{p}(c)|}{p(c) + \epsilon}

Its benefit is in handling imbalanced distributions by reducing the influence of dominant classes.
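
For completeness, a matching sketch of the formula above, again illustrative rather than the implementation of :class:`RNOD`, with ``eps`` standing in for :math:`\epsilon`:

.. code-block:: python

    import numpy as np

    def rnod(p, p_hat, eps=1e-12):
        """Relative Normalized Overall Deviation as stated above:
        per-class relative absolute error, averaged over the K classes."""
        return np.mean(np.abs(p - p_hat) / (p + eps))

    p = np.array([0.7, 0.2, 0.1])
    p_hat = np.array([0.6, 0.2, 0.2])
    print(rnod(p, p_hat))  # the error in the rare class weighs more heavily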