.. _confidence_intervals_quantification: Confidence Regions for Quantification ===================================== This guide details the main types of confidence regions for prevalence estimates in quantification, as implemented in :mod:`mlquantify.confidence`. It covers the principles, mathematical definitions, attributes, and usage examples for each region type. For advanced bootstrap-based strategies (e.g., model-based and population-based), see :mod:`mlquantify.meta.AggregativeBootstrap`, which provides detailed implementations. =============== General Concept =============== A confidence region for quantification is a subset in the prevalence space that, with probability :math:`1-\alpha`, contains the unknown true class prevalence vector :math:`\pi^*` of the test set. The width and shape of this region express uncertainty around the point estimate. Typical regions are defined as :math:`CR_\alpha` such that .. math:: \mathbb{P}\left(\pi^{\ast} \in CR_{\alpha}\right) = 1 - \alpha where :math:`\pi^{\ast}` is the true prevalence vector. Confidence region types differ by how they model joint uncertainty across class prevalences: - **Confidence intervals** (by percentiles) - **Confidence ellipse in simplex** (multivariate) - **Confidence ellipse in CLR space** (geometry-aware) All regions are constructed from :math:`m` bootstrap resamples of prevalences for :math:`n` classes: :math:`X \in \mathbb{R}^{m \times n}`. API Reference: :class:`BaseConfidenceRegion` (in :mod:`mlquantify.confidence`). .. _confidence_intervals: ===================================== Percentile-Based Confidence Intervals ===================================== Implements independent confidence intervals for each class based on percentiles (nonparametric, assumes independence). :class:`ConfidenceInterval` .. figure:: ../images/ConfidenceInterval_class_1.png :align: center :width: 80% :alt: Confidence Intervals Illustration *Illustration of Confidence Intervals for the positive class* - **Definition:** For a desired confidence :math:`1-\alpha`, compute interval bounds :math:`[L_i, U_i]` for each class :math:`i` from the empirical :math:`\alpha/2` and :math:`1-\alpha/2` percentiles. - **Mathematical region:** .. math:: CI_\alpha(\pi) = \begin{cases} 1 & \text{if } L_i \leq \pi_i \leq U_i, \forall i=1,...,n \\ 0 & \text{otherwise} \end{cases} - **Limitations:** Assumes class independence; region is a hyper-rectangle which can fall outside the probability simplex. **Example:** .. code-block:: python from mlquantify.confidence import ConfidenceInterval import numpy as np X = np.random.dirichlet(np.ones(3), size=200) ci = ConfidenceInterval(X, confidence_level=0.9) print(ci.get_region()) # returns (I_low, I_high) print(ci.contains([0.3, 0.4, 0.3])) # array([[True]]) .. _confidence_ellipse_simplex: ============================= Confidence Ellipse in Simplex ============================= Constructs a multivariate confidence ellipse in the simplex space, enforcing joint uncertainty and correlations. :class:`ConfidenceEllipseSimplex` .. figure:: ../images/ConfidenceEllipseSimplex_class_1.png :align: center :width: 80% :alt: Confidence Ellipse in Simplex illustration *Illustration of Confidence Ellipse in Simplex space for the positive class* - **Definition:** Derives an ellipse around the mean prevalence vector, with axes scaled by covariance, thresholded via chi-squared statistic. - **Mathematical region:** .. math:: CE_\alpha(\pi) = \begin{cases} 1 & \text{if } (\pi-\mu)^T\Sigma^{-1}(\pi-\mu) \leq \chi^2_{n-1}(1-\alpha) \\ 0 & \text{otherwise} \end{cases} - **Attributes:** - :math:`\mu`: sample mean of bootstrap prevalences - :math:`\Sigma^{-1}`: inverse covariance matrix - :math:`\chi^2_{n-1}(1-\alpha)`: chi-squared cutoff - **Limitations:** Assumes normality of prevalence estimates (may not hold); region may partially extend beyond the simplex. **Example:** .. code-block:: python from mlquantify.confidence import ConfidenceEllipseSimplex import numpy as np X = np.random.dirichlet(np.ones(3), size=200) ce = ConfidenceEllipseSimplex(X, confidence_level=0.95) print(ce.get_point_estimate()) print(ce.contains(np.array([0.4, 0.3, 0.3]))) .. _confidence_ellipse_clr: =============================== Confidence Ellipse in CLR Space =============================== Models the geometry of the simplex by transforming prevalence vectors using the Centered Log-Ratio (CLR) transformation before constructing the ellipse. :class:`ConfidenceEllipseCLR` .. figure:: ../images/ConfidenceEllipseCLR_class_1.png :align: center :width: 80% :alt: Confidence Ellipse in CLR illustration *Illustration of Confidence Ellipse in CLR space for the positive class* - **CLR transformation:** :math:`T: \Delta^{n-1} \rightarrow \mathbb{R}^n`, with .. math:: T(\pi) = [\log(\pi_1 / g(\pi)), \ldots, \log(\pi_n / g(\pi))], \quad g(\pi) = (\prod_i \pi_i)^{1/n} - **Region definition:** .. math:: CT_\alpha(\pi) = \begin{cases} 1 & \text{if } (T(\pi) - \mu_{CLR})^T \Sigma^{-1} (T(\pi) - \mu_{CLR}) \leq \chi^2_{n-1}(1-\alpha) \\ 0 & \text{otherwise} \end{cases} - **Attributes:** As above, but all computations done in the transformed CLR space. - **Advantages:** Adapts to the compositional nature of prevalence estimates, keeping the region well-behaved within the simplex. **Example:** .. code-block:: python from mlquantify.confidence import ConfidenceEllipseCLR import numpy as np X = np.random.dirichlet(np.ones(3), size=200) clr = ConfidenceEllipseCLR(X, confidence_level=0.9) print(clr.get_point_estimate()) print(clr.contains(np.array([0.4, 0.4, 0.2]))) References ---------- - Moreo, A., & Salvati, N. (2025). *An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification*. Istituto di Scienza e Tecnologie dell'Informazione, CNR, Pisa. - See also: Section 3.3 and Equations (1)-(3) in the reference above.