.. _ensemble:

.. currentmodule:: mlquantify.meta

===========================
Meta-Quantification Methods
===========================

Meta-quantifiers wrap an existing *base quantifier* and add a higher-level
strategy (ensembling, adaptive score correction, or bootstrap confidence
estimation) to improve accuracy or reliability.

.. contents:: Contents
   :local:
   :depth: 2

----

EnsembleQ: Ensemble of Quantifiers
==================================

:class:`EnsembleQ` (Pérez-Gállego et al., 2017, 2019) builds a diverse
ensemble of base quantifiers, each trained on a subsample with a **different
class prevalence**. Because the members see a wide range of training
prevalences, the ensemble is more robust to test conditions that no single
model was trained on.

**Three phases:**

1. **Sample generation**: draw :math:`K` training batches (one per member, so
   :math:`K` equals ``size``) whose prevalences are sampled according to a
   chosen protocol (see ``protocol``).
2. **Training**: fit an independent copy of the base quantifier on each batch.
3. **Aggregation**: average (or take the median of) the members' predictions,
   optionally keeping only the most relevant members.

**Why it helps:** A single quantifier may be over-tuned to the training
prevalence. The ensemble explores a wide range of the prevalence space during
training and aggregates across diverse operating points. This can reduce both
the bias and the variance of the final estimate.

Parameters
----------

.. list-table::
   :widths: 22 15 63
   :header-rows: 1

   * - Parameter
     - Default
     - Explanation
   * - ``quantifier``
     - required
     - The base quantifier. Any ``BaseQuantifier`` subclass works. Prefer a
       reasonably fast method (e.g. :class:`~mlquantify.matching.DyS`),
       because ``size`` copies will be trained.
   * - ``size``
     - ``50``
     - Number of ensemble members. More members give more diversity and
       smoother estimates, but training time grows linearly with ``size``.
       20–50 is a good range.
   * - ``min_prop``
     - ``0.1``
     - Minimum class proportion used when sampling batches. Setting it to
       ``0.0`` allows nearly all-negative or all-positive batches, which is
       risky on small datasets.
   * - ``max_prop``
     - ``1.0``
     - Maximum class proportion used when sampling batches.
   * - ``selection_metric``
     - ``'all'``
     - Which members to include in the final aggregation:

       - ``'all'``: use every member with equal weight. Safe default.
       - ``'ptr'``: keep the ``p_metric`` fraction of members whose
         *training* prevalence is closest to an initial estimate of the test
         prevalence. Reduces bias when test prevalences cluster in a specific
         range.
       - ``'ds'``: keep the members whose *training score distribution* is
         closest to the test score distribution (Hellinger distance). This is
         the most adaptive option, but it requires a probabilistic base
         quantifier and an extra logistic regression fit. Binary only.
   * - ``p_metric``
     - ``0.25``
     - Fraction of members retained when ``selection_metric`` is ``'ptr'`` or
       ``'ds'``. ``0.25`` keeps the top 25%.
   * - ``protocol``
     - ``'uniform'``
     - Sampling protocol for generating training prevalences:

       - ``'uniform'``: sample uniformly from the simplex. A good general
         choice.
       - ``'artificial'``: regular grid (like APP). Gives systematic coverage.
       - ``'natural'``: random subsamples (like NPP). More realistic.
       - ``'kraemer'``: like ``'uniform'``, but with a fixed-step grid.
   * - ``return_type``
     - ``'mean'``
     - Aggregation function across the selected members. ``'mean'`` reduces
       variance; ``'median'`` is more robust to outlier members.
   * - ``max_sample_size``
     - ``None``
     - Maximum size of each training batch. ``None`` uses the full training
       set size. Set a smaller value to speed up training on large datasets.
   * - ``n_jobs``
     - ``1``
     - Number of parallel jobs used to train ensemble members. ``-1`` uses
       all CPU cores; recommended when ``size`` > 20.
   * - ``verbose``
     - ``False``
     - Print progress during fit and predict.

Examples
--------

Basic ensemble:

.. code-block:: python

   from mlquantify.meta import EnsembleQ
   from mlquantify.matching import DyS
   from sklearn.linear_model import LogisticRegression
   from sklearn.datasets import make_classification
   from sklearn.model_selection import train_test_split

   # Imbalanced binary problem: roughly 80% negatives, 20% positives
   X, y = make_classification(n_samples=1000, weights=[0.8, 0.2],
                              random_state=42)
   X_train, X_test, y_train, y_test = train_test_split(
       X, y, test_size=0.3, random_state=42)

   q = EnsembleQ(
       quantifier=DyS(LogisticRegression()),
       size=30,             # 30 DyS copies, one per sampled batch
       protocol='uniform',  # training prevalences drawn uniformly
       n_jobs=-1,           # train members on all CPU cores
   )
   q.fit(X_train, y_train)
   print(q.predict(X_test))  # estimated class prevalences for X_test

Using PTR selection to adapt to the test prevalence:

.. code-block:: python

   q = EnsembleQ(
       quantifier=DyS(LogisticRegression()),
       size=50,
       selection_metric='ptr',  # keep members closest to the test prevalence
       p_metric=0.25,           # keep the top 25%
       return_type='median',    # robust to outlier members
       n_jobs=-1,
   )
   q.fit(X_train, y_train)
   print(q.predict(X_test))

.. note::

   ``selection_metric='ds'`` requires a probabilistic base quantifier and is
   binary-only. It fits an internal logistic regression to compute the
   posterior histograms used in the distribution-similarity check.

----

QuaDapt: Adaptive Score Simulation
==================================

:class:`QuaDapt` (Ortega et al., 2025) improves prevalence estimation by
simulating synthetic training scores with **MoSS** (Model for Score
Simulation; Maletzke et al., 2021) and selecting the synthetic distribution
that best matches the observed test-score distribution. The best-matching
synthetic set is then used as the training reference in the wrapped
quantifier's :meth:`aggregate` call.

**Why it exists:** Histogram- and density-matching methods compare test scores
against training scores. However, the classifier's score distribution can
change at test time because its output range or sharpness shifts. This
*score variability* makes the original training scores a poor reference.
QuaDapt adaptively selects a synthetic distribution that bridges this gap,
which makes it well suited to tasks with high score variability.

QuaDapt is **binary-only**. Multiclass problems are decomposed one-vs-rest
(see ``strategy``).

Parameters
----------

.. list-table::
   :widths: 22 15 63
   :header-rows: 1

   * - Parameter
     - Default
     - Explanation
   * - ``quantifier``
     - required
     - A soft (probabilistic) base aggregative quantifier (e.g.
       :class:`~mlquantify.matching.DyS`, :class:`~mlquantify.matching.HDy`).
       It must support ``aggregate(test_scores, train_scores, labels)``.
   * - ``measure``
     - ``'topsoe'``
     - Distance used to compare the test distribution with each synthetic
       distribution. Options: ``'hellinger'``, ``'topsoe'``, ``'probsymm'``,
       ``'sord'``. TopSoe is recommended for histogram matching.
   * - ``merging_factors``
     - ``np.arange(0.1, 1.0, 0.2)``
     - Candidate merging-factor values for MoSS (the default evaluates 0.1,
       0.3, 0.5, 0.7 and 0.9). The merging factor controls how much the
       positive and negative scores overlap in the synthetic set. A finer
       grid (e.g. ``np.arange(0.05, 1.0, 0.05)``) may give better results at
       the cost of more computation.
   * - ``strategy``
     - ``'ovr'``
     - Multiclass decomposition strategy (one-vs-rest). It is needed because
       QuaDapt itself is binary.

Examples
--------

.. code-block:: python

   from mlquantify.meta import QuaDapt
   from mlquantify.matching import DyS
   from sklearn.linear_model import LogisticRegression

   # X_train, X_test, y_train, y_test as in the EnsembleQ example above
   q = QuaDapt(
       quantifier=DyS(LogisticRegression()),
       measure='topsoe',
       merging_factors=[0.1, 0.3, 0.5, 0.7, 0.9],  # same grid as the default
   )
   q.fit(X_train, y_train)
   print(q.predict(X_test))

----

AggregativeBootstrap: Confidence Intervals via Bootstrap
========================================================

:class:`AggregativeBootstrap` wraps any aggregative quantifier and applies
**bootstrap resampling** to both the training and the test predictions. This
generates a *distribution* of prevalence estimates, which is summarised as a
point estimate together with a confidence region.

**Why it exists:** A single prevalence estimate says nothing about its
uncertainty. AggregativeBootstrap (Moreo & Salvati, 2025) derives confidence
regions for any aggregative quantifier, enabling uncertainty-aware
deployment. Because the resampling operates on classifier predictions, only
the aggregation step is repeated in each bootstrap round.

Parameters
----------

.. list-table::
   :widths: 22 15 63
   :header-rows: 1

   * - Parameter
     - Default
     - Explanation
   * - ``quantifier``
     - required
     - The base aggregative quantifier to wrap.
   * - ``n_train_bootstraps``
     - ``1``
     - Number of bootstrap resamples of the training predictions. Increasing
       it to 50–200 gives a more accurate estimate of the confidence region.
   * - ``n_test_bootstraps``
     - ``1``
     - Number of bootstrap resamples of the test predictions. Together with
       ``n_train_bootstraps``, it sets the total number of bootstrap rounds:
       ``n_train_bootstraps × n_test_bootstraps`` calls to the base
       quantifier's aggregation step. With both left at ``1``, only a single
       estimate is produced. Increase them to obtain a meaningful region.
   * - ``region_type``
     - ``'intervals'``
     - Type of confidence region:

       - ``'intervals'``: per-class confidence intervals. Simple and fast.
       - ``'ellipse'``: joint confidence ellipse on the prevalence simplex.
       - ``'ellipse-clr'``: ellipse computed after a centered log-ratio (CLR)
         transform (a compositional-data approach; recommended for
         multiclass).
   * - ``confidence_level``
     - ``0.95``
     - Confidence level of the region (e.g. ``0.95`` for a 95% region).
   * - ``random_state``
     - ``None``
     - Seed for reproducibility.

Examples
--------

.. code-block:: python

   from mlquantify.meta import AggregativeBootstrap
   from mlquantify.likelihood import EMQ
   from sklearn.linear_model import LogisticRegression

   # X_train, X_test, y_train, y_test as in the EnsembleQ example above
   q = AggregativeBootstrap(
       EMQ(LogisticRegression()),
       n_train_bootstraps=100,
       n_test_bootstraps=100,  # 100 x 100 = 10,000 aggregation calls
       region_type='intervals',
       confidence_level=0.95,
   )
   q.fit(X_train, y_train)
   prevalences = q.predict(X_test)
   print(prevalences)
   # Access the confidence region after prediction
   # (see mlquantify.confidence for the region object API)

.. seealso::

   :ref:`confidence_intervals` for a full guide on confidence regions in
   quantification.

----

Choosing a Meta-Quantifier
==========================

.. list-table::
   :widths: 20 20 60
   :header-rows: 1

   * - Method
     - When to use
     - Key advantage
   * - EnsembleQ (``'all'``)
     - Moderate shift; need robustness
     - Reduces variance through diversity.
   * - EnsembleQ (``'ptr'``)
     - Unknown test prevalence region
     - Adapts member selection to the test estimate.
   * - EnsembleQ (``'ds'``)
     - Score variability across batches
     - Selects members by distribution similarity.
   * - QuaDapt
     - Score variability; DyS/HDy as base
     - Corrects for score-distribution mismatch.
   * - AggregativeBootstrap
     - Need uncertainty quantification
     - Provides confidence regions for any aggregative quantifier.

**Practical recommendation:** Use **EnsembleQ** with
``selection_metric='ptr'`` and ``n_jobs=-1`` when you want higher accuracy at
a moderate extra cost. Use **AggregativeBootstrap** when you need to report
uncertainty alongside the prevalence estimates.

References
==========

.. dropdown:: References

   - Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017). Using
     Ensembles for Problems with Characterizable Changes in Data
     Distribution. *Information Fusion*, 34, 87–100.
   - Pérez-Gállego, P., Castaño, A., Quevedo, J. R., & del Coz, J. J. (2019).
     Dynamic Ensemble Selection for Quantification Tasks.
     *Information Fusion*, 45, 1–15.
   - Moreo, A., & Salvati, A. (2025). An Efficient Method for Deriving
     Confidence Intervals in Aggregative Quantification. *LQ 2025*.
   - Ortega, J. P., Luth Junior, L. F., Zalewski, W., & Maletzke, A. (2025).
     QuaDapt: Drift-Resilient Quantification via Parameters Adaptation.
     *LQ 2025*, p. 64.

.. seealso::

   :ref:`confidence_intervals` for the regions produced by
   :class:`AggregativeBootstrap`.