4.1. Meta-Quantification Methods
Meta-quantifiers wrap an existing base quantifier and add higher-level strategies — ensembling, adaptive score correction, or bootstrap confidence estimation — to improve accuracy or reliability.
4.1.1. EnsembleQ: Ensemble of Quantifiers
EnsembleQ (Pérez-Gállego et al., 2017, 2019) creates a diverse
ensemble of base quantifiers, each trained on a subsample with a different
class prevalence. Diversity in training prevalences makes the ensemble
robust to test conditions not seen by any single model.
EnsembleQ works in three phases:
1. Sample generation: draw \(K\) training batches with prevalences sampled from a chosen protocol (uniform, artificial, or natural).
2. Training: fit an independent copy of the base quantifier on each batch.
3. Aggregation: average (or take the median of) all members’ predictions, optionally keeping only the most relevant members.
Why it excels: A single quantifier may be over-tuned to the training prevalence. The ensemble explores the full prevalence space during training and aggregates across diverse operating points, reducing both bias and variance of the final estimate.
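The three phases can be sketched without the library. The snippet below is a minimal illustration, not the EnsembleQ implementation: it assumes a binary problem, uses a plain classify-and-count estimate (the share of positive predictions) as the base quantifier, and the helper names `sample_batch`, `fit_ensemble`, and `predict_prevalence` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sample_batch(X, y, prevalence, batch_size, rng):
    """Draw a batch whose positive-class share is approximately `prevalence`."""
    n_pos = int(round(prevalence * batch_size))
    n_pos = min(max(n_pos, 1), batch_size - 1)  # keep both classes present
    pos_idx = rng.choice(np.flatnonzero(y == 1), size=n_pos, replace=True)
    neg_idx = rng.choice(np.flatnonzero(y == 0), size=batch_size - n_pos, replace=True)
    idx = np.concatenate([pos_idx, neg_idx])
    return X[idx], y[idx]


def fit_ensemble(X, y, n_members=10, batch_size=200, seed=0):
    """Phases 1 and 2: sample one batch per member and fit a model on it."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        prevalence = rng.uniform(0.05, 0.95)  # uniform-style protocol (assumed range)
        Xb, yb = sample_batch(X, y, prevalence, batch_size, rng)
        members.append(LogisticRegression(max_iter=1000).fit(Xb, yb))
    return members


def predict_prevalence(members, X_test, aggregate=np.median):
    """Phase 3: aggregate each member's classify-and-count estimate."""
    estimates = [model.predict(X_test).mean() for model in members]
    return float(aggregate(estimates))


# Synthetic binary data, used only to exercise the sketch
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=600) > 0.8).astype(int)

members = fit_ensemble(X[:400], y[:400])
estimate = predict_prevalence(members, X[400:])
print(f"estimated positive prevalence: {estimate:.3f}")
```

Passing `aggregate=np.mean` switches the aggregation from the median to the mean; the median is less sensitive to a few members with extreme estimates.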
4.1.1.1. Parameters
| Parameter | Default | Explanation |
|---|---|---|
| quantifier | required | The base quantifier. Any … |
| size | | Number of ensemble members. More members give more diversity and smoother estimates, but training time grows linearly. 20–50 is a good range. |
| | | Minimum class proportion for sampling batches. Set to … |
| | | Maximum class proportion. |
| selection_metric | | Which members to include in the final aggregation: 'all', 'ptr', or 'ds'. |
| p_metric | | Fraction of members retained when … |
| protocol | | Sampling protocol for generating training prevalences: … |
| return_type | | Aggregation function across selected members. |
| | | Maximum training-batch size. |
| n_jobs | | Parallel training of ensemble members. |
| | | Print progress during fit and predict. |
4.1.1.2. Examples
Basic ensemble:
```python
from mlquantify.meta import EnsembleQ
from mlquantify.matching import DyS
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced binary problem: roughly 80% negatives, 20% positives
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

q = EnsembleQ(
    quantifier=DyS(LogisticRegression()),
    size=30,             # number of ensemble members
    protocol='uniform',  # how training prevalences are sampled
    n_jobs=-1,           # train members in parallel
)
q.fit(X_train, y_train)
print(q.predict(X_test))
```
Using PTR selection to adapt to test prevalence:
```python
q = EnsembleQ(
    quantifier=DyS(LogisticRegression()),
    size=50,
    selection_metric='ptr',  # keep members closest to test prevalence
    p_metric=0.25,           # keep top 25%
    return_type='median',
    n_jobs=-1,
)
q.fit(X_train, y_train)
print(q.predict(X_test))
```
Note
selection_metric='ds' requires a probabilistic base quantifier and
is binary-only. It fits an internal logistic regression to compute
posterior histograms for the distribution similarity check.
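The distribution similarity check can be pictured as ranking members by the distance between score histograms. The sketch below is an assumption-laden illustration rather than the library's code: it takes the posterior scores as already computed, uses the Hellinger distance, and the names `score_histogram` and `select_by_similarity` are hypothetical.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def score_histogram(scores, bins=10):
    """Normalised histogram of posterior scores in [0, 1]."""
    counts, _ = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()


def select_by_similarity(member_scores, test_scores, keep_fraction=0.25):
    """Indices of the members whose score histogram is closest to the test one."""
    test_hist = score_histogram(test_scores)
    distances = np.array(
        [hellinger(score_histogram(s), test_hist) for s in member_scores]
    )
    n_keep = max(1, int(round(keep_fraction * len(member_scores))))
    return np.argsort(distances)[:n_keep]


rng = np.random.default_rng(0)
# Hypothetical posterior scores for 8 members trained at different prevalences
member_scores = [
    rng.beta(1 + 4 * p, 1 + 4 * (1 - p), size=500)
    for p in np.linspace(0.1, 0.9, 8)
]
test_scores = rng.beta(1 + 4 * 0.8, 1 + 4 * 0.2, size=500)

selected = select_by_similarity(member_scores, test_scores)
print(selected)
```

With `keep_fraction=0.25` and eight members, two members are retained; only their predictions would enter the final aggregation.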
4.1.2. QuaDapt: Adaptive Score Simulation
QuaDapt (Maletzke et al., 2021) improves prevalence estimation by
simulating synthetic training-score distributions with MoSS (Model for
Score Simulation) and selecting the one that best matches the observed
test-score distribution. The best-matching synthetic set is then used as
the training reference for the wrapped quantifier’s aggregate call.
Why it exists: Histogram and density matching methods rely on training scores, which may follow a very different distribution from the test scores. This score variability arises when the classifier’s output range or sharpness changes at test time. QuaDapt adaptively selects a synthetic distribution that bridges this gap, achieving state-of-the-art results on tasks with high score variability.
QuaDapt is binary-only; multiclass problems are handled through one-vs-rest (OvR) decomposition.
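To make the idea of a merging factor concrete, here is a minimal sketch of score simulation and selection. It rests on assumptions that are not confirmed by this page: positives are simulated as u**m and negatives as 1 - u**m with u drawn uniformly from [0, 1], and a simple histogram mixture search stands in for the wrapped quantifier. The function names are hypothetical.

```python
import numpy as np


def moss(n, alpha, m, rng):
    """Simulate n scores with a share alpha of positives (assumed formulation).

    Positives are u**m and negatives 1 - u**m with u ~ Uniform(0, 1), so m
    close to 0 separates the classes and m = 1 makes both uniform.
    """
    n_pos = int(round(n * alpha))
    pos = rng.uniform(size=n_pos) ** m
    neg = 1.0 - rng.uniform(size=n - n_pos) ** m
    return pos, neg


def histogram(scores, bins=10):
    """Smoothed, normalised histogram; smoothing avoids log(0) below."""
    counts, _ = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    smoothed = counts + 1e-9
    return smoothed / smoothed.sum()


def topsoe(p, q):
    """Topsoe divergence between two discrete distributions."""
    return float(np.sum(p * np.log(2 * p / (p + q)) + q * np.log(2 * q / (p + q))))


def best_merging_factor(test_scores, merging_factors, rng, n=2000):
    """Return (distance, m, alpha) for the closest simulated mixture."""
    test_hist = histogram(test_scores)
    best = (np.inf, None, None)
    for m in merging_factors:
        pos, neg = moss(n, 0.5, m, rng)
        pos_hist, neg_hist = histogram(pos), histogram(neg)
        for alpha in np.linspace(0.0, 1.0, 101):
            mixture = alpha * pos_hist + (1 - alpha) * neg_hist
            d = topsoe(test_hist, mixture)
            if d < best[0]:
                best = (d, m, float(alpha))
    return best


rng = np.random.default_rng(0)
pos, neg = moss(1000, 0.3, 0.3, rng)  # stand-in for observed test scores
test_scores = np.concatenate([pos, neg])
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
distance, m, alpha = best_merging_factor(test_scores, candidates, rng)
print(f"merging factor {m}, prevalence {alpha:.2f}, Topsoe {distance:.4f}")
```

A finer grid of candidate merging factors gives a closer match at the cost of more simulations, which is the trade-off the `merging_factors` parameter exposes.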
4.1.2.1. Parameters
| Parameter | Default | Explanation |
|---|---|---|
| quantifier | required | A soft (probabilistic) base aggregative quantifier (e.g. … |
| measure | | Distance metric for comparing test and synthetic distributions. Options: … |
| merging_factors | | Candidate merging-factor values for MoSS. The merging factor controls how much positive and negative scores overlap in the synthetic set. A finer grid (e.g. … |
| | | Multiclass decomposition. |
4.1.2.2. Examples
```python
from mlquantify.meta import QuaDapt
from mlquantify.matching import DyS
from sklearn.linear_model import LogisticRegression

# X_train, y_train and X_test come from the EnsembleQ example above
q = QuaDapt(
    quantifier=DyS(LogisticRegression()),
    measure='topsoe',
    merging_factors=[0.1, 0.3, 0.5, 0.7, 0.9],
)
q.fit(X_train, y_train)
print(q.predict(X_test))
```
4.1.3. AggregativeBootstrap: Confidence Intervals via Bootstrap
AggregativeBootstrap wraps any aggregative quantifier and applies
bootstrap resampling to both training and test predictions, generating a
distribution of prevalence estimates. The distribution is summarised as a
point estimate together with a confidence region.
Why it exists: A single prevalence estimate gives no indication of uncertainty. AggregativeBootstrap (Moreo & Salvati, 2025) provides statistically rigorous confidence intervals for any aggregative quantifier, enabling uncertainty-aware deployment.
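The resampling idea can be illustrated with a percentile interval on a single set of predictions. This is a simplified sketch under stated assumptions, not the AggregativeBootstrap implementation: it resamples only the test predictions, uses the share of positive predictions as the prevalence estimate, and `bootstrap_prevalence` is a hypothetical helper.

```python
import numpy as np


def bootstrap_prevalence(predictions, n_bootstraps=200, confidence_level=0.95, seed=0):
    """Point estimate and percentile interval for the share of positive predictions."""
    rng = np.random.default_rng(seed)
    predictions = np.asarray(predictions)
    n = len(predictions)
    # One prevalence estimate per resample drawn with replacement
    estimates = np.array([
        rng.choice(predictions, size=n, replace=True).mean()
        for _ in range(n_bootstraps)
    ])
    tail = (1.0 - confidence_level) / 2.0
    lower, upper = np.quantile(estimates, [tail, 1.0 - tail])
    return float(estimates.mean()), (float(lower), float(upper))


rng = np.random.default_rng(1)
# Hypothetical hard predictions on 500 test items, about 30% positive
test_predictions = (rng.uniform(size=500) < 0.3).astype(int)

point, (low, high) = bootstrap_prevalence(test_predictions)
print(f"prevalence ~ {point:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

Raising `n_bootstraps` smooths the interval endpoints, while `confidence_level` controls how much of the bootstrap distribution the interval covers.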
4.1.3.1. Parameters
| Parameter | Default | Explanation |
|---|---|---|
| | required | The base aggregative quantifier to wrap. |
| n_train_bootstraps | | Number of bootstrap resamples of the training predictions. Increasing this to 50–200 gives a more accurate estimate of the confidence region. |
| n_test_bootstraps | | Number of bootstrap resamples of the test predictions. Together with … |
| region_type | | Type of confidence region: … |
| confidence_level | | Confidence level for the region (e.g. 0.95 for a 95% CI). |
| | | Seed for reproducibility. |
4.1.3.2. Examples
```python
from mlquantify.meta import AggregativeBootstrap
from mlquantify.likelihood import EMQ
from sklearn.linear_model import LogisticRegression

# X_train, y_train and X_test come from the EnsembleQ example above
q = AggregativeBootstrap(
    EMQ(LogisticRegression()),
    n_train_bootstraps=100,
    n_test_bootstraps=100,
    region_type='intervals',
    confidence_level=0.95,
)
q.fit(X_train, y_train)
prevalences = q.predict(X_test)
print(prevalences)
# Access the confidence region after prediction
# (see mlquantify.confidence for the region object API)
```
See also
Percentile-Based Confidence Intervals for a full guide on confidence regions in quantification.
4.1.4. Choosing a Meta-Quantifier
| Method | When to use | Key advantage |
|---|---|---|
| EnsembleQ (selection_metric='all') | Moderate shift; need robustness | Reduces variance through diversity. |
| EnsembleQ (selection_metric='ptr') | Unknown test prevalence region | Adapts member selection to the test estimate. |
| EnsembleQ (selection_metric='ds') | Score variability across batches | Selects members by distribution similarity. |
| QuaDapt | Score variability; DyS/HDy as base | Corrects for score distribution mismatch. |
| AggregativeBootstrap | Need uncertainty quantification | Provides confidence intervals for any quantifier. |
Practical recommendation: Use EnsembleQ with selection_metric='ptr'
and n_jobs=-1 when you want the best accuracy with moderate extra cost.
Use AggregativeBootstrap when you need to report uncertainty alongside
your prevalence estimate.
4.1.5. Ensemble for Quantification
Ensembles for Quantification (EnsembleQ) represent a class of algorithms aimed at improving the accuracy and robustness of class prevalence estimation by combining multiple base quantifiers trained on varied data samples with controlled prevalence distributions. Different training subsets simulate varying class distributions to introduce diversity in the ensemble, which helps address predictable changes in class priors (Prior Probability Shift or Label Shift).
The algorithm can be divided into three main phases:
1. Sample generation: multiple training subsets are drawn, each with a prevalence \(p_j\) sampled from the chosen protocol (‘artificial’, ‘natural’, ‘uniform’, or ‘kraemer’).
2. Training: each batch trains a base quantifier independently, with parameters estimated via cross-validation.
3. Aggregation: all models predict \(\hat{p}_j\), and the predictions are aggregated via the mean or median with optional member selection (‘all’, ‘ptr’, ‘ds’).
Advantages include risk reduction, correction of instability in base quantifiers, and resilience to widely varying test prevalence.
Mathematical Definition
Given training class-conditional feature distributions \(p(x|+)\) and \(p(x|-)\) and an unlabeled test set \(U\), each training batch simulates a mixture distribution with positive-class prevalence \(\alpha\):
\[ p_\alpha(x) = \alpha \, p(x|+) + (1 - \alpha) \, p(x|-) \]
A diverse set of prevalence values \(\alpha\) is sampled according to the chosen protocol to generate the training batches \(D_j\), and one base quantifier is trained on each batch.
The final ensemble prevalence estimate \(\hat{p}_{final}\) is computed as:
\[ \hat{p}_{final} = \mathrm{aggregation}\left(\hat{p}_1, \dots, \hat{p}_K\right) \]
where the aggregation is typically the mean or median, optionally weighted by selection metrics.
Selection policies used during aggregation:
‘all’: Uses all ensemble members equally without any selection or weighting.
‘ptr’ (Prevalence Training Ratio): Selects models whose training prevalence \(p_j\) is closest to an initial prevalence estimate of the test set, often computed as the mean of all base predictions.
‘ds’ (Distribution Similarity): Selects models whose training posterior score distributions are most similar to the test set distribution, measured with metrics such as Hellinger Distance. This requires probabilistic quantifiers capable of producing posterior probabilities.
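The ‘ptr’ policy described above can be sketched in a few lines. This is an illustrative reading of the description, not the library's code: the initial test estimate is taken as the mean of all member predictions, and `ptr_select` together with its arguments is hypothetical.

```python
import numpy as np


def ptr_select(train_prevalences, member_estimates, keep_fraction=0.25, aggregate=np.median):
    """Keep the members trained closest to a first guess of the test prevalence."""
    train_prevalences = np.asarray(train_prevalences, dtype=float)
    member_estimates = np.asarray(member_estimates, dtype=float)
    initial = member_estimates.mean()  # rough first guess of the test prevalence
    n_keep = max(1, int(round(keep_fraction * len(member_estimates))))
    closest = np.argsort(np.abs(train_prevalences - initial))[:n_keep]
    return float(aggregate(member_estimates[closest])), closest


# Hypothetical training prevalences p_j and member predictions on one test set
train_prevalences = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
member_estimates = [0.22, 0.25, 0.28, 0.30, 0.33, 0.37, 0.41, 0.48]

final, kept = ptr_select(train_prevalences, member_estimates)
print(sorted(kept.tolist()), round(final, 2))  # → [2, 3] 0.29
```

Here the initial estimate is 0.33, so the members trained at prevalences 0.3 and 0.4 are retained and the median of their predictions becomes the final estimate.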
Example
```python
from mlquantify.meta import EnsembleQ
from mlquantify.matching import DyS
from sklearn.ensemble import RandomForestClassifier

# X_train, y_train and X_test are assumed to be defined as in the earlier examples
ensemble = EnsembleQ(
    quantifier=DyS(RandomForestClassifier()),
    size=30,
    protocol='artificial',
    selection_metric='ptr'
)
ensemble.fit(X_train, y_train)
prevalence_estimates = ensemble.predict(X_test)
```