3.2. Bootstrap in Quantification

Bootstrap is used in quantification to estimate the uncertainty of class prevalence estimates by constructing confidence regions around them. Naively retraining the entire quantification pipeline on every resample is computationally expensive, so for aggregative quantifiers the bootstrap is applied only to the adjustment or aggregation phase, leaving the underlying classifier fixed.

Bootstrap strategies are classified into three main types:

Model-based Bootstrap

Resamples the classifier’s cross-validation outputs used to train the adjustment function. Multiple adjustment models are fitted on these resamples and applied to a fixed set of classifier predictions, avoiding repeated retraining of the classifier.
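As an illustration, model-based resampling can be sketched for a binary Adjusted Classify &amp; Count (ACC) correction. This is a minimal sketch, not library code: the cross-validation outputs and test predictions are synthetic stand-ins, and `acc_adjust` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the classifier's cross-validation outputs
# (predicted and true binary labels on the training data).
cv_preds = rng.integers(0, 2, size=500)
cv_true = rng.integers(0, 2, size=500)

# Fixed classifier predictions on the test set, computed only once.
test_preds = rng.integers(0, 2, size=200)

def acc_adjust(preds, true, test_pred_rate):
    """Hypothetical ACC helper: correct the raw positive-prediction
    rate using tpr/fpr estimated from the given (preds, true) pairs."""
    tpr = preds[true == 1].mean() if (true == 1).any() else 0.0
    fpr = preds[true == 0].mean() if (true == 0).any() else 0.0
    denom = tpr - fpr
    if abs(denom) < 1e-12:
        return float(test_pred_rate)
    return float(np.clip((test_pred_rate - fpr) / denom, 0.0, 1.0))

raw_rate = test_preds.mean()   # raw positive-prediction rate on the test set
estimates = []
for _ in range(100):           # model-based bootstrap replicates
    idx = rng.integers(0, len(cv_preds), size=len(cv_preds))
    # Refit the adjustment on resampled CV outputs; the classifier
    # predictions themselves are never recomputed.
    estimates.append(acc_adjust(cv_preds[idx], cv_true[idx], raw_rate))
```

Each replicate yields one prevalence estimate; their spread reflects the uncertainty introduced by estimating the adjustment function itself.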

Population-based Bootstrap

Uses a single set of predictions on the test data; the bootstrap resamples these test predictions to generate multiple test bags. A single, fixed adjustment function is applied to each bag, producing one bootstrap prevalence estimate per bag.
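A minimal sketch of population-based resampling, again for a binary ACC-style correction: the test predictions are synthetic, and the `tpr`/`fpr` values stand in for rates already estimated once on the training data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Single set of test predictions (binary), computed only once.
test_preds = rng.integers(0, 2, size=300)

# One fixed adjustment: tpr/fpr are placeholder values assumed to have
# been estimated on the training data before bootstrapping starts.
tpr, fpr = 0.85, 0.10

def adjust(rate):
    """Apply the fixed ACC-style correction to a raw prediction rate."""
    return float(np.clip((rate - fpr) / (tpr - fpr), 0.0, 1.0))

estimates = []
for _ in range(100):   # population-based bootstrap bags
    bag = rng.choice(test_preds, size=len(test_preds), replace=True)
    estimates.append(adjust(bag.mean()))   # same adjustment for every bag
```

Here the variability across estimates comes only from resampling the test population, since the adjustment function never changes.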

Combined Approach

Applies both model-based and population-based resampling, generating a grid of prevalence estimates (one per combination of adjustment model and test bag) that balances computational efficiency with robustness under prior probability shift.
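The combined approach can be sketched as two nested resampling loops: the outer loop re-estimates the adjustment from resampled cross-validation outputs, and the inner loop resamples the test predictions. All data below are synthetic stand-ins, and the ACC-style correction is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for CV outputs and test predictions.
cv_preds = rng.integers(0, 2, size=500)
cv_true = rng.integers(0, 2, size=500)
test_preds = rng.integers(0, 2, size=200)

n_train, n_test = 10, 10
grid = np.empty((n_train, n_test))   # one estimate per (model, bag) pair
for i in range(n_train):
    # Model-based step: re-estimate tpr/fpr on resampled CV outputs.
    idx = rng.integers(0, len(cv_preds), size=len(cv_preds))
    p, t = cv_preds[idx], cv_true[idx]
    tpr = p[t == 1].mean() if (t == 1).any() else 0.0
    fpr = p[t == 0].mean() if (t == 0).any() else 0.0
    for j in range(n_test):
        # Population-based step: resample the test predictions.
        bag = rng.choice(test_preds, size=len(test_preds), replace=True)
        rate, denom = bag.mean(), tpr - fpr
        if abs(denom) < 1e-12:
            grid[i, j] = rate
        else:
            grid[i, j] = np.clip((rate - fpr) / denom, 0.0, 1.0)
```

The grid spans both sources of uncertainty while the classifier is trained only once, which is what keeps the combined approach computationally affordable.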

The AggregativeBootstrap class implements these strategies for aggregative quantifiers by using two parameters: n_train_bootstraps and n_test_bootstraps. These parameters define the number of bootstrap samples for the training and test phases, respectively.

# AggregativeBootstrap wraps an aggregative quantifier (here EMQ on top
# of a random forest); X_train, y_train, X_test are assumed to be defined.
from mlquantify.ensemble import AggregativeBootstrap
from mlquantify.neighbors import EMQ
from sklearn.ensemble import RandomForestClassifier

agg_boot = AggregativeBootstrap(
    quantifier=EMQ(RandomForestClassifier()),
    n_train_bootstraps=100,  # bootstrap replicates over the training phase
    n_test_bootstraps=100    # bootstrap bags drawn from the test predictions
)
agg_boot.fit(X_train, y_train)
prevalence, conf_region = agg_boot.predict(X_test)

For information on confidence interval construction from bootstrap samples, see Percentile-Based Confidence Intervals.
