3.2. Bootstrap in Quantification
In quantification, bootstrap estimates uncertainty by constructing confidence regions around class prevalence estimates. Applying it directly is computationally expensive, since each resample would require retraining the underlying classifier; instead, bootstrap is applied only to the adjustment or aggregation phases of aggregative quantifiers.
Bootstrap strategies are classified into three main types:
Model-based bootstrap: resamples the classifier's cross-validation outputs used to train the adjustment function. Multiple adjustment models are fitted and applied to the fixed classifier predictions, avoiding repeated retraining of the classifier.

Population-based bootstrap: uses a single set of predictions on the test data and resamples those predictions to generate multiple test bags. A single adjustment function is applied to each bag, yielding one bootstrap prevalence estimate per bag.

Combined bootstrap: applies both model-based and population-based resampling, generating a grid of prevalence estimates that balances computational efficiency and robustness under prior probability shift.
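To make the population-based strategy concrete, the sketch below uses plain NumPy (not the mlquantify API) with synthetic hard-label predictions standing in for real classifier output, and a simple classify-and-count step standing in for the adjustment function: each bootstrap bag of test predictions is aggregated into one prevalence estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed classifier predictions on the test set (hard labels, 3 classes);
# synthetic stand-ins for real model output.
test_preds = rng.integers(0, 3, size=500)

n_bootstraps = 100
n_classes = 3
estimates = np.empty((n_bootstraps, n_classes))
for b in range(n_bootstraps):
    # Population-based resampling: draw a bag of test predictions
    # with replacement, then aggregate it (here: classify-and-count).
    bag = rng.choice(test_preds, size=test_preds.size, replace=True)
    estimates[b] = np.bincount(bag, minlength=n_classes) / bag.size

# Each row of `estimates` is one bootstrap prevalence estimate;
# their spread reflects the uncertainty of the point estimate.
print(estimates.mean(axis=0))
```

Because the classifier is queried only once on the test data, the cost per bootstrap bag is just the resampling and aggregation step.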
The AggregativeBootstrap class implements these strategies for aggregative quantifiers by using two parameters: n_train_bootstraps and n_test_bootstraps. These parameters define the number of bootstrap samples for the training and test phases, respectively.
from sklearn.ensemble import RandomForestClassifier

from mlquantify.ensemble import AggregativeBootstrap
from mlquantify.methods import EMQ

# Wrap an aggregative quantifier (EMQ over a random forest) with
# bootstrap resampling in both the training and test phases.
agg_boot = AggregativeBootstrap(
    quantifier=EMQ(RandomForestClassifier()),
    n_train_bootstraps=100,  # resamples of the training outputs
    n_test_bootstraps=100,   # resamples of the test predictions
)

agg_boot.fit(X_train, y_train)
prevalence, conf_region = agg_boot.predict(X_test)
For information on confidence interval construction from bootstrap samples, see Percentile-Based Confidence Intervals.
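A percentile interval can be read off directly from the collection of bootstrap prevalence estimates. The sketch below is plain NumPy, with Dirichlet-sampled prevalences standing in for the output of a bootstrapped quantifier; it takes the 2.5th and 97.5th percentiles per class to form a 95% interval.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic bootstrap prevalence estimates for 3 classes
# (stand-ins for the estimates a bootstrapped quantifier produces).
boot_prevs = rng.dirichlet([30, 50, 20], size=100)

# 95% percentile interval: 2.5th and 97.5th percentile per class.
lower = np.percentile(boot_prevs, 2.5, axis=0)
upper = np.percentile(boot_prevs, 97.5, axis=0)

for k, (lo, hi) in enumerate(zip(lower, upper)):
    print(f"class {k}: [{lo:.3f}, {hi:.3f}]")
```

Note that per-class percentile intervals ignore the simplex constraint (prevalences summing to one), which is why joint confidence regions are also of interest.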