3.1. Ensemble for Quantification

Ensembles for Quantification (EnsembleQ) are a class of algorithms that aim to improve the accuracy and robustness of class prevalence estimation by combining multiple base quantifiers, each trained on a data sample drawn with a controlled prevalence. Because the training subsets simulate different class distributions, the ensemble members are diverse, which helps the ensemble cope with changes in class priors between training and test data (prior probability shift, also called label shift).

The algorithm can be divided into three main phases:

Phase 1: Sample Generation

Multiple training subsets are drawn, each with a target prevalence \(p_j\) sampled according to the chosen protocol (‘artificial’, ‘natural’, ‘uniform’, or ‘kraemer’).
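
The sampling step can be sketched in plain NumPy for the binary case. This is only an illustration under stated assumptions, not the mlquantify implementation: the helper `sample_with_prevalence` is hypothetical, and the target prevalences are drawn uniformly from [0.05, 0.95] as a stand-in for whatever the selected protocol would produce.

```python
import numpy as np

def sample_with_prevalence(y, prevalence, size, rng):
    """Return indices of a subset of `y` whose positive share is close to `prevalence`.

    Hypothetical helper for illustration; binary labels in {0, 1} are assumed.
    """
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_pos = int(round(prevalence * size))
    n_neg = size - n_pos
    # Sample with replacement so any prevalence is reachable, even when one
    # class has fewer examples than the subset requires.
    chosen_pos = rng.choice(pos_idx, size=n_pos, replace=True)
    chosen_neg = rng.choice(neg_idx, size=n_neg, replace=True)
    idx = np.concatenate([chosen_pos, chosen_neg])
    rng.shuffle(idx)
    return idx

rng = np.random.default_rng(0)
y = np.array([0] * 70 + [1] * 30)  # toy labels with 30% positives

# One target prevalence p_j per ensemble member (assumed uniform draw).
target_prevalences = rng.uniform(0.05, 0.95, size=5)
subsets = [sample_with_prevalence(y, p, size=50, rng=rng) for p in target_prevalences]

# Prevalence actually achieved in each subset, after rounding to whole examples.
achieved = [float(y[idx].mean()) for idx in subsets]
```

Each subset matches its target prevalence up to the rounding imposed by the subset size, regardless of the 30% prevalence in the original labels.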

Phase 2: Model Training

A base quantifier is trained independently on each subset, with its parameters estimated via cross-validation.
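
The text does not say which parameters are cross-validated. One common reading, assumed here, is that the classifier's true and false positive rates are estimated from out-of-fold predictions and then used to correct a classify-and-count estimate (an adjusted-count quantifier). The sketch below uses a toy one-dimensional threshold classifier; every function name is hypothetical, and it does not reproduce DyS or any mlquantify internals.

```python
import numpy as np

def fit_threshold(x, y):
    """Toy classifier: threshold halfway between the two class means."""
    return (x[y == 1].mean() + x[y == 0].mean()) / 2.0

def cross_validated_rates(x, y, n_folds, rng):
    """Estimate tpr and fpr of the toy classifier from out-of-fold predictions."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predictions = np.empty(len(y), dtype=int)
    for held_out in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[held_out] = False
        threshold = fit_threshold(x[train_mask], y[train_mask])
        predictions[held_out] = (x[held_out] > threshold).astype(int)
    tpr = predictions[y == 1].mean()
    fpr = predictions[y == 0].mean()
    return tpr, fpr

def fit_base_quantifier(x, y, rng, n_folds=5):
    """Train one ensemble member: a threshold plus cross-validated rates."""
    tpr, fpr = cross_validated_rates(x, y, n_folds, rng)
    return {"threshold": fit_threshold(x, y), "tpr": tpr, "fpr": fpr}

def predict_prevalence(model, x):
    """Classify and count, then correct the raw share using tpr and fpr."""
    raw = (x > model["threshold"]).mean()
    adjusted = (raw - model["fpr"]) / (model["tpr"] - model["fpr"])
    return float(np.clip(adjusted, 0.0, 1.0))

rng = np.random.default_rng(0)

# Synthetic data: negatives ~ N(0, 1), positives ~ N(2, 1).
y_train = np.array([0] * 1000 + [1] * 1000)
x_train = rng.normal(loc=2.0 * y_train, scale=1.0)
model = fit_base_quantifier(x_train, y_train, rng)

# Test set with a different prevalence (20% positives) than the training set.
y_test = np.array([0] * 1600 + [1] * 400)
x_test = rng.normal(loc=2.0 * y_test, scale=1.0)
estimate = predict_prevalence(model, x_test)
```

In an ensemble, `fit_base_quantifier` would be called once per training subset, so each member carries rates estimated under its own training prevalence.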

Phase 3: Aggregation

Every trained model produces a prevalence estimate \(\hat{p}_j\) for the test set. The estimates are aggregated by their mean or median, optionally after a selection step (‘all’, ‘ptr’, ‘ds’).
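
A minimal sketch of the aggregation step follows. It assumes that ‘ptr’ selection keeps the members whose training prevalence is closest to a preliminary estimate (here, the mean over all members), as in the ensemble quantification literature; whether mlquantify implements it exactly this way is not confirmed by this page. Both function names are hypothetical.

```python
import numpy as np

def aggregate(estimates, how="mean"):
    """Combine per-member prevalence estimates into a single value."""
    estimates = np.asarray(estimates, dtype=float)
    if how == "mean":
        return float(estimates.mean())
    if how == "median":
        return float(np.median(estimates))
    raise ValueError(f"unknown aggregation: {how!r}")

def select_by_training_prevalence(estimates, train_prevalences, keep):
    """Keep the `keep` members trained at prevalences nearest a preliminary estimate.

    Assumed interpretation of 'ptr' selection, for illustration only.
    """
    estimates = np.asarray(estimates, dtype=float)
    train_prevalences = np.asarray(train_prevalences, dtype=float)
    preliminary = estimates.mean()  # first-pass estimate from all members
    order = np.argsort(np.abs(train_prevalences - preliminary))
    return estimates[order[:keep]]

# Toy ensemble: prevalence each member was trained at, and its test estimate.
train_prevalences = [0.1, 0.3, 0.5, 0.7, 0.9]
estimates = [0.20, 0.25, 0.30, 0.40, 0.60]

mean_all = aggregate(estimates, "mean")      # 'all': no selection
median_all = aggregate(estimates, "median")
selected = select_by_training_prevalence(estimates, train_prevalences, keep=3)
mean_selected = aggregate(selected, "mean")  # aggregate only the selected members
```

The median is less sensitive than the mean to a single outlying member, while selection discards members trained under conditions far from the apparent test prevalence.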

Advantages include risk reduction, correction of instability in base quantifiers, and resilience to widely varying test prevalence.

Example

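from mlquantify.ensemble import EnsembleQ
from mlquantify.mixture import DyS
from sklearn.ensemble import RandomForestClassifier

# X_train, y_train and X_test are assumed to be existing feature/label arrays.

# Ensemble of 30 DyS quantifiers, each wrapping a random forest classifier.
ensemble = EnsembleQ(
    quantifier=DyS(RandomForestClassifier()),
    size=30,
    protocol='artificial',
    selection_metric='ptr'
)

# Train every ensemble member, then estimate class prevalences on the test set.
ensemble.fit(X_train, y_train)
prevalence_estimates = ensemble.predict(X_test)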
