1.3. Meta Quantification

Meta quantification methods estimate the class distribution of the test set from the predictions of other quantifiers. They act as a meta-learner that takes those predictions as input and learns to combine them into a more accurate estimate of the class distribution.

mlquantify provides a single meta-quantifier, the Ensemble method, which trains several copies of one base quantifier and combines their predictions to estimate the class distribution of the test set. The process is defined as follows:

  • Take the training set and generate samples \(S_i\) with varying class distributions;

  • Create several copies \(M_i\) of the quantifier, fitting each one on a different sample;

  • Aggregate the predictions of all quantifiers on the test set using the mean or the median.
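The three steps above can be sketched without mlquantify. This is only an illustration under stated assumptions, not the library's implementation: the base quantifier is a plain Classify & Count stand-in built on scikit-learn's `LogisticRegression`, the problem is binary, and the helper names (`sample_with_prevalence`, `classify_and_count`, `ensemble_prevalence`) as well as the prevalence range 0.1 to 0.9 are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def sample_with_prevalence(X, y, prevalence, size, rng):
    """Draw `size` rows with replacement so class 1 has the requested prevalence."""
    n_pos = int(round(prevalence * size))
    pos_idx = rng.choice(np.flatnonzero(y == 1), size=n_pos, replace=True)
    neg_idx = rng.choice(np.flatnonzero(y == 0), size=size - n_pos, replace=True)
    idx = np.concatenate([pos_idx, neg_idx])
    return X[idx], y[idx]


def classify_and_count(clf, X):
    """Stand-in quantifier (assumption): fraction of items predicted as class 1."""
    return float(np.mean(clf.predict(X) == 1))


def ensemble_prevalence(X_train, y_train, X_test, n_members=20,
                        aggregate="mean", seed=0):
    """Estimate the class-1 prevalence of X_test with an ensemble of members."""
    if aggregate not in ("mean", "median"):
        raise ValueError("aggregate must be 'mean' or 'median'")
    rng = np.random.default_rng(seed)
    base = LogisticRegression(max_iter=1000)
    estimates = []
    for _ in range(n_members):
        # Step 1: a training sample S_i with its own class distribution.
        p = rng.uniform(0.1, 0.9)
        X_s, y_s = sample_with_prevalence(X_train, y_train, p, len(y_train), rng)
        # Step 2: a fresh copy M_i of the base model, fitted on S_i.
        member = clone(base).fit(X_s, y_s)
        estimates.append(classify_and_count(member, X_test))
    # Step 3: aggregate the members' estimates.
    estimates = np.asarray(estimates)
    if aggregate == "mean":
        return float(np.mean(estimates))
    return float(np.median(estimates))


# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] > 0).astype(int)
X_new = rng.normal(size=(100, 5))
print(ensemble_prevalence(X_demo, y_demo, X_new, aggregate="median"))
```

Drawing each sample at a different prevalence is what gives the members diverse views of the data; the aggregation step then smooths out the bias of any single member.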

The Ensemble quantifier can also select its members dynamically according to a criterion, as proposed by Pérez-Gállego et al. (2017, 2019):

  • Training prevalence (ptr): runs all models on the test set \(U\) and ranks them according to the difference between the mean estimated prevalence for \(U\) and the prevalence in their training sample \(S_i\);

  • Distribution similarity (ds): compares the distribution of posterior probabilities of each sample \(S_i\) with that of \(U\), ranking each quantifier by the Hellinger distance computed on histograms.
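The two ranking criteria can be sketched with NumPy alone. This is a minimal illustration assuming a binary problem, not mlquantify's internal code: the function names, the choice of 8 histogram bins, and the use of positive-class posteriors as input are all assumptions made for the example.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def rank_by_training_prevalence(member_estimates, train_prevalences):
    """ptr: order members by |prevalence of S_i - mean estimated prevalence for U|."""
    member_estimates = np.asarray(member_estimates, dtype=float)
    train_prevalences = np.asarray(train_prevalences, dtype=float)
    target = member_estimates.mean()
    return np.argsort(np.abs(train_prevalences - target))


def posterior_histogram(posteriors, n_bins=8):
    """Normalised histogram of positive-class posteriors over [0, 1]."""
    counts, _ = np.histogram(posteriors, bins=n_bins, range=(0.0, 1.0))
    return counts / counts.sum()


def rank_by_distribution_similarity(sample_posteriors, test_posteriors, n_bins=8):
    """ds: order members by Hellinger distance between the histograms of S_i and U."""
    test_hist = posterior_histogram(test_posteriors, n_bins)
    distances = [
        hellinger(posterior_histogram(s, n_bins), test_hist)
        for s in sample_posteriors
    ]
    return np.argsort(distances)


# Illustrative values only: three members with their estimates for U
# and the prevalence of the sample each was trained on.
order = rank_by_training_prevalence([0.2, 0.4, 0.6], [0.9, 0.4, 0.1])
print(order)  # best-ranked member first
```

In both cases the result is an ordering of the members; a dynamic ensemble would then keep only the best-ranked ones before aggregating their predictions.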

The basic usage of the Ensemble method is as follows:

from mlquantify.methods import FM, Ensemble
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Synthetic binary data, for illustration only
X_train = np.random.rand(100, 10)
y_train = np.random.randint(0, 2, size=100)
X_test = np.random.rand(50, 10)
y_test = np.random.randint(0, 2, size=50)  # true test labels, not used by predict

# Base quantifier wrapped around a classifier
model = FM(RandomForestClassifier())
ensemble = Ensemble(quantifier=model,
                    size=50,
                    selection_metric='ptr', # Training prevalence
                    return_type='mean',     # Aggregate by mean
                    n_jobs=-1,
                    verbose=True)

# Fit the ensemble members on the training data
ensemble.fit(X_train, y_train)

# Estimate the class distribution of the test set
predictions = ensemble.predict(X_test)

print(predictions)