1.3. Meta Quantification
Meta quantification methods are a class of quantification methods that use the predictions of other quantifiers to estimate the class distribution of the test set. These methods can be seen as a meta-learner that takes the predictions of other quantifiers as input and learns to combine them to produce a more accurate estimate of the class distribution.
mlquantify currently provides a single meta-quantifier, the Ensemble method. It trains several copies of one base quantifier on different samples of the training set and combines their predictions to estimate the class distribution of the test set. The procedure is as follows:
Take the training set and generate samples \(S_i\) with varying class distributions;
Create one copy \(M_i\) of the quantifier per sample and fit each copy on a different sample;
Aggregate the predictions of all copies on the test set using the mean or the median.
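The three steps above can be sketched with NumPy alone. This is only an illustrative sketch and not mlquantify's implementation: the CentroidCC class is a toy classify-and-count quantifier invented for this example, the helper sample_with_prevalence is hypothetical, and the ensemble size, sample size, and prevalence range are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_with_prevalence(X, y, prevalence, size, rng):
    """Draw `size` rows so that roughly a fraction `prevalence` is positive."""
    n_pos = int(round(prevalence * size))
    n_pos = min(max(n_pos, 1), size - 1)  # keep both classes present
    pos_idx = rng.choice(np.flatnonzero(y == 1), size=n_pos, replace=True)
    neg_idx = rng.choice(np.flatnonzero(y == 0), size=size - n_pos, replace=True)
    idx = np.concatenate([pos_idx, neg_idx])
    return X[idx], y[idx]


class CentroidCC:
    """Toy quantifier (hypothetical): nearest class centroid, then count."""

    def fit(self, X, y):
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        return self

    def predict(self, X):
        # Distance from every row to both centroids, shape (n_samples, 2).
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        pos = dists.argmin(axis=1).mean()
        return np.array([1.0 - pos, pos])


# Synthetic binary data: positives are shifted by +1 in every feature.
y_train = rng.integers(0, 2, size=200)
X_train = rng.normal(size=(200, 5)) + y_train[:, None]
y_test = (rng.random(100) < 0.3).astype(int)
X_test = rng.normal(size=(100, 5)) + y_test[:, None]

# Steps 1 and 2: one sample S_i and one fitted copy M_i per ensemble member.
train_prevalences = rng.uniform(0.05, 0.95, size=20)
members = []
for p in train_prevalences:
    X_s, y_s = sample_with_prevalence(X_train, y_train, p, size=100, rng=rng)
    members.append(CentroidCC().fit(X_s, y_s))

# Step 3: aggregate the per-member estimates for the test set.
estimates = np.stack([m.predict(X_test) for m in members])  # shape (20, 2)
mean_estimate = estimates.mean(axis=0)
median_estimate = np.median(estimates, axis=0)
median_estimate = median_estimate / median_estimate.sum()  # renormalise
print(mean_estimate, median_estimate)
```

Sampling with replacement keeps the sketch short. A real implementation may sample differently and may support more than two classes.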
Alternatively, the Ensemble quantifier can select its members dynamically according to one of the following criteria, proposed by Pérez-Gállego et al. (2017, 2019):
Training prevalence (ptr): runs all models on the test set \(U\) and ranks them according to the difference between the mean estimated prevalence for \(U\) and the prevalence in \(S_i\).
Distribution similarity (ds): compares the distribution of posteriors of each sample \(S_i\) with that of \(U\), ranking each quantifier by the Hellinger distance computed on their histograms.
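A rough sketch of how the two rankings could be computed is shown below. It is not the library's code: the input arrays are made-up stand-ins, the Beta draws merely imitate classifier posteriors, and the number of bins and the number of members kept (keep) are assumptions, since the text above does not state them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_members, keep = 10, 5  # hypothetical ensemble size and members kept

# Assumed inputs: positive-class prevalence of each training sample S_i and
# each member's positive-class estimate for the test set U (made-up values).
train_prev = np.linspace(0.05, 0.95, n_members)
test_estimates = np.clip(0.3 + rng.normal(scale=0.05, size=n_members), 0, 1)

# ptr: rank members by |prevalence of S_i - mean estimated prevalence of U|.
mean_estimate = test_estimates.mean()
ptr_order = np.argsort(np.abs(train_prev - mean_estimate))
ptr_selected = ptr_order[:keep]
ptr_prevalence = test_estimates[ptr_selected].mean()


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def posterior_histogram(scores, bins=8):
    """Normalised histogram of positive-class posteriors in [0, 1]."""
    counts, _ = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()


# ds: compare the posterior histogram of each S_i with the histogram for U.
# Beta draws stand in for classifier posteriors; skew follows train_prev.
sample_posteriors = [
    rng.beta(1 + 4 * p, 1 + 4 * (1 - p), size=200) for p in train_prev
]
test_posteriors = rng.beta(1 + 4 * 0.3, 1 + 4 * 0.7, size=200)

test_hist = posterior_histogram(test_posteriors)
distances = np.array(
    [hellinger(posterior_histogram(s), test_hist) for s in sample_posteriors]
)
ds_order = np.argsort(distances)
ds_selected = ds_order[:keep]
ds_prevalence = test_estimates[ds_selected].mean()

print(ptr_selected, ds_selected)
```

In both cases only the best-ranked members contribute to the final estimate, here aggregated with the mean.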
The basic usage of the Ensemble method is as follows:
from mlquantify.methods import FM, Ensemble
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Random binary data, for illustration only
X_train = np.random.rand(100, 10)
y_train = np.random.randint(0, 2, size=100)
X_test = np.random.rand(50, 10)
y_test = np.random.randint(0, 2, size=50)  # true labels, not needed to predict

# Base quantifier that is copied for each ensemble member
model = FM(RandomForestClassifier())

ensemble = Ensemble(quantifier=model,
                    size=50,
                    selection_metric='ptr',  # Training prevalence
                    return_type='mean',
                    n_jobs=-1,
                    verbose=True)

ensemble.fit(X_train, y_train)

# Estimated class distribution of the test set
predictions = ensemble.predict(X_test)
print(predictions)