2. Aggregative Quantification

Aggregative quantification methods estimate class prevalences by aggregating predictions made on individual instances. All aggregative methods share a common three-step structure:

  1. fit — the model is trained on labelled data. The underlying classifier (called the estimator) is fitted, and a representation of the training class distributions is computed (confusion matrix, score histograms, density estimates, …).

  2. predict — the trained model generates predictions for each individual item in the unlabelled test set. These predictions can be hard labels or soft probabilities.

  3. aggregate — the predictions are combined into a single prevalence estimate. This is the step that distinguishes quantifiers from plain classifiers: it corrects for distributional shift, rather than just counting labels.

This separation follows scikit-learn's fit/predict convention and adds aggregate as an explicit third step, so any standard classifier can be reused as the backbone.
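The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `ToyACC`, its `classify` method, and the `sample` helper are hypothetical names, the "classifier" is a simple threshold on one feature, and the aggregation rule is the classic adjusted-count correction based on true and false positive rates.

```python
import random


class ToyACC:
    """Hypothetical 1-D adjusted-count quantifier (illustration only)."""

    def fit(self, X, y):
        # Step 1a: "train" the classifier, here a threshold halfway
        # between the two class means.
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        # Step 1b: summarise training behaviour as true/false positive rates.
        # Estimated in-sample for brevity; held-out data is the safer choice.
        self.tpr_ = sum(x > self.threshold_ for x in pos) / len(pos)
        self.fpr_ = sum(x > self.threshold_ for x in neg) / len(neg)
        return self

    def classify(self, X):
        # Step 2: one hard label per test item.
        return [int(x > self.threshold_) for x in X]

    def aggregate(self, labels):
        # Step 3: correct the raw positive rate using the training rates,
        # then clip the result to the valid range [0, 1].
        raw_rate = sum(labels) / len(labels)
        adjusted = (raw_rate - self.fpr_) / (self.tpr_ - self.fpr_)
        return min(1.0, max(0.0, adjusted))

    def predict(self, X):
        # Full pipeline: steps 2 and 3 in sequence.
        return self.aggregate(self.classify(X))


def sample(n, prevalence, rng):
    # Synthetic data: negatives ~ N(0, 1), positives ~ N(1.5, 1).
    y = [int(rng.random() < prevalence) for _ in range(n)]
    X = [rng.gauss(1.5 if label else 0.0, 1.0) for label in y]
    return X, y


rng = random.Random(0)
X_train, y_train = sample(2000, 0.5, rng)   # balanced training set
X_test, y_test = sample(2000, 0.8, rng)     # shifted test prevalence

q = ToyACC().fit(X_train, y_train)
labels = q.classify(X_test)
raw = sum(labels) / len(labels)             # plain counting
adjusted = q.aggregate(labels)              # counting plus correction
true_prev = sum(y_test) / len(y_test)
print(f"true={true_prev:.3f} raw={raw:.3f} adjusted={adjusted:.3f}")
```

Because the classes overlap, plain counting is pulled towards the training prevalence, while the adjusted estimate should land much closer to the true test prevalence. Keeping `aggregate` separate from `classify` is what allows the correction to be rerun on stored labels without touching the classifier again.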

The aggregate method

Every aggregative quantifier exposes an aggregate method that performs only step 3. This lets you quantify without re-predicting when classifier outputs are already available:

from mlquantify.likelihood import EMQ
from sklearn.linear_model import LogisticRegression

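# X_train, y_train and X_test are assumed to be defined already
q = EMQ(LogisticRegression())
q.fit(X_train, y_train)

# Full pipeline: per-item prediction followed by aggregation
prevalences = q.predict(X_test)

# Or: reuse posteriors that are already available and run only step 3
proba = q.estimator_.predict_proba(X_test)
prevalences = q.aggregate(proba, q.train_predictions_, q.train_labels_)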
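To make concrete what an aggregation step over existing posteriors can look like, here is a rough standalone sketch of EM-based prior adjustment, the general idea that EMQ-style methods are usually associated with. It is not the library's code: the function name `em_prevalence`, its parameters, and the synthetic posteriors are assumptions made for illustration, and the sketch presumes the posteriors are calibrated with respect to the training prevalence.

```python
import math
import random


def em_prevalence(posteriors, train_prevalence, max_iter=500, tol=1e-6):
    """Re-estimate class prevalences from posteriors via EM (hypothetical helper)."""
    n_classes = len(train_prevalence)
    current = list(train_prevalence)
    for _ in range(max_iter):
        totals = [0.0] * n_classes
        for row in posteriors:
            # E-step: reweight each posterior by current / training prevalence,
            # then renormalise so the row sums to one.
            weighted = [p * c / t for p, c, t in zip(row, current, train_prevalence)]
            norm = sum(weighted)
            for k in range(n_classes):
                totals[k] += weighted[k] / norm
        # M-step: the new prevalence is the mean of the adjusted posteriors.
        updated = [s / len(posteriors) for s in totals]
        converged = max(abs(u - c) for u, c in zip(updated, current)) < tol
        current = updated
        if converged:
            break
    return current


# Synthetic test set: negatives ~ N(0, 1), positives ~ N(1.5, 1), 80% positive.
rng = random.Random(1)
y_test = [int(rng.random() < 0.8) for _ in range(2000)]
x_test = [rng.gauss(1.5 if label else 0.0, 1.0) for label in y_test]

# Posteriors a calibrated classifier would output if trained at 50/50 prevalence:
# the log-odds of these two unit-variance Gaussians is 1.5 * x - 1.125.
posteriors = []
for x in x_test:
    p_pos = 1.0 / (1.0 + math.exp(-(1.5 * x - 1.125)))
    posteriors.append([1.0 - p_pos, p_pos])

naive = sum(row[1] for row in posteriors) / len(posteriors)  # mean posterior
estimate = em_prevalence(posteriors, train_prevalence=[0.5, 0.5])
true_prev = sum(y_test) / len(y_test)
print(f"true={true_prev:.3f} naive={naive:.3f} em={estimate[1]:.3f}")
```

Only the posteriors and the training prevalence enter the computation, which is why an aggregation step of this kind can be rerun on cached classifier outputs without calling the classifier again.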

For the theoretical grounding — why counting alone is biased, what dataset shift means, and how to choose a method — see Quantification Foundations.