2. Aggregative Quantification
Aggregative quantification methods estimate class prevalences by aggregating predictions made on individual instances. All aggregative methods share a common three-step structure:
1. fit: the model is trained on labelled data. The underlying classifier (called the estimator) is fitted, and a representation of the training class distributions is computed (confusion matrix, score histograms, density estimates, …).
2. predict: the trained model generates a prediction for each individual item in the unlabelled test set. These predictions can be hard labels or soft probabilities.
3. aggregate: the predictions are combined into a single prevalence estimate. This step is what distinguishes quantifiers from plain classifiers, because most methods use it to correct for distribution shift rather than just counting labels.
Separating fit, predict, and aggregate follows the scikit-learn estimator
conventions for the first two steps and lets any standard classifier be
reused as the backbone.
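The three steps can be illustrated with a deliberately small sketch in plain Python. It is not mlquantify code: the class `ToyAdjustedCount`, its method `predict_instances`, and the data are hypothetical, the "classifier" is just a fixed threshold on a one-dimensional score, and the aggregation uses the standard adjusted-count correction `(cc - fpr) / (tpr - fpr)` as an assumed example of an aggregation rule.

```python
class ToyAdjustedCount:
    """Hypothetical three-step quantifier for illustration (not part of mlquantify)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, scores, labels):
        # Step 1: the "classifier" is a fixed threshold, so fitting only
        # records its true and false positive rates on the labelled data.
        hard = [int(s >= self.threshold) for s in scores]
        pos = [h for h, y in zip(hard, labels) if y == 1]
        neg = [h for h, y in zip(hard, labels) if y == 0]
        self.tpr_ = sum(pos) / len(pos)
        self.fpr_ = sum(neg) / len(neg)
        return self

    def predict_instances(self, scores):
        # Step 2: one hard label per unlabelled item.
        return [int(s >= self.threshold) for s in scores]

    def aggregate(self, hard_labels):
        # Step 3: correct the raw positive rate with the training error rates.
        # Assumes tpr_ != fpr_, otherwise the correction is undefined.
        cc = sum(hard_labels) / len(hard_labels)
        adjusted = (cc - self.fpr_) / (self.tpr_ - self.fpr_)
        return min(1.0, max(0.0, adjusted))  # clip to a valid prevalence


# Made-up training data: five negatives followed by five positives.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Made-up test set: 20 positives (16 detected) and 5 negatives (1 false alarm).
test_scores = [0.9] * 16 + [0.1] * 4 + [0.9] * 1 + [0.1] * 4

q = ToyAdjustedCount().fit(train_scores, train_labels)
hard = q.predict_instances(test_scores)
cc = sum(hard) / len(hard)      # plain counting: 0.68
estimate = q.aggregate(hard)    # adjusted estimate: approximately 0.8
```

In this constructed example the true positive prevalence of the test set is 0.8. Plain counting reports 0.68 because the threshold misses some positives, while the aggregation step recovers roughly 0.8 from the same hard labels.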
The aggregate method
Every aggregative quantifier exposes an aggregate method that
performs only step 3. This lets you quantify without re-predicting when
classifier outputs are already available:
```python
from mlquantify.likelihood import EMQ
from sklearn.linear_model import LogisticRegression

q = EMQ(LogisticRegression())
q.fit(X_train, y_train)

# Full pipeline
prevalences = q.predict(X_test)

# Or: re-use existing posteriors
proba = q.estimator_.predict_proba(X_test)
prevalences = q.aggregate(proba, q.train_predictions_, q.train_labels_)
```
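To make concrete what an aggregation step over already-computed posteriors can look like, here is a minimal sketch of the EM prior re-estimation usually associated with EMQ, written in plain Python. The function `em_aggregate`, its parameters, and the posterior values are hypothetical, and the library's own implementation may differ in details such as initialisation, stopping rule, or calibration.

```python
def em_aggregate(posteriors, train_prior, max_iter=1000, tol=1e-8):
    """Hypothetical helper: EM re-estimation of class prevalences
    from posteriors that were computed earlier."""
    n_classes = len(train_prior)
    prior = list(train_prior)  # start from the training prevalences
    for _ in range(max_iter):
        totals = [0.0] * n_classes
        for row in posteriors:
            # E-step: rescale each posterior by (current prior / training prior)
            # and renormalise so the row sums to one again.
            weighted = [p * prior[k] / train_prior[k] for k, p in enumerate(row)]
            norm = sum(weighted)
            for k in range(n_classes):
                totals[k] += weighted[k] / norm
        # M-step: the new prior is the mean of the rescaled posteriors.
        new_prior = [t / len(posteriors) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_prior, prior)) < tol
        prior = new_prior
        if converged:
            break
    return prior


# Posteriors cached from an earlier predict_proba call (made-up values).
cached_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
train_prior = [0.5, 0.5]

estimate = em_aggregate(cached_posteriors, train_prior)
```

Because the function needs only the posteriors and the training prior, the same cached array can be aggregated again, for example per batch or per time window, without calling the classifier a second time.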
For the theoretical grounding — why counting alone is biased, what dataset shift means, and how to choose a method — see Quantification Foundations.
- 2.1. Using Aggregative Quantification Methods
- 2.2. General Concept
- 2.3. Counting Methods
- 2.4. Adjusted Counting
- 2.4.1. Problem formulation
- 2.4.2. The Adjustment Formula
- 2.4.3. ACC — Adjusted Classify and Count (hard predictions)
- 2.4.4. ThresholdAdjustment — Base Class for ROC-Threshold Methods
- 2.4.5. TAC — Threshold Adjusted Count (fixed threshold)
- 2.4.6. TX — Threshold X (symmetric ROC point)
- 2.4.7. TMAX — Maximum TPR−FPR Separation
- 2.4.8. T50 — TPR ≈ 0.5 Threshold
- 2.4.9. MS — Median Sweep
- 2.4.10. MS2 — Median Sweep with Constraint
- 2.4.11. Comparing Threshold-Adjustment Methods
- 2.4.12. Assumptions and when to use
- 2.4.13. References
- 2.5. Likelihood Methods
- 2.6. Distribution Matching
- 2.7. Nearest Neighbours