2. Aggregative Quantification
Aggregative quantification methods estimate class prevalences by aggregating predictions made on individual instances. All aggregative methods share a common three-step structure:
- fit: the model is trained on labelled data. The underlying classifier (called the estimator) is fitted, and a representation of the training class distributions is computed (confusion matrix, score histograms, density estimates, …).
- predict: the trained model generates a prediction for each individual item in the unlabelled test set. These predictions can be hard labels or soft probabilities.
- aggregate: the predictions are combined into a single prevalence estimate. This is the step that distinguishes quantifiers from plain classifiers: rather than only counting predicted labels, it can use the training-time representation to correct the count for distribution shift.
This separation into fit, predict, and aggregate follows the
scikit-learn fit/predict convention and lets any standard classifier be
reused as the backbone.
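The three steps above can be sketched without any library at all. The following is a minimal, dependency-free illustration, not mlquantify's implementation: the class `ToyAdjustedCount`, its `classify` method, and the toy data are all hypothetical. It assumes a binary problem with one numeric feature, uses a midpoint threshold as the classifier, and applies an adjusted-count correction in the aggregate step. For brevity the true and false positive rates are estimated on the training data itself, where a held-out or cross-validated estimate would normally be preferred.

```python
from statistics import mean


class ToyAdjustedCount:
    """Hypothetical 1-D binary quantifier; not part of mlquantify."""

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        # Step 1a: fit the underlying classifier (midpoint between class means).
        self.threshold_ = (mean(pos) + mean(neg)) / 2
        # Step 1b: summarise training behaviour as TPR and FPR.
        # (Assumption: estimated on the training data itself for brevity.)
        self.tpr_ = mean(self.classify(pos))
        self.fpr_ = mean(self.classify(neg))
        return self

    def classify(self, xs):
        # Step 2: one hard label per item.
        return [1 if x > self.threshold_ else 0 for x in xs]

    def aggregate(self, labels):
        # Step 3: turn item-level labels into one prevalence estimate.
        raw = mean(labels)  # plain counting
        denom = self.tpr_ - self.fpr_
        if denom <= 0:
            return raw  # classifier no better than chance: nothing to correct with
        adjusted = (raw - self.fpr_) / denom
        return min(1.0, max(0.0, adjusted))  # clip to a valid prevalence

    def predict(self, xs):
        # Full pipeline: classify each item, then aggregate.
        return self.aggregate(self.classify(xs))


x_train = [1, 2, 3, 6, 4, 7, 8, 9]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
quantifier = ToyAdjustedCount().fit(x_train, y_train)

# Test set with 12 positives and 4 negatives (true prevalence 0.75).
x_test = [4, 7, 8, 9] * 3 + [1, 2, 3, 6]
labels = quantifier.classify(x_test)
raw_count = mean(labels)                 # 0.625
adjusted = quantifier.aggregate(labels)  # 0.75
print(raw_count, adjusted)
```

In this toy setup plain counting underestimates the positive prevalence because the classifier misses a quarter of the positives, and the aggregate step recovers the true value from the stored rates.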
The aggregate method
Every aggregative quantifier exposes an aggregate method that
performs only step 3. This lets you quantify without re-predicting when
classifier outputs are already available:
from mlquantify.likelihood import EMQ
from sklearn.linear_model import LogisticRegression

# X_train, y_train, and X_test are assumed to be already-loaded arrays.
q = EMQ(LogisticRegression())
q.fit(X_train, y_train)

# Full pipeline: predict on each item, then aggregate
prevalences = q.predict(X_test)

# Or: reuse posteriors that have already been computed
proba = q.estimator_.predict_proba(X_test)
prevalences = q.aggregate(proba, q.train_predictions_, q.train_labels_)
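To give a feel for what an aggregation step can compute from cached posteriors alone, here is a dependency-free sketch of EM-style prior re-estimation (the SLD procedure that EMQ is named after), restricted to the binary case. The function `em_aggregate` and its parameters are hypothetical: they do not reproduce mlquantify's `aggregate` signature or its internals, and the posteriors below are made-up values.

```python
def em_aggregate(pos_posteriors, train_prev, max_iter=1000, tol=1e-8):
    """Hypothetical binary EM aggregation; not mlquantify's API.

    pos_posteriors: classifier posteriors P(y=1 | x) for the test items.
    train_prev: positive-class prevalence in the training data (0 < p < 1).
    """
    if not 0.0 < train_prev < 1.0:
        raise ValueError("train_prev must lie strictly between 0 and 1")

    prev = train_prev  # start from the training prior
    for _ in range(max_iter):
        # E-step: rescale each posterior by the ratio of the current
        # prevalence estimate to the training prevalence, then renormalise.
        rescaled = []
        for p in pos_posteriors:
            pos_term = p * prev / train_prev
            neg_term = (1.0 - p) * (1.0 - prev) / (1.0 - train_prev)
            rescaled.append(pos_term / (pos_term + neg_term))
        # M-step: the new prevalence is the mean rescaled posterior.
        new_prev = sum(rescaled) / len(rescaled)
        converged = abs(new_prev - prev) < tol
        prev = new_prev
        if converged:
            break
    return prev


# Cached posteriors from some earlier predict_proba call (made-up values).
cached = [0.9] * 8 + [0.1] * 2
estimate = em_aggregate(cached, train_prev=0.5)
print(round(estimate, 3))  # 0.875
```

No classifier is called here: the estimate is driven entirely by the stored posteriors and the training prevalence, which is what makes it cheap to re-run aggregation on outputs that already exist.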
For the theoretical grounding — why counting alone is biased, what dataset shift means, and how to choose a method — see Quantification Foundations.
- 2.1. Using Aggregative Quantification Methods
- 2.2. General Concept
- 2.3. Counting-Based Quantifiers
- 2.4. Counters For Quantification
- 2.5. Adjusted Counting
- 2.5.1. The Adjustment Formula
- 2.5.2. ACC — Adjusted Classify and Count (hard predictions)
- 2.5.3. ThresholdAdjustment — Base Class for ROC-Threshold Methods
- 2.5.4. TAC — Threshold Adjusted Count (fixed threshold)
- 2.5.5. TX — Threshold X (symmetric ROC point)
- 2.5.6. TMAX — Maximum TPR−FPR Separation
- 2.5.7. T50 — TPR ≈ 0.5 Threshold
- 2.5.8. MS — Median Sweep
- 2.5.9. MS2 — Median Sweep with Constraint
- 2.5.10. Comparing Threshold-Adjustment Methods
- 2.5.11. Threshold Adjustment
- 2.6. Likelihood Methods
- 2.6.1. Prior Probability Shift — The Core Assumption
- 2.6.2. MLPE — Maximum Likelihood Prevalence Estimation (trivial baseline)
- 2.6.3. EMQ — Expectation-Maximization Quantifier (SLD)
- 2.6.4. CDE — CDE-Iterate (threshold-adjustment via cost ratios)
- 2.6.5. Method Comparison
- 2.6.6. Maximum Likelihood Prevalence Estimation (MLPE)
- 2.6.7. Expectation Maximization for Quantification (EMQ)
- 2.7. Distribution Matching
- 2.8. Nearest Neighbours