Evaluating a quantifier on real data

Every fetcher has a built-in quantification protocol. Pass protocol="app" (the Artificial Prevalence Protocol) and the returned Bunch gains two extra attributes:

.samples — a list of index arrays into .data, one per test bag;
.prevalences — the (n_samples, n_classes) array of each bag’s true class prevalence.

That is exactly what you need to score a quantifier: predict the prevalence of every bag and compare it against the known truth.
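To make the shape of those two attributes concrete, here is a minimal NumPy-only sketch. It does not call mlquantify and is not how the library builds its bags: the labels are synthetic, and `draw_bag` and `bag_prevalence` are hypothetical helpers introduced purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary labels standing in for ``data.target`` (an assumption,
# not the mushroom dataset).
y = rng.integers(0, 2, size=1000)
classes = np.unique(y)


def draw_bag(y, target_prev, sample_size, rng):
    """Return shuffled indices into ``y`` whose class mix matches ``target_prev``."""
    # Per-class counts; the last class absorbs any rounding remainder.
    counts = np.rint(np.asarray(target_prev) * sample_size).astype(int)
    counts[-1] = sample_size - counts[:-1].sum()
    parts = [
        rng.choice(np.flatnonzero(y == c), size=n, replace=True)
        for c, n in zip(classes, counts)
    ]
    return rng.permutation(np.concatenate(parts))


def bag_prevalence(y, bag):
    """Fraction of each class among the instances indexed by ``bag``."""
    return np.array([(y[bag] == c).mean() for c in classes])


# Three bags at deliberately different class balances.
targets = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
samples = [draw_bag(y, t, sample_size=200, rng=rng) for t in targets]
prevalences = np.vstack([bag_prevalence(y, bag) for bag in samples])

# One row per bag, one column per class; every row sums to 1.
print(prevalences.shape)
```

Here `samples` plays the role of .samples (index arrays, one per bag) and `prevalences` the role of .prevalences, so `y[samples[i]]` recovers the labels of bag `i` and `prevalences[i]` is its ground truth.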

import numpy as np
from sklearn.ensemble import RandomForestClassifier

from mlquantify.datasets import fetch_mushroom
from mlquantify.counting import ACC
from mlquantify.metrics import MAE

# 500 bags of 200 instances, drawn across a range of class balances.
data = fetch_mushroom(protocol="app", n_samples=500, sample_size=200,
                      random_state=0)
X, y = data.data, data.target

quantifier = ACC(RandomForestClassifier(random_state=0)).fit(X, y)

# Score every bag against its known true prevalence (metrics are
# ``metric(y_true, y_pred)``, like scikit-learn).
errors = [
    MAE(true_prev, quantifier.predict(X[bag]))
    for bag, true_prev in zip(data.samples, data.prevalences)
]
print(f"mean absolute error over {len(errors)} bags: {np.mean(errors):.4f}")

The Artificial Prevalence Protocol stress-tests the quantifier across the whole range of class balances, so the mean error summarises how robust it is to prior shift, rather than only how it performs at the dataset’s natural prevalence. Swap protocol="app" for "npp", "upp" or "ppp" to change how the bags are sampled.

Note

For brevity the quantifier is fit on the whole dataset, and the bags are resampled from that same pool. For a leakage-free benchmark, split the data first and draw the evaluation bags from a disjoint test set — apply_protocol runs exactly that fit-then-score loop for you.
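The split-first idea can be sketched without assuming any particular mlquantify signature. The following uses plain NumPy on synthetic labels. The names `train_idx`, `test_idx` and `bags`, the 70/30 cut, and the bag size are illustrative assumptions, and the bags are drawn uniformly at random, so they sit near the natural prevalence rather than spanning the full range.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)  # synthetic labels, not the real dataset

# Shuffle once, then cut into disjoint train and test index sets.
perm = rng.permutation(n)
train_idx, test_idx = perm[:700], perm[700:]

# Draw evaluation bags from the test indices only.
bags = [rng.choice(test_idx, size=100, replace=False) for _ in range(5)]
true_prevs = np.array([[np.mean(y[b] == c) for c in (0, 1)] for b in bags])

# Count instances each bag shares with the training set (should be none).
leaked = [np.intersect1d(b, train_idx).size for b in bags]
print(leaked)  # [0, 0, 0, 0, 0]
```

With real data, the quantifier would be fit on `X[train_idx], y[train_idx]` and scored on `X[b]` for each bag `b`, against the matching row of `true_prevs`.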