.. _sphx_synthetic_quantifiers:

==========================================
Benchmarking quantifiers on synthetic bags
==========================================

With a synthetic population we know every bag's *true* prevalence, so we can
score quantifiers exactly. The recipe: ask
:func:`~mlquantify.datasets.make_quantification` for a fixed training sample
plus many shifted test bags, fit each quantifier once on the training sample,
predict every bag, and plot predicted vs. true prevalence.

To make it a realistic stress test we use a **harder, three-class** problem
(20 features, mostly noise; low class separation; 5% label noise) and
**stack all three shifts**: a full prior sweep plus a *small* dose of covariate
and concept shift (low ``covariate_scale`` / ``concept_strength``) for extra
variability.

.. plot::

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression

    from mlquantify import set_config
    from mlquantify.datasets import make_quantification
    from mlquantify.counting import CC, ACC
    from mlquantify.likelihood import EMQ
    from mlquantify.matching import DyS
    from mlquantify.visualization import DiagonalDisplay

    # Make every quantifier return prevalences as a plain, class-ordered array.
    set_config(prevalence_return_type="array")

    Xtr, ytr, Xs, ys, prevs = make_quantification(
        n_batches=200,
        batch_size=200,
        return_train=True,
        n_classes=3,
        train_prevalence=[1 / 3, 1 / 3, 1 / 3],
        # mostly prior shift, with a little covariate + concept for realism
        shift_type=["prior", "covariate", "concept"],
        prevalence="uniform",
        covariate_scale=0.3,
        concept_strength=0.2,
        n_features=20,
        n_redundant=0,
        class_sep=0.6,
        flip_y=0.05,
        random_state=0,
    )

    methods = {
        "CC": CC(LogisticRegression(max_iter=1000)),
        "ACC": ACC(LogisticRegression(max_iter=1000)),
        "EMQ": EMQ(LogisticRegression(max_iter=1000)),
        "DyS": DyS(LogisticRegression(max_iter=1000)),
    }

    fig, axes = plt.subplots(2, 2, figsize=(9, 9))
    for (name, q), ax in zip(methods.items(), axes.ravel()):
        q.fit(Xtr, ytr)
        pred = np.vstack([q.predict(Xb) for Xb in Xs])
        # DiagonalDisplay colour-codes the three classes automatically.
        DiagonalDisplay.from_predictions(prevs, pred, ax=ax, alpha=0.4, s=14)
        mae = float(np.mean(np.abs(pred - prevs)))
        ax.set_title(f"{name} (MAE = {mae:.3f})")

    fig.suptitle("3-class quantifiers under stacked shift (prior + a little covariate/concept)", y=0.99)
    fig.tight_layout()

Each panel colour-codes the three classes, and the MAE in the title is computed
directly from the returned ``prevs``, with no protocol bookkeeping.

Compared with an easy, clean problem, every cloud is visibly wider here: the
harder population and the small dose of covariate/concept shift push the
estimates off the diagonal. The extra shift, which breaks the pure prior-shift
assumption, perturbs the adjustment-based methods (ACC, EMQ, DyS) the most,
while plain CC stays comparatively tight, a reminder that the "best" method
depends on the shift.

Dial ``covariate_scale`` / ``concept_strength`` up or down to control how far
the bags wander, or drop them entirely (``shift_type="prior"``) for a clean
prior-shift benchmark.

.. seealso::

    - :ref:`sphx_synthetic_difficulty`: error as a function of separability.
    - :ref:`sphx_method_comparison`: the same diagonal view on a real dataset.
    - :ref:`sphx_synthetic_prevalence`: choosing how the bags are distributed.
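
The MAE shown in each panel title is a plain mean of absolute differences over all bags and classes. The sketch below reproduces that arithmetic with NumPy alone and also breaks it down per class, which can help when one class dominates the error. The helper name ``prevalence_mae`` and the toy arrays are hypothetical and exist only for illustration; they are not part of ``mlquantify``.

```python
import numpy as np

# Hypothetical toy values: 4 bags x 3 classes, each row sums to 1.
true_prevs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
    [0.4, 0.4, 0.2],
])
pred_prevs = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.4, 0.4],
    [0.2, 0.1, 0.7],
    [0.4, 0.3, 0.3],
])


def prevalence_mae(true, pred):
    """Return (overall MAE, per-class MAE) for (n_bags, n_classes) arrays.

    Illustrative helper, not an mlquantify function.
    """
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if true.shape != pred.shape:
        raise ValueError("true and pred must have the same shape")
    abs_err = np.abs(pred - true)
    # Overall: average over every bag and class, as in the panel titles.
    # Per class: average over bags only (axis 0).
    return float(abs_err.mean()), abs_err.mean(axis=0)


overall, per_class = prevalence_mae(true_prevs, pred_prevs)
print(f"overall MAE = {overall:.3f}")
for k, err in enumerate(per_class):
    print(f"class {k}: MAE = {err:.3f}")
```

Because every class has the same number of bags, the overall value equals the mean of the per-class values, so the two views stay consistent.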