.. _sphx_synthetic_shift:

=======================
Prior shift, bag by bag
=======================

Quantification lives or dies by **prior-probability shift**: the class
distribution :math:`P(y)` changes between bags while the class-conditional
feature distribution :math:`P(x \mid y)` stays the same. With
:func:`~mlquantify.datasets.make_quantification` you can dial in exactly the
prevalences you want by passing an explicit array of vectors, then watch the
clusters keep their shape while their *balance* shifts.

.. plot::

   import matplotlib.pyplot as plt
   from mlquantify.datasets import make_quantification

   # One bag per target prevalence: same population, different class balance.
   targets = [[0.9, 0.1], [0.7, 0.3], [0.5, 0.5], [0.3, 0.7], [0.1, 0.9]]
   Xs, ys, prevs = make_quantification(
       prevalence=targets,
       batch_size=500,
       n_features=2,
       n_redundant=0,
       class_sep=1.6,
       random_state=0,
   )

   fig, axes = plt.subplots(1, 5, figsize=(15, 3.3), sharex=True, sharey=True)
   for ax, X, y, p in zip(axes, Xs, ys, prevs):
       for k, color in enumerate(["#2a9d8f", "#e76f51"]):
           mask = y == k
           ax.scatter(X[mask, 0], X[mask, 1], s=8, alpha=0.6, color=color)
       ax.set_title(f"class 1 = {p[1]:.0%}")
       ax.set_xticks([])
       ax.set_yticks([])
   fig.suptitle("Same clusters, shifting prevalence: prior-probability shift", y=1.02)
   fig.tight_layout()

From left to right the orange class grows from 10% to 90% of the bag, yet each
class always falls in the same region of feature space. That is precisely the
regime quantifiers are built for, and precisely where a plain classifier's
count drifts off, because its error rates were learned at a different balance.

Pass ``prevalence="uniform"`` instead of an explicit list to draw the
prevalences randomly across the whole simplex; see
:ref:`sphx_synthetic_prevalence`.

.. seealso::

   - :ref:`sphx_synthetic_quantifiers`: how methods cope with this shift.
   - :ref:`sphx_cc_under_shift`: the bias prior shift induces in counting.
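
The drift of a plain classifier's count can be made concrete with a little
arithmetic. Because :math:`P(x \mid y)` is fixed under prior shift, a fixed
classifier keeps the same true-positive and false-positive rates in every bag,
so the share of positive predictions is expected to be
:math:`p \cdot \mathrm{tpr} + (1 - p) \cdot \mathrm{fpr}` when the true
prevalence is :math:`p`. The sketch below is plain Python and does not call
mlquantify; the rates ``tpr = 0.9`` and ``fpr = 0.2`` and the helper names are
hypothetical, chosen only for illustration.

.. code-block:: python

   def expected_count(prevalence, tpr, fpr):
       """Expected share of positive predictions at a given true prevalence."""
       return prevalence * tpr + (1.0 - prevalence) * fpr

   def adjusted_count(observed, tpr, fpr):
       """Invert the relation above to recover the prevalence (needs tpr != fpr)."""
       return (observed - fpr) / (tpr - fpr)

   # Hypothetical error rates; constant across bags because P(x | y) is fixed.
   tpr, fpr = 0.9, 0.2

   for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
       raw = expected_count(p, tpr, fpr)
       fixed = adjusted_count(raw, tpr, fpr)
       print(f"true={p:.2f}  counted={raw:.2f}  adjusted={fixed:.2f}")

With these assumed rates the raw count overestimates small prevalences and
underestimates large ones, while inverting the relation recovers the true
value. This works only because the rates are known exactly here; in practice
they would have to be estimated from held-out data.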