.. _sphx_synthetic_intro:

=========================================
Visualizing synthetic quantification data
=========================================

:func:`~mlquantify.datasets.make_quantification` builds one labelled population
and draws *bags* from it, returning each bag's features, labels, and — crucially
for quantification — its true class prevalence. Before looking at shifts and
quantifiers, it helps to simply *see* the data.

Asking for two informative features (``n_features=2, n_redundant=0``) makes the
population directly plottable: below is a single bag, with each point coloured by
its class.

.. plot::

    import numpy as np
    import matplotlib.pyplot as plt

    from mlquantify.datasets import make_quantification

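    # Draw a single bag of 800 points from a two-class, two-feature population.
    Xs, ys, prevs = make_quantification(
        n_batches=1, batch_size=800, n_classes=2,
        n_features=2, n_redundant=0, class_sep=1.6, random_state=0,
    )
    X, y = Xs[0], ys[0]  # unpack the only bag

    fig, ax = plt.subplots(figsize=(6, 5))
    # One scatter call per class, so each class gets its own colour and legend entry.
    for k, color in enumerate(["#2a9d8f", "#e76f51"]):
        mask = y == k
        ax.scatter(X[mask, 0], X[mask, 1], s=14, alpha=0.7,
                   color=color, label=f"class {k}")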
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    ax.set_title(f"One synthetic bag — prevalence = {np.round(prevs[0], 2)}")
    ax.legend()
    fig.tight_layout()

The call returns ``Xs, ys, prevs``: lists of per-bag feature matrices and label
vectors, plus the ``(n_bags, n_classes)`` array of true prevalences, one row per
bag. With three classes (``n_classes=3``) you would see three clusters;
everything that follows uses the same generator, just sampled differently.
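
To make the shape of ``prevs`` concrete, here is a minimal NumPy-only sketch of
how a prevalence vector relates to a bag's labels. It assumes the reported
prevalence is simply the class frequency within each bag, which this page does
not confirm; ``bag_prevalence``, ``ys_demo`` and ``prevs_demo`` are hypothetical
names used for illustration and are not part of ``mlquantify``.

.. code-block:: python

    import numpy as np

    def bag_prevalence(y, n_classes):
        """Return the class frequencies of one bag's label vector."""
        counts = np.bincount(y, minlength=n_classes)
        return counts / counts.sum()

    # Hypothetical stand-in for ``ys``: three bags with different class mixes.
    rng = np.random.default_rng(0)
    ys_demo = [rng.choice(2, size=800, p=[p0, 1 - p0]) for p0 in (0.2, 0.5, 0.8)]

    # Stack the per-bag frequencies into an (n_bags, n_classes) array.
    prevs_demo = np.vstack([bag_prevalence(y, n_classes=2) for y in ys_demo])
    print(prevs_demo.shape)  # (3, 2)

Each row sums to one, so a row can be read directly as the class distribution
of the corresponding bag.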

.. note::

   In real experiments you would keep all 20 (or more) features;
   ``n_features=2`` is only there to make the population visible. Bags are drawn
   and their prevalences reported in the same way regardless of the feature count.

.. seealso::

   - :ref:`sphx_synthetic_shift` — how the bags change under prior shift.
   - :ref:`sphx_synthetic_prevalence` — controlling the bag-to-bag variability.