.. _sphx_real_datasets_evaluation:

====================================
Evaluating a quantifier on real data
====================================

Every fetcher has a built-in **quantification protocol**. Pass ``protocol="app"``
(the Artificial Prevalence Protocol) and the returned
:class:`~mlquantify.datasets.Bunch` gains two extra attributes:

- ``.samples``: a list of index arrays into ``.data``, one per test *bag*;
- ``.prevalences``: an array of shape ``(n_samples, n_classes)``, where
  ``n_samples`` is the number of bags, holding each bag's **true** class
  prevalences.

That is exactly what you need to score a quantifier: predict the prevalence of
every bag and compare it against the known truth.

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    from mlquantify.datasets import fetch_mushroom
    from mlquantify.counting import ACC
    from mlquantify.metrics import MAE

    # 500 bags of 200 instances, drawn across a range of class balances.
    data = fetch_mushroom(protocol="app", n_samples=500, sample_size=200,
                          random_state=0)
    X, y = data.data, data.target

    quantifier = ACC(RandomForestClassifier(random_state=0)).fit(X, y)

    # Score every bag against its known true prevalence. Metrics take
    # (y_true, y_pred), following the scikit-learn convention.
    errors = [
        MAE(true_prev, quantifier.predict(X[bag]))
        for bag, true_prev in zip(data.samples, data.prevalences)
    ]
    print(f"mean absolute error over {len(errors)} bags: {np.mean(errors):.4f}")

The Artificial Prevalence Protocol stress-tests the quantifier across the whole
range of class balances, so the mean error summarises how robust it is to prior
shift, not only how well it performs at the dataset's natural prevalence. To
change how the bags are sampled, swap ``protocol="app"`` for ``"npp"``,
``"upp"`` or ``"ppp"``.
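
To make the idea of sampling "across the whole range of class balances"
concrete, the sketch below imitates an APP-style draw for a binary problem with
plain NumPy. It is not mlquantify's implementation: the helper
``draw_app_bags``, the uniform draw of target prevalences, and sampling with
replacement are all assumptions made for illustration. Its return values are
shaped like the ``.samples`` and ``.prevalences`` attributes described above.

.. code-block:: python

    import numpy as np


    def draw_app_bags(y, n_samples, sample_size, random_state=None):
        """Hypothetical helper: draw bags at controlled class prevalences."""
        rng = np.random.default_rng(random_state)
        y = np.asarray(y)
        pos_idx = np.flatnonzero(y == 1)
        neg_idx = np.flatnonzero(y == 0)

        samples = []
        prevalences = np.empty((n_samples, 2))
        for i in range(n_samples):
            # Assumption: the target positive prevalence is uniform on [0, 1].
            target = rng.uniform(0.0, 1.0)
            n_pos = int(round(target * sample_size))
            n_neg = sample_size - n_pos
            # Sample with replacement so a small class can still fill a bag.
            bag = np.concatenate([
                rng.choice(pos_idx, size=n_pos, replace=True),
                rng.choice(neg_idx, size=n_neg, replace=True),
            ])
            rng.shuffle(bag)
            samples.append(bag)
            # Record the prevalence actually realised after rounding.
            prevalences[i] = [n_neg / sample_size, n_pos / sample_size]
        return samples, prevalences


    # Toy labels with a natural positive prevalence of 10%.
    y = np.array([0] * 900 + [1] * 100)
    samples, prevalences = draw_app_bags(y, n_samples=50, sample_size=200,
                                         random_state=0)

    # Each row of prevalences matches the label frequencies of its bag.
    realised = np.array([np.bincount(y[bag], minlength=2) / len(bag)
                         for bag in samples])
    print(np.allclose(realised, prevalences))
    print(prevalences[:, 1].min(), prevalences[:, 1].max())

Although only 10% of the toy labels are positive, the bags cover positive
prevalences far from that value, which is what exposes a quantifier to prior
shift.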

.. note::

   For brevity the quantifier is fit on the whole dataset, and the bags are
   resampled from that same pool, so every bag contains instances the
   classifier has already seen and the reported error is likely optimistic.
   For a leakage-free benchmark, split the data first and draw the evaluation
   bags from a disjoint test set;
   :func:`~mlquantify.model_selection.apply_protocol` runs exactly that
   fit-then-score loop for you.
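
The split-first setup described in the note can be sketched with scikit-learn
and NumPy alone. This is an illustrative stand-in, not the ``apply_protocol``
API: the synthetic data, the plain classify-and-count estimate, the bag size of
200, and the grid of target prevalences are all assumptions. The point is only
that the bags are drawn from test indices the classifier never saw.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    bag_size = 200  # assumed bag size, matching the example above

    # Synthetic binary data stands in for a real dataset.
    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

    # Split first: the classifier only ever sees the training half.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    pos_idx = np.flatnonzero(y_test == 1)
    neg_idx = np.flatnonzero(y_test == 0)

    errors = []
    for target in np.linspace(0.0, 1.0, 11):  # assumed prevalence grid
        n_pos = int(round(target * bag_size))
        # Indices point into the test set only, so no training row leaks in.
        bag = np.concatenate([
            rng.choice(pos_idx, size=n_pos, replace=True),
            rng.choice(neg_idx, size=bag_size - n_pos, replace=True),
        ])
        true_prev = np.array([1 - n_pos / bag_size, n_pos / bag_size])
        # Classify and count: predicted-label frequencies are the estimate.
        pred_prev = np.bincount(clf.predict(X_test[bag]),
                                minlength=2) / bag_size
        errors.append(np.mean(np.abs(true_prev - pred_prev)))

    print(f"mean absolute error over {len(errors)} bags: {np.mean(errors):.4f}")

Classify-and-count is used here only because it needs no extra machinery; any
quantifier with a ``fit``/``predict`` pair could take its place in the loop.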

.. seealso::

   - :ref:`real_world_datasets` — the fetcher API and protocol options.
   - :ref:`sphx_protocols` — what each sampling protocol looks like.
   - :func:`~mlquantify.model_selection.apply_protocol` — the one-call
     evaluation helper.