.. _sphx_grid_search:

====================================
Tuning a quantifier with GridSearchQ
====================================

Hyper-parameters should be selected to minimise *quantification* error, not
classification error, and they should be evaluated across a range of
prevalences, not on a single validation split.
:class:`~mlquantify.model_selection.GridSearchQ` does both: it scores every
candidate with an evaluation protocol and a quantification metric, then refits
the best one.

This example tunes the ``bandwidth`` of a
:class:`~mlquantify.matching.KDEyML` quantifier. We let ``GridSearchQ`` pick
the winner, then draw the validation error across the whole grid to show *why*
that value won.

.. plot::

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    from mlquantify.matching import KDEyML
    from mlquantify.metrics import MAE
    from mlquantify.model_selection import GridSearchQ, apply_protocol

    X, y = make_classification(
        n_samples=3000, n_features=20, weights=[0.5, 0.5], random_state=0,
    )
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0,
    )

    bandwidths = [0.02, 0.05, 0.08, 0.1, 0.15, 0.2, 0.3, 0.5]

    search = GridSearchQ(
        quantifier=KDEyML(LogisticRegression(max_iter=1000)),
        param_grid={"bandwidth": bandwidths},
        protocol="app",
        samples_sizes=100,
        n_repetitions=5,
        scoring=MAE,
        random_seed=0,
    ).fit(X_tr, y_tr)

    # Re-trace the validation-error curve the search optimised over.
    scores = []
    for bw in bandwidths:
        q = KDEyML(LogisticRegression(max_iter=1000), bandwidth=bw).fit(X_tr, y_tr)
        res = apply_protocol(
            q, X_val, y_val,
            protocol="app", n_prevalences=11, repeats=3,
            batch_size=100, random_state=0,
        )
        scores.append(MAE(res["true_prevalences"], res["predicted_prevalences"]))

    best_bw = search.best_params_["bandwidth"]

    fig, ax = plt.subplots(figsize=(7, 4.5))
    ax.plot(bandwidths, scores, "o-", color="#264653")
    ax.axvline(best_bw, color="#e63946", ls="--",
               label=f"GridSearchQ pick: bandwidth={best_bw}")
    ax.set_xlabel("KDEyML bandwidth")
    ax.set_ylabel("Validation MAE (APP)")
    ax.set_title("Quantification-aware hyper-parameter selection")
    ax.legend()
    fig.tight_layout()

The curve is U-shaped: a too-small bandwidth spikes each class density on its
training points, a too-large one blurs the classes together, and
``GridSearchQ`` lands on the bandwidth at the bottom. After fitting,
``search.predict(X)`` uses the refit best model directly, and
``search.best_params_`` / ``search.best_score_`` report the choice.

.. seealso::

   - :class:`~mlquantify.model_selection.GridSearchQ`: protocol, scoring and
     parallelism options.
   - :ref:`sphx_protocols`: the protocols the search can drive.
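
The idea of scoring a quantifier across a range of prevalences, rather than on one validation split, can be sketched without the library. The following is a minimal illustration in plain NumPy, not mlquantify's implementation: the helpers ``sample_at_prevalence`` and ``app_mae``, the toy one-feature data, and the use of a fixed threshold with classify-and-count as the "quantifier" are all assumptions made for the example.

```python
import numpy as np


def sample_at_prevalence(y, prevalence, size, rng):
    """Return indices of a batch with round(prevalence * size) positives."""
    n_pos = int(round(prevalence * size))
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # Draw with replacement so extreme prevalences are always reachable.
    chosen_pos = pos_idx[rng.integers(0, len(pos_idx), size=n_pos)]
    chosen_neg = neg_idx[rng.integers(0, len(neg_idx), size=size - n_pos)]
    return np.concatenate([chosen_pos, chosen_neg])


def app_mae(predict, X, y, n_prevalences=11, repeats=3, batch_size=100, seed=0):
    """Average absolute prevalence error over a grid of target prevalences."""
    rng = np.random.default_rng(seed)
    errors = []
    for p in np.linspace(0.0, 1.0, n_prevalences):
        for _ in range(repeats):
            idx = sample_at_prevalence(y, p, batch_size, rng)
            true_prev = y[idx].mean()
            # Classify-and-count: the share of predicted positives.
            est_prev = predict(X[idx]).mean()
            # In the binary case the per-class absolute errors are equal,
            # so the positive-class error is also the mean over classes.
            errors.append(abs(est_prev - true_prev))
    return float(np.mean(errors))


# Hypothetical toy data: one feature, classes centred at -1 and +1.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=2000)
X_val = rng.normal(loc=2.0 * y_val - 1.0, scale=1.0)


def threshold_classifier(x):
    """Stand-in for a fitted classifier: predict 1 when the feature is positive."""
    return (x > 0.0).astype(int)


print(f"APP-style MAE: {app_mae(threshold_classifier, X_val, y_val):.3f}")
```

A hyper-parameter search in this spirit would call something like ``app_mae`` once per candidate and keep the candidate with the lowest value. Because batches are drawn at prevalences from 0 to 1, a candidate that only does well near the training prevalence is penalised.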