apply_protocol
- mlquantify.model_selection.apply_protocol(quantifier, X, y, protocol='app', *, scoring='mae', batch_size=100, n_prevalences=11, repeats=1, fit=True, test_size=0.4, stratify=True, n_jobs=1, random_state=None, return_predictions=True, return_estimator=False, verbose=0, **protocol_params)
Evaluate a quantifier across an evaluation protocol.
The protocol analogue of scikit-learn's cross_validate: it fits the quantifier, generates many test samples with controlled prevalences using a sampling protocol (APP, NPP, UPP or PPP), predicts the prevalence of each sample, and returns the true and predicted prevalences together with one score array per metric. This packages the standard quantification evaluation loop into a single call.
- Parameters:
- quantifier : BaseQuantifier
The quantifier to evaluate. When fit=True a copy is trained, so the passed object is left untouched; when fit=False it must already be fitted.
- X : array-like of shape (n_samples, n_features)
Feature matrix.
- y : array-like of shape (n_samples,)
Class labels.
- protocol : {'app', 'npp', 'upp', 'ppp'} or BaseProtocol, default='app'
Sampling protocol used to build the evaluation samples.
  - 'app': Artificial Prevalence Protocol (grid of prevalences).
  - 'npp': Natural Prevalence Protocol (random natural samples).
  - 'upp': Uniform Prevalence Protocol (uniform over the simplex).
  - 'ppp': Personalized protocol (requires prevalences=...).
A pre-built BaseProtocol instance may also be passed, in which case batch_size, n_prevalences and repeats are ignored.
- scoring : str, callable, or list, default='mae'
Metric(s) used to score each sample. A metric name (e.g. 'mae', 'nmd'), a callable metric(true, pred) -> float, or a list mixing the two. Each becomes a key in the returned dictionary. Relative metrics such as 'rae'/'nrae' are automatically smoothed with eps = 1 / (2 * sample_size) so classes absent from a sample (zero true prevalence) do not yield inf.
- batch_size : int or list of int, default=100
Size of each evaluation sample.
- n_prevalences : int, default=11
Number of prevalence points (APP/UPP) or natural samples (NPP).
- repeats : int, default=1
Number of repetitions per prevalence point.
- fit : bool, default=True
If True, train a copy of quantifier before evaluating; if False, use the already-fitted quantifier directly.
- test_size : float or int or None, default=0.4
When fit=True, the held-out fraction (or count) of the data that the protocol samples from; the quantifier is trained on the complement. None or 0 trains and evaluates on the same data (in-sample).
- stratify : bool, default=True
Whether the train/evaluation split is stratified by y.
- n_jobs : int or None, default=1
Number of parallel jobs over the protocol samples.
- random_state : int, RandomState instance, or None, default=None
Seed controlling the split and the protocol sampling.
- return_predictions : bool, default=True
Whether to include the true and predicted prevalence arrays.
- return_estimator : bool, default=False
Whether to include the fitted quantifier under the 'estimator' key.
- verbose : int, default=0
Verbosity level, following the sklearn convention:
  - 0 (or False): silent.
  - 1 (or True): print a one-line summary with the protocol name, the number of batches, and the mean score for each metric.
  - 2: additionally print one line per sample showing its index, true prevalence, predicted prevalence, and per-metric scores.
- **protocol_params : dict
Extra arguments forwarded to the protocol's constructor (e.g. min_prev, max_prev, strategy and dirichlet_alpha for APP/UPP, prevalences for PPP).
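The scoring parameter accepts a callable with the signature metric(true, pred) -> float, and the description above mentions smoothing relative metrics with eps = 1 / (2 * sample_size). A minimal sketch of such a callable follows. It assumes the additive smoothing form commonly used in the quantification literature, (p + eps) / (1 + eps * n_classes); whether mlquantify applies exactly this form internally is an assumption, and the helper names smooth and make_smoothed_rae are hypothetical.

```python
import numpy as np


def smooth(prev, eps):
    """Additive smoothing; the result still sums to 1 (assumed form)."""
    prev = np.asarray(prev, dtype=float)
    return (prev + eps) / (1.0 + eps * prev.shape[-1])


def make_smoothed_rae(sample_size):
    """Build a metric(true, pred) -> float for a fixed sample size."""
    eps = 1.0 / (2.0 * sample_size)  # value quoted in the scoring description

    def smoothed_rae(true, pred):
        true_s = smooth(true, eps)
        pred_s = smooth(pred, eps)
        # Relative absolute error, averaged over classes.
        return float(np.mean(np.abs(pred_s - true_s) / true_s))

    return smoothed_rae


rae_100 = make_smoothed_rae(sample_size=100)

# The first class has zero true prevalence; smoothing keeps the score finite.
score = rae_100([0.0, 1.0], [0.1, 0.9])
```

A callable built this way could presumably be passed as scoring=rae_100 or mixed into a list with metric names, as the parameter description allows.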
- Returns:
- results : dict
Dictionary with the keys:
  - 'true_prevalences': ndarray of shape (n_samples, n_classes), present when return_predictions=True.
  - 'predicted_prevalences': ndarray of the same shape.
  - 'n_batches': int, the number of evaluation samples.
  - one key per metric (e.g. 'MAE'): ndarray of shape (n_samples,) holding the per-sample score.
  - 'estimator': the fitted quantifier, when return_estimator=True.
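To illustrate how the returned dictionary might be consumed, the sketch below builds a hypothetical stand-in with the documented keys (the numbers are made up, not real output) and summarizes every metric array. It also recomputes MAE from the prevalence arrays, assuming the usual definition of MAE as the mean absolute difference over classes.

```python
import numpy as np

# Hypothetical stand-in for the dictionary returned by apply_protocol.
# Only the documented keys are used; the values are invented.
results = {
    "true_prevalences": np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]),
    "predicted_prevalences": np.array([[0.1, 0.9], [0.45, 0.55], [0.8, 0.2]]),
    "n_batches": 3,
    "MAE": np.array([0.1, 0.05, 0.2]),
}

# Every key that is not one of the documented bookkeeping keys is a metric.
non_metric_keys = {
    "true_prevalences",
    "predicted_prevalences",
    "n_batches",
    "estimator",
}
summary = {
    name: (float(scores.mean()), float(scores.std()))
    for name, scores in results.items()
    if name not in non_metric_keys
}

# Per-sample MAE recomputed from the prevalence arrays (assumed definition).
abs_err = np.abs(results["predicted_prevalences"] - results["true_prevalences"])
recomputed_mae = abs_err.mean(axis=1)
```

Here summary maps each metric name to a (mean, standard deviation) pair, which is often more informative than the mean alone when comparing quantifiers.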
See also
APP, NPP, UPP, PPP : The sampling protocols this runs.
GridSearchQ : Hyper-parameter search using the same protocols.
Notes
Aggregate a run with results['MAE'].mean(). The true and predicted prevalence arrays are convenient for diagonal “true vs predicted” diagnostic plots.
Examples
>>> from mlquantify.model_selection import apply_protocol
>>> from mlquantify.counting import CC
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=400, random_state=0)
>>> results = apply_protocol(
...     CC(LogisticRegression()), X, y,
...     protocol="app", n_prevalences=11, batch_size=100,
...     scoring=["mae", "nmd"], random_state=0,
... )
>>> results["true_prevalences"].shape
(11, 2)
>>> round(float(results["MAE"].mean()), 2)
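For readers who want to see what the single call packages, here is a rough NumPy-only sketch of an APP-style evaluation loop for a binary problem: draw one sample per point on a prevalence grid, estimate its prevalence by counting hard predictions (classify and count), and score with MAE. It is not mlquantify's implementation. The toy data, the 10% error rate, and the helper draw_sample are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy held-out pool (made-up data): true labels plus hard predictions from
# a hypothetical classifier that is wrong about 10% of the time.
y_pool = rng.integers(0, 2, size=2000)
wrong = rng.random(y_pool.size) < 0.1
pred_pool = np.where(wrong, 1 - y_pool, y_pool)


def draw_sample(y, prevalence, batch_size, rng):
    """Return indices of a sample whose class proportions follow `prevalence`."""
    counts = np.round(np.asarray(prevalence) * batch_size).astype(int)
    counts[-1] = batch_size - counts[:-1].sum()  # make counts add up exactly
    parts = [
        rng.choice(np.flatnonzero(y == cls), size=n, replace=True)
        for cls, n in enumerate(counts)
        if n > 0
    ]
    return np.concatenate(parts)


batch_size = 100
grid = np.linspace(0.0, 1.0, 11)  # 11 prevalence points for the positive class

true_prev, pred_prev = [], []
for p in grid:
    idx = draw_sample(y_pool, [1.0 - p, p], batch_size, rng)
    # True prevalence of the sample, and the classify-and-count estimate.
    true_prev.append(np.bincount(y_pool[idx], minlength=2) / batch_size)
    pred_prev.append(np.bincount(pred_pool[idx], minlength=2) / batch_size)

true_prev = np.array(true_prev)  # shape (11, 2)
pred_prev = np.array(pred_prev)  # shape (11, 2)
mae = np.abs(pred_prev - true_prev).mean(axis=1)  # one score per sample
```

The arrays true_prev, pred_prev and mae play the roles of 'true_prevalences', 'predicted_prevalences' and the per-metric score array in the returned dictionary; the library additionally handles fitting, the train/evaluation split, repeats, and parallelism.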