5. Model Selection and Evaluation
Evaluating and selecting quantification models requires dedicated protocols and metrics. Standard classification tools (train/test split, accuracy, F1) are not sufficient, because a quantifier's error depends on the class prevalence of the test sample, not just on whether individual labels are predicted correctly.
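To make the dependence on prevalence concrete, the sketch below works through the arithmetic for a naive "classify and count" estimate. It assumes a binary classifier whose true-positive and false-positive rates (0.8 and 0.1, chosen only for illustration) stay fixed while the test prevalence changes. The function names are hypothetical and are not part of mlquantify.

```python
def classify_and_count(prevalence, tpr, fpr):
    """Expected fraction of predicted positives at a given true prevalence."""
    return prevalence * tpr + (1.0 - prevalence) * fpr


def expected_accuracy(prevalence, tpr, fpr):
    """Expected accuracy of the same classifier at a given true prevalence."""
    return prevalence * tpr + (1.0 - prevalence) * (1.0 - fpr)


# Illustrative rates (assumptions, not measured values)
TPR, FPR = 0.8, 0.1

for p in (0.1, 0.5, 0.9):
    estimate = classify_and_count(p, TPR, FPR)
    acc = expected_accuracy(p, TPR, FPR)
    print(f"true prevalence={p:.1f}  estimated={estimate:.2f}  "
          f"abs. error={abs(estimate - p):.2f}  accuracy={acc:.2f}")
```

With these assumed rates, accuracy stays between 0.81 and 0.89, while the absolute prevalence error ranges from 0.05 to 0.17. A single test split at one prevalence would hide that variation.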
This section covers the full evaluation workflow:
- Protocols: how to generate many test samples with varying prevalences from a single dataset (APP, UPP, NPP).
- Hyperparameter tuning: how to use GridSearchQ to select the best quantifier configuration.
- Evaluation metrics: which error measure to use and when.
Quick example — full evaluation pipeline:
    from mlquantify.likelihood import EMQ
    from mlquantify.model_selection import APP
    from mlquantify.metrics import MAE
    from mlquantify.utils import get_prev_from_labels
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    import numpy as np

    # Synthetic binary dataset with a 70/30 class balance
    X, y = make_classification(n_samples=2000, weights=[0.7, 0.3],
                               random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=42)

    # Fit the quantifier on the training half
    q = EMQ(LogisticRegression())
    q.fit(X_train, y_train)

    # Generate test samples with varying prevalences from the test half
    protocol = APP(batch_size=100, n_prevalences=21, repeats=10,
                   random_state=42)

    errors = []
    for idx in protocol.split(X_test, y_test):
        X_s, y_s = X_test[idx], y_test[idx]
        tp = get_prev_from_labels(y_s)  # true prevalence of this sample
        pp = q.predict(X_s)             # predicted prevalence
        errors.append(MAE(tp, pp))

    print(f"Mean MAE: {np.mean(errors):.4f} ± {np.std(errors):.4f}")
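For intuition about what an artificial-prevalence protocol does in the loop above, here is a minimal NumPy-only sketch for the binary case. It is not mlquantify's implementation: `sample_indices_at_prevalence` and `absolute_error` are hypothetical helpers, and the five-point prevalence grid is an assumption made for illustration.

```python
import numpy as np


def sample_indices_at_prevalence(y, positive_prevalence, batch_size, rng):
    """Draw indices so that the batch has the requested share of positives (labels 0/1)."""
    n_pos = int(round(positive_prevalence * batch_size))
    n_neg = batch_size - n_pos
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # Fall back to sampling with replacement only when a class is too small
    chosen_pos = rng.choice(pos_idx, size=n_pos, replace=n_pos > len(pos_idx))
    chosen_neg = rng.choice(neg_idx, size=n_neg, replace=n_neg > len(neg_idx))
    idx = np.concatenate([chosen_pos, chosen_neg])
    rng.shuffle(idx)
    return idx


def absolute_error(true_prev, pred_prev):
    """Mean over classes of the absolute difference between two prevalence vectors."""
    diff = np.asarray(true_prev, dtype=float) - np.asarray(pred_prev, dtype=float)
    return float(np.mean(np.abs(diff)))


rng = np.random.default_rng(0)
y = np.array([0] * 700 + [1] * 300)   # toy labels with a 70/30 balance
grid = np.linspace(0.0, 1.0, 5)       # target positive prevalences

observed = []
for p in grid:
    idx = sample_indices_at_prevalence(y, p, batch_size=100, rng=rng)
    observed.append(float(np.mean(y[idx])))

print("target  :", grid)
print("observed:", observed)
```

Each batch reproduces its target prevalence regardless of the 70/30 balance of the source data, which is what lets a single test set stress a quantifier across the whole prevalence range.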
- 5.1. Evaluation Protocols
- 5.1.1. APP — Artificial Prevalence Protocol
- 5.1.2. NPP — Natural Prevalence Protocol
- 5.1.3. UPP — Uniform Prevalence Protocol
- 5.1.4. Choosing a Protocol
- 5.1.5. Protocols for Quantification
- 5.1.6. Artificial-Prevalence Protocol (APP)
- 5.1.7. Natural-Prevalence Protocol (NPP)
- 5.1.8. Uniform Prevalence Protocol (UPP)
- 5.1.9. Personalized Prevalence Protocol (PPP)
- 5.1.10. References
- 5.2. Hyperparameter Tuning
- 5.3. Evaluation Metrics
- 5.4. Single Label Quantification (SLQ) Metrics
- 5.4.1. AE (Absolute Error)
- 5.4.2. SE (Squared Error)
- 5.4.3. MAE (Mean Absolute Error)
- 5.4.4. MSE (Mean Squared Error)
- 5.4.5. KLD (Kullback-Leibler Divergence)
- 5.4.6. RAE (Relative Absolute Error)
- 5.4.7. NAE (Normalized Absolute Error)
- 5.4.8. NRAE (Normalized Relative Absolute Error)
- 5.4.9. NKLD (Normalized Kullback-Leibler Divergence)
- 5.5. Regression-Based Quantification (RQ) Metrics
- 5.6. Ordinal Quantification (OQ) Metrics