EMQ
- class mlquantify.likelihood.EMQ(estimator=None, tol=0.0001, max_iter=100, calib_function=None, criteria=<function MAE>, on_calib_error='backup')
Expectation-Maximization Quantifier (EMQ / SLD).
Estimates class prevalences under prior probability shift by iterating between re-weighting posterior probabilities to reflect the current prevalence estimate (E-step) and updating the prevalence estimate as their average (M-step). Optionally applies a calibration step before the EM iteration to improve posterior quality.
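The E-step/M-step loop described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `emq_sketch` is hypothetical, the training priors are assumed to be strictly positive, and convergence is checked with the mean absolute change between successive estimates (in the spirit of the default `criteria=MAE`).

```python
def emq_sketch(posteriors, train_priors, tol=1e-4, max_iter=100):
    """Hypothetical sketch of the EM prevalence update (not mlquantify code).

    posteriors   : list of rows, each a list of P(class | x) from the classifier
    train_priors : training class prevalences, same class order, all > 0
    """
    n_classes = len(train_priors)
    prev = list(train_priors)  # start from the training prevalences
    for _ in range(max_iter):
        totals = [0.0] * n_classes
        for row in posteriors:
            # E-step: re-weight the posterior by prev / train_priors, renormalise
            weighted = [p * prev[k] / train_priors[k] for k, p in enumerate(row)]
            norm = sum(weighted)
            for k in range(n_classes):
                totals[k] += weighted[k] / norm
        # M-step: the new prevalence is the mean of the re-weighted posteriors
        new_prev = [t / len(posteriors) for t in totals]
        # Mean absolute change between successive prevalence estimates
        change = sum(abs(a - b) for a, b in zip(new_prev, prev)) / n_classes
        prev = new_prev
        if change < tol:
            break
    return prev


# Toy posteriors from a classifier trained on balanced data (priors 0.5 / 0.5)
toy_posteriors = [[0.2, 0.8]] * 6 + [[0.8, 0.2]] * 4
estimate = emq_sketch(toy_posteriors, [0.5, 0.5])
```

On this toy input the plain average of the posteriors puts 0.56 on the second class, while the EM iterations move the estimate further in the same direction, which is the correction for prior probability shift that the method is designed to make.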
Supported calibration methods via calib_function: Temperature Scaling ('ts'), Bias-Corrected Temperature Scaling ('bcts'), Vector Scaling ('vs'), and No-Bias Vector Scaling ('nbvs').
- Parameters:
- estimator : estimator, optional
A probabilistic classifier with fit and predict_proba methods.
- tol : float, default=1e-4
Convergence threshold on the prevalence change between iterations.
- max_iter : int, default=100
Maximum number of EM iterations.
- calib_function : {'ts', 'bcts', 'vs', 'nbvs'} or callable or None, default=None
Calibration applied to posteriors before EM. None skips calibration.
- criteria : callable, default=MAE
Convergence criterion comparing successive prevalence estimates.
- on_calib_error : {'raise', 'backup'}, default='backup'
Behaviour when calibration fails: 'raise' re-raises the error; 'backup' falls back to uncalibrated posteriors.
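To make the calibration and fallback parameters concrete, here is a hedged sketch of temperature scaling applied to posteriors, wrapped in the two `on_calib_error` behaviours. The helper names `temperature_scale` and `calibrate_or_backup` are hypothetical and are not part of mlquantify. The temperature is passed in as a fixed number here, whereas in practice it is typically fitted on held-out data.

```python
def temperature_scale(posteriors, temperature):
    """Rescale each row as softmax(log(p) / T), i.e. p**(1/T) renormalised."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = []
    for row in posteriors:
        powered = [p ** (1.0 / temperature) for p in row]
        norm = sum(powered)
        scaled.append([v / norm for v in powered])
    return scaled


def calibrate_or_backup(posteriors, calibrate, on_calib_error="backup"):
    """Hypothetical wrapper mirroring the 'raise' / 'backup' options."""
    try:
        return calibrate(posteriors)
    except Exception:
        if on_calib_error == "raise":
            raise  # 'raise': propagate the calibration error
        return posteriors  # 'backup': keep the uncalibrated posteriors


raw = [[0.9, 0.1], [0.3, 0.7]]
# T > 1 softens over-confident posteriors before they reach the EM loop
softened = calibrate_or_backup(raw, lambda p: temperature_scale(p, 2.0))
# An invalid temperature fails, so 'backup' returns the input unchanged
fallback = calibrate_or_backup(raw, lambda p: temperature_scale(p, 0.0))
```

With `on_calib_error="raise"` the second call would surface the `ValueError` instead of silently continuing with the raw posteriors.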
- Attributes:
- estimator_ : estimator
The fitted underlying classifier.
- classes_ : ndarray of shape (n_classes,)
Class labels seen during fit.
- priors_ : ndarray of shape (n_classes,)
Training class prevalences.
References
[1] Saerens, M., Latinne, P., & Decaestecker, C. (2002). Adjusting the Outputs of a Classifier to New a Priori Probabilities. Neural Computation, 14(1), 21–41.
[2] Alexandari, A., Kundaje, A., & Shrikumar, A. (2020). Maximum Likelihood with Bias-Corrected Calibration is Hard-to-Beat at Label Shift Adaptation. ICML, pp. 222–232.
[3] Esuli, A., Moreo, A., & Sebastiani, F. (2023). Learning to Quantify. Springer.
Examples
>>> from mlquantify.likelihood import EMQ
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=200, random_state=42)
>>> q = EMQ(estimator=LogisticRegression()).fit(X, y)
>>> q.predict(X)
{0: 0.49, 1: 0.51}
>>> # call aggregate with pre-computed posteriors
>>> proba_train = q.estimator_.predict_proba(X)
>>> proba_test = q.estimator_.predict_proba(X)
>>> q.aggregate(proba_test, proba_train, y)
{0: 0.49, 1: 0.51}
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
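The `<component>__<parameter>` naming used by set_params can be illustrated with a small stand-alone sketch. The helper `split_nested_params` is hypothetical and only shows how such keys are separated into the object's own parameters and those forwarded to a nested component. Whether a given key such as `estimator__C` is accepted depends on the wrapped classifier, which is an assumption in this example.

```python
def split_nested_params(params):
    """Split {'name__sub': value} keys into own and per-component parameters."""
    own, nested = {}, {}
    for key, value in params.items():
        # Only the first double underscore separates component from parameter;
        # anything after it is passed on to the component unchanged.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


# 'tol' would be set on the quantifier itself, 'C' on its nested 'estimator'
own, nested = split_nested_params({"tol": 1e-5, "estimator__C": 0.5})
```

Here `own` holds the top-level setting and `nested` groups the remaining values by component name, which is the shape a nested object needs in order to forward them.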