EMQ

class mlquantify.likelihood.EMQ(estimator=None, tol=0.0001, max_iter=100, calib_function=None, criteria=<function MAE>, on_calib_error='backup')

Expectation-Maximization Quantifier (EMQ / SLD).

Estimates class prevalences under prior probability shift by iterating between re-weighting posterior probabilities to reflect the current prevalence estimate (E-step) and updating the prevalence estimate as their average (M-step). Optionally applies a calibration step before the EM iteration to improve posterior quality.
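The iteration described above can be sketched in plain Python. This is a minimal illustration of the procedure from Saerens et al. [1], not mlquantify's implementation: the function name `em_prevalence` and its arguments are hypothetical, and the stopping rule (mean absolute change below `tol`) is an assumption modelled on the `tol` and `criteria` defaults.

```python
def em_prevalence(posteriors, train_priors, tol=1e-4, max_iter=100):
    """Hypothetical sketch of the EM loop; not mlquantify's code.

    posteriors   : list of rows, each a list of P(class | x) from the classifier
    train_priors : list of class prevalences observed in the training set
    """
    n_classes = len(train_priors)
    prev = list(train_priors)  # start from the training prevalences
    for _ in range(max_iter):
        # E-step: re-weight each posterior by prev / train_priors, then renormalize
        reweighted = []
        for row in posteriors:
            scaled = [p * prev[c] / train_priors[c] for c, p in enumerate(row)]
            total = sum(scaled)
            reweighted.append([s / total for s in scaled])
        # M-step: the new prevalence estimate is the mean re-weighted posterior
        new_prev = [
            sum(r[c] for r in reweighted) / len(reweighted)
            for c in range(n_classes)
        ]
        # Assumed convergence check: mean absolute change between estimates
        change = sum(abs(a - b) for a, b in zip(new_prev, prev)) / n_classes
        prev = new_prev
        if change < tol:
            break
    return prev


# Toy posteriors from a classifier trained on balanced data (made-up values)
posteriors = [[0.9, 0.1]] * 2 + [[0.2, 0.8]] * 8
estimate = em_prevalence(posteriors, train_priors=[0.5, 0.5])
print(estimate)
```

Because the first pass starts from the training priors, its M-step simply returns the average posterior; later passes move the estimate further toward the class that dominates the test sample.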

Supported calibration methods via calib_function: Temperature Scaling ('ts'), Bias-Corrected Temperature Scaling ('bcts'), Vector Scaling ('vs'), and No-Bias Vector Scaling ('nbvs').
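To make the four options concrete, the sketch below writes them as one per-class affine transform of the logits followed by a softmax, following the parameterisation in Alexandari et al. [2]. The helper names `softmax` and `scale_logits` are hypothetical, the parameter values are made up for illustration, and fitting those parameters (typically by minimising negative log-likelihood on held-out data) is omitted. How mlquantify implements each method internally is not shown here.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scale_logits(logits, weights, biases):
    """General form softmax(w_c * z_c + b_c); each method constrains w and b."""
    return softmax([w * z + b for z, w, b in zip(logits, weights, biases)])


logits = [2.0, 0.0]
T = 2.0  # illustrative temperature, not a fitted value

# 'ts': one shared scale 1/T, no bias
ts = scale_logits(logits, [1 / T, 1 / T], [0.0, 0.0])
# 'bcts': shared scale 1/T plus a per-class bias
bcts = scale_logits(logits, [1 / T, 1 / T], [0.0, 0.3])
# 'nbvs': per-class scale, no bias
nbvs = scale_logits(logits, [0.5, 0.8], [0.0, 0.0])
# 'vs': per-class scale and per-class bias
vs = scale_logits(logits, [0.5, 0.8], [0.0, 0.3])
```

With T greater than 1, temperature scaling softens the prediction: the top-class probability in `ts` is lower than in `softmax(logits)`.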

Parameters:
estimator : estimator, optional

A probabilistic classifier with fit and predict_proba methods.

tol : float, default=1e-4

Convergence threshold on the prevalence change between iterations.

max_iter : int, default=100

Maximum number of EM iterations.

calib_function : {'ts', 'bcts', 'vs', 'nbvs'} or callable or None, default=None

Calibration applied to posteriors before EM. None skips calibration.

criteria : callable, default=MAE

Convergence criterion comparing successive prevalence estimates.

on_calib_error : {'raise', 'backup'}, default='backup'

Behaviour when calibration fails: 'raise' re-raises the error; 'backup' falls back to uncalibrated posteriors.
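The two `on_calib_error` modes amount to a guarded call. The sketch below is an assumption about the control flow implied by the description above, using a hypothetical helper `calibrate_or_backup` and a deliberately failing calibrator; it is not taken from mlquantify's source.

```python
def calibrate_or_backup(posteriors, calib_function=None, on_calib_error="backup"):
    """Hypothetical helper: apply a calibrator, optionally tolerating failure."""
    if calib_function is None:
        # No calibration requested: use the posteriors as they are
        return posteriors
    try:
        return calib_function(posteriors)
    except Exception:
        if on_calib_error == "raise":
            raise  # propagate the original error
        # 'backup': fall back to the uncalibrated posteriors
        return posteriors


def failing_calibrator(posteriors):
    """Stand-in for a calibrator whose fit fails."""
    raise ValueError("calibration did not converge")


raw = [[0.7, 0.3], [0.4, 0.6]]
result = calibrate_or_backup(raw, failing_calibrator, on_calib_error="backup")
```

Under `'backup'` the call returns `raw` unchanged, whereas `'raise'` lets the `ValueError` reach the caller.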

Attributes:
estimator_ : estimator

The fitted underlying classifier.

classes_ : ndarray of shape (n_classes,)

Class labels seen during fit.

priors_ : ndarray of shape (n_classes,)

Training class prevalences.

References

[1]

Saerens, M., Latinne, P., & Decaestecker, C. (2002). Adjusting the Outputs of a Classifier to New a Priori Probabilities: A Simple Procedure. Neural Computation, 14(1), 21–41.

[2]

Alexandari, A., Kundaje, A., & Shrikumar, A. (2020). Maximum Likelihood with Bias-Corrected Calibration is Hard-to-Beat at Label Shift Adaptation. ICML, pp. 222–232.

[3]

Esuli, A., Fabris, A., Moreo, A., & Sebastiani, F. (2023). Learning to Quantify. Springer.

Examples

>>> from mlquantify.likelihood import EMQ
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=200, random_state=42)
>>> q = EMQ(estimator=LogisticRegression()).fit(X, y)
>>> q.predict(X)
{0: 0.49, 1: 0.51}
>>> # call aggregate with pre-computed posteriors
>>> proba_train = q.estimator_.predict_proba(X)
>>> proba_test = q.estimator_.predict_proba(X)
>>> q.aggregate(proba_test, proba_train, y)
{0: 0.49, 1: 0.51}
get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

save_quantifier(path: str | None = None) -> None

Save the quantifier instance to a file.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.