EMQ

class mlquantify.likelihood.EMQ(learner=None, tol=0.0001, max_iter=100, calib_function=None, criteria=<function MAE>)

Expectation-Maximization Quantifier (EMQ).

Estimates class prevalences under prior probability shift by alternating between expectation (E) and maximization (M) steps on posterior probabilities.

E-step:

\[p_i^{(s+1)}(x) = \frac{q_i^{(s)} p_i(x)}{\sum_j q_j^{(s)} p_j(x)}\]

M-step:

\[q_i^{(s+1)} = \frac{1}{N} \sum_{n=1}^N p_i^{(s+1)}(x_n)\]

where

- \(p_i(x)\) are the posterior probabilities predicted by the classifier,
- \(q_i^{(s)}\) are the class prevalence estimates at iteration \(s\),
- \(N\) is the number of test instances.
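
As a rough illustration, one E-step followed by one M-step can be written directly from the two formulas above. This is a dependency-free sketch, not the library's code: plain lists stand in for arrays, and the function name `em_step` is hypothetical.

```python
def em_step(posteriors, q):
    """Run one E-step and one M-step as given by the two formulas above.

    posteriors: list of rows, each row holding p_i(x) for one test instance.
    q: current prevalence estimates q_i^(s), one value per class.
    """
    # E-step: weight each posterior by the current prevalence, then renormalize the row.
    updated = []
    for row in posteriors:
        weighted = [q_i * p_i for q_i, p_i in zip(q, row)]
        total = sum(weighted)
        updated.append([w / total for w in weighted])

    # M-step: the new prevalence of class i is the mean of column i.
    n = len(updated)
    new_q = [sum(row[i] for row in updated) / n for i in range(len(q))]
    return new_q, updated


posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
new_q, new_p = em_step(posteriors, [0.5, 0.5])
```

Starting from uniform prevalences, the E-step leaves the posteriors unchanged, so `new_q` is simply their column-wise mean.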

Calibration methods that can be applied to the posterior probabilities before the EM iterations:

Temperature Scaling (TS):

\[\hat{p} = \text{softmax}\left(\frac{\log(p)}{T}\right)\]

Bias-Corrected Temperature Scaling (BCTS):

\[\hat{p} = \text{softmax}\left(\frac{\log(p)}{T} + b\right)\]

Vector Scaling (VS):

\[\hat{p}_i = \text{softmax}(W_i \cdot \log(p_i) + b_i)\]

No-Bias Vector Scaling (NBVS):

\[\hat{p}_i = \text{softmax}(W_i \cdot \log(p_i))\]
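
The four maps can be sketched as two small functions once the parameters are known. The sketch below only applies a calibration with given \(T\), \(W\) and \(b\); it does not fit them, and it is not the library's implementation. The function names are hypothetical, \(W\) is assumed to act as one weight per class, and every probability is assumed to be strictly positive so that the logarithm is defined.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(p, T, b=None):
    """TS when b is None, BCTS when b holds one bias per class."""
    scores = [math.log(p_i) / T for p_i in p]
    if b is not None:
        scores = [s + b_i for s, b_i in zip(scores, b)]
    return softmax(scores)


def vector_scale(p, W, b=None):
    """NBVS when b is None, VS when b holds one bias per class.

    Assumption: W is a list with one weight per class (element-wise scaling).
    """
    scores = [w_i * math.log(p_i) for w_i, p_i in zip(W, p)]
    if b is not None:
        scores = [s + b_i for s, b_i in zip(scores, b)]
    return softmax(scores)


p = [0.7, 0.3]
unchanged = temperature_scale(p, T=1.0)   # T = 1 and no bias leaves p as it is
flattened = temperature_scale(p, T=2.0)   # T > 1 pulls p toward uniform
```

With \(T = 1\) and no bias the map is the identity, since the softmax of \(\log(p)\) is \(p\) itself; larger temperatures flatten the distribution.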

Parameters:

learner (estimator, optional): Probabilistic classifier supporting predict_proba.
tol (float, default=1e-4): Convergence threshold.
max_iter (int, default=100): Maximum number of EM iterations.
calib_function (str or callable, optional): Calibration method:
- 'ts': Temperature Scaling
- 'bcts': Bias-Corrected Temperature Scaling
- 'vs': Vector Scaling
- 'nbvs': No-Bias Vector Scaling
- callable: custom calibration function
criteria (callable, default=MAE): Convergence metric.

References

[1]

Saerens, M., Latinne, P., & Decaestecker, C. (2002). Adjusting the Outputs of a Classifier to New a Priori Probabilities: A Simple Procedure. Neural Computation, 14(1), 21-41.

[2]

Esuli, A., Fabris, A., Moreo, A., & Sebastiani, F. (2023). Learning to Quantify. Springer.

classmethod EM(posteriors, priors, tolerance=1e-06, max_iter=100, criteria=<function MAE>)

Class method implementing the EM algorithm for quantification.

Parameters:

posteriors (ndarray of shape (n_samples, n_classes)): Posterior probability predictions.
priors (ndarray of shape (n_classes,)): Training class prior probabilities.
tolerance (float, default=1e-6): Convergence threshold on the change in the prevalence estimates between consecutive iterations, as measured by criteria.
max_iter (int, default=100): Maximum number of EM iterations.
criteria (callable, default=MAE): Metric used to assess convergence.

Returns:

qs (ndarray of shape (n_classes,)): Estimated test set class prevalences.
ps (ndarray of shape (n_samples, n_classes)): Updated soft membership probabilities per instance.
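
The loop below is a minimal, dependency-free sketch of how an EM routine with this signature could be written; it is not the library's source. Plain lists stand in for ndarrays, and the names `em_sketch` and `mae` are hypothetical. It follows Saerens et al. (2002) in weighting each posterior by the ratio of the current estimate to the training prior, which is assumed (not confirmed) to be how `priors` is used here. Every training prior is assumed to be strictly positive.

```python
def mae(a, b):
    """Mean absolute error between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def em_sketch(posteriors, priors, tolerance=1e-6, max_iter=100, criteria=mae):
    """Iterate E- and M-steps until the prevalence estimates stop changing."""
    n_classes = len(priors)
    qs = list(priors)                         # start from the training priors
    ps = [list(row) for row in posteriors]    # returned unchanged if max_iter == 0

    for _ in range(max_iter):
        # E-step: reweight by (current estimate / training prior), then renormalize.
        ps = []
        for row in posteriors:
            weighted = [(q / pr) * p for q, pr, p in zip(qs, priors, row)]
            total = sum(weighted)
            ps.append([w / total for w in weighted])

        # M-step: average the updated posteriors over all instances.
        new_qs = [sum(row[i] for row in ps) / len(ps) for i in range(n_classes)]

        # Stop once the estimates move by less than the tolerance.
        converged = criteria(new_qs, qs) < tolerance
        qs = new_qs
        if converged:
            break

    return qs, ps


posteriors = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
qs, ps = em_sketch(posteriors, priors=[0.5, 0.5])
```

With uniform training priors the ratio cancels in the normalization, so each E-step reduces to weighting the posteriors by the current prevalence estimates alone.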

fit(X, y)

Fit the quantifier using the provided data and learner.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

predict(X)

Predict class prevalences for the given data.

save_quantifier(path: str | None = None) → None

Save the quantifier instance to a file.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
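
To make the <component>__<parameter> convention concrete, the sketch below shows how such names can be split into the estimator's own parameters and those addressed to a nested component. It illustrates the naming scheme only and is not this class's implementation: `split_nested_params` is a hypothetical helper, and the key `learner__C` assumes that the wrapped learner exposes a parameter named `C`.

```python
def split_nested_params(params):
    """Separate plain parameter names from <component>__<parameter> names."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only; the remainder may itself be nested.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"tol": 1e-3, "learner__C": 10.0, "learner__max_iter": 500}
)
```

Here `own` holds `tol`, while `nested` groups `C` and `max_iter` under `learner`, which is the component that would receive them.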

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.
