EMQ#

class mlquantify.likelihood.EMQ(learner=None, tol=0.0001, max_iter=100, calib_function=None, criteria=<function MAE>)[source]#

Expectation-Maximization Quantifier (EMQ).

Estimates class prevalences under prior probability shift by alternating between expectation (E) and maximization (M) steps on posterior probabilities.

E-step:

p_i^{(s+1)}(x) = \frac{(q_i^{(s)} / q_i^{(0)})\, p_i(x)}{\sum_j (q_j^{(s)} / q_j^{(0)})\, p_j(x)}

M-step:

q_i^{(s+1)} = \frac{1}{N} \sum_{n=1}^N p_i^{(s+1)}(x_n)

where:

- \(p_i(x)\) are the posterior probabilities predicted by the classifier,
- \(q_i^{(0)}\) are the training class prevalences used as the initial estimate,
- \(q_i^{(s)}\) are the class prevalence estimates at iteration \(s\),
- \(N\) is the number of test instances.
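As an illustration, the E/M alternation can be sketched in a few lines of NumPy. This is a minimal sketch, not mlquantify's implementation; following Saerens et al. [1], the E-step reweights each posterior by the ratio of the current prevalence estimate to the training prior:

```python
import numpy as np

def em_prevalence(posteriors, train_priors, tol=1e-4, max_iter=100):
    """Minimal EM loop: posteriors is (N, C), train_priors is (C,)."""
    posteriors = np.asarray(posteriors, dtype=float)
    train_priors = np.asarray(train_priors, dtype=float)
    qs = train_priors.copy()           # q^{(0)}: start from training prevalences
    ps = posteriors
    for _ in range(max_iter):
        # E-step: reweight posteriors by q^{(s)} / q^{(0)} and renormalize rows
        ps = posteriors * (qs / train_priors)
        ps /= ps.sum(axis=1, keepdims=True)
        # M-step: new prevalence estimate is the mean soft membership
        qs_next = ps.mean(axis=0)
        converged = np.abs(qs_next - qs).mean() < tol  # MAE-style check
        qs = qs_next
        if converged:
            break
    return qs, ps
```

Initialising at the training prevalences and stopping when the mean absolute change falls below a threshold mirrors the `tol` and `criteria` parameters described below.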

Calibrations supported on posterior probabilities before EM iteration:

Temperature Scaling (TS):

\hat{p} = \text{softmax}\left(\frac{\log(p)}{T}\right)

Bias-Corrected Temperature Scaling (BCTS):

\hat{p} = \text{softmax}\left(\frac{\log(p)}{T} + b\right)

Vector Scaling (VS):

\hat{p}_i = \text{softmax}(W_i \cdot \log(p_i) + b_i)

No-Bias Vector Scaling (NBVS):

\hat{p}_i = \text{softmax}(W_i \cdot \log(p_i))
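For intuition, the TS transform is easy to reproduce directly. The sketch below applies the formula for a given temperature only; fitting \(T\) (and, for BCTS/VS/NBVS, the bias and weight terms) is left to the calibration routine:

```python
import numpy as np

def temperature_scale(probs, T):
    """Apply softmax(log(p) / T) row-wise; T > 1 flattens, T < 1 sharpens."""
    logits = np.log(probs) / T
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```

With T = 1 the probabilities are returned unchanged; BCTS additionally adds a learned per-class bias b inside the softmax.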
Parameters:
learner : estimator, optional

Probabilistic classifier supporting predict_proba.

tol : float, default=1e-4

Convergence threshold.

max_iter : int, default=100

Maximum EM iterations.

calib_function : str or callable, optional

Calibration method:

- ‘ts’: Temperature Scaling
- ‘bcts’: Bias-Corrected Temperature Scaling
- ‘vs’: Vector Scaling
- ‘nbvs’: No-Bias Vector Scaling
- callable: custom calibration function

criteria : callable, default=MAE

Convergence metric.
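The default criteria compares successive prevalence estimates by mean absolute error. As a sketch of a compatible callable (the two-vector signature is assumed from the default shown above):

```python
import numpy as np

def mae(prev_a, prev_b):
    """Mean absolute error between two prevalence vectors."""
    return float(np.abs(np.asarray(prev_a) - np.asarray(prev_b)).mean())
```

Any callable with this signature could in principle be passed, e.g. `EMQ(criteria=mae)`.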

References

[1]

Saerens, M., Latinne, P., & Decaestecker, C. (2002). Adjusting the Outputs of a Classifier to New a Priori Probabilities: A Simple Procedure. Neural Computation, 14(1), 21-41.

[2]

Esuli, A., Moreo, A., & Sebastiani, F. (2023). Learning to Quantify. Springer.

classmethod EM(posteriors, priors, tolerance=1e-06, max_iter=100, criteria=<function MAE>)[source]#

Class method implementing the EM algorithm for quantification.

Parameters:
posteriors : ndarray of shape (n_samples, n_classes)

Posterior probability predictions.

priors : ndarray of shape (n_classes,)

Training class prior probabilities.

tolerance : float

Convergence threshold based on difference between iterations.

max_iter : int

Maximum number of EM iterations.

criteria : callable

Metric to assess convergence, e.g., MAE.

Returns:
qs : ndarray of shape (n_classes,)

Estimated test set class prevalences.

ps : ndarray of shape (n_samples, n_classes)

Updated soft membership probabilities per instance.

fit(X, y)[source]#

Fit the quantifier using the provided data and learner.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict class prevalences for the given data.

save_quantifier(path: str | None = None) → None[source]#

Save the quantifier instance to a file.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.