DyS#

class mlquantify.mixture.DyS(learner=None, measure='topsoe', bins_size=None)[source]#

Distribution y-Similarity (DyS) quantification method.

Uses mixture modeling with a dissimilarity measure between distributions computed on histograms of classifier scores. This method optimizes mixture weights by minimizing a chosen distance measure: Hellinger, Topsoe, or ProbSymm.

Parameters:
learner : estimator, optional

Base probabilistic classifier.

measure : {‘hellinger’, ‘topsoe’, ‘probsymm’}, default=’topsoe’

Distance function to minimize.

bins_size : array-like or None

Histogram bin sizes to try for score representation. Defaults to a set of bin sizes between 2 and 30.

References

[1] Maletzke et al. (2019). DyS: A Framework for Mixture Models in Quantification. AAAI 2019.

[2] Esuli et al. (2023). Learning to Quantify. Springer.

Examples

>>> from mlquantify.mixture import DyS
>>> from sklearn.linear_model import LogisticRegression
>>> q = DyS(learner=LogisticRegression(), measure="hellinger")
>>> q.fit(X_train, y_train)
>>> prevalences = q.predict(X_test)
aggregate(*args)[source]#

Aggregate binary predictions to obtain multiclass prevalence estimates.

best_mixture(predictions, pos_scores, neg_scores)[source]#

Determine the best mixture parameters for the given data.

Applies ternary search to find the mixture weight minimizing the distance between the test score histogram and the mixture of the positive and negative score histograms.

The mixture weight \(\alpha\) is estimated as:

\[\alpha = \arg \min_{\alpha \in [0, 1]} D \left( H_{test}, \; \alpha H_{pos} + (1 - \alpha) H_{neg} \right)\]

where \(D\) is the selected distance measure and \(H\) denotes histograms.

Parameters:
predictions : ndarray

Classifier scores for the test data.

pos_scores : ndarray

Classifier scores for the positive class from training data.

neg_scores : ndarray

Classifier scores for the negative class from training data.

Returns:
alpha : float

Estimated mixture weight.

best_distance : float

Distance corresponding to the best mixture weight.
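For illustration, below is a minimal sketch of the ternary search described above. It assumes classifier scores lie in [0, 1], uses a single hypothetical bin count (the actual method tries several bins_size values), and reuses the documented DyS.get_distance classmethod; it is a sketch under those assumptions, not the library's implementation.

import numpy as np
from mlquantify.mixture import DyS

def ternary_search_alpha(test_scores, pos_scores, neg_scores,
                         n_bins=10, measure="topsoe", tol=1e-4):
    # Build normalized score histograms on [0, 1]
    bins = np.linspace(0, 1, n_bins + 1)
    hist = lambda s: np.histogram(s, bins=bins)[0] / max(len(s), 1)
    h_test, h_pos, h_neg = hist(test_scores), hist(pos_scores), hist(neg_scores)

    def dist(alpha):
        # Distance between the test histogram and the alpha-weighted mixture
        mixture = alpha * h_pos + (1 - alpha) * h_neg
        return DyS.get_distance(mixture, h_test, measure=measure)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dist(m1) < dist(m2):
            hi = m2  # minimum lies in [lo, m2]
        else:
            lo = m1  # minimum lies in [m1, hi]
    alpha = (lo + hi) / 2
    return alpha, dist(alpha)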

fit(X, y)[source]#

Fit the quantifier under a binary decomposition strategy.

get_best_distance(*args, **kwargs)[source]#

Get the best distance value from the mixture fitting process.

Notes

If the quantifier has not been fitted yet, calling this method will fit the model in order to compute the best distance.
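A possible usage pattern, continuing the example above and assuming the method can be called without arguments once the quantifier has been fitted and has produced predictions:

>>> q = DyS(learner=LogisticRegression(), measure="topsoe")
>>> q.fit(X_train, y_train)
>>> prevalences = q.predict(X_test)
>>> q.get_best_distance()  # distance achieved by the best mixture weight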

classmethod get_distance(dist_train, dist_test, measure='hellinger')[source]#

Compute distance between two distributions.
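For reference, the three supported measures are commonly defined as below for two normalized histograms p and q. This is a standalone sketch of the standard formulas, not necessarily the library's exact implementation; the eps term is added here only to avoid division by zero and the logarithm of zero.

import numpy as np

def hellinger(p, q):
    # Hellinger distance between two discrete distributions
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) / 2)

def topsoe(p, q, eps=1e-20):
    # Topsoe divergence: sum of KL-style terms against the midpoint distribution
    m = (p + q) / 2
    return np.sum(p * np.log((p + eps) / (m + eps))
                  + q * np.log((q + eps) / (m + eps)))

def probsymm(p, q, eps=1e-20):
    # Probabilistic symmetric chi-squared distance
    return 2 * np.sum((p - q) ** 2 / (p + q + eps))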

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict class prevalences using the trained binary quantifiers.

save_quantifier(path: str | None = None) → None[source]#

Save the quantifier instance to a file.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.