ComposeQuantifier#

class mlquantify.compose.ComposeQuantifier(representation, loss, solver=None, seed=None)[source]#

Generic quantification method based on constrained regression using the qunfold framework.

Mathematical formulation

This method estimates class prevalences by solving the following problem:

\[q \approx M \pi\]

where:

  • \(q\) is the representation of the unlabeled test data,

  • \(M\) is the class-conditional representation matrix estimated from training data,

  • \(\pi\) is the vector of class prevalences to be estimated.

The estimation is performed by minimizing a divergence or loss function between the observed representation \(q\) and the expected representation \(M \pi\):

\[\hat{\pi} = \arg\min_{\pi} \; D(q, M\pi)\]

subject to:

\[\pi_k \ge 0, \quad \sum_k \pi_k = 1\]

The behavior of the method is fully determined by:

  • a representation \(f(x)\) that maps data into a feature space,

  • a distance/loss function \(D(\cdot, \cdot)\),

  • an optimization procedure over the probability simplex.
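As a concrete illustration, the constrained problem above can be solved directly with NumPy and SciPy. This is a toy sketch, not the qunfold backend: `M` and `q` are made-up values, and the least-squares divergence stands in for a general \(D\).

```python
import numpy as np
from scipy.optimize import minimize

M = np.array([[0.8, 0.3],   # class-conditional representation matrix (toy values)
              [0.2, 0.7]])
q = np.array([0.55, 0.45])  # observed representation of the unlabeled test data

def loss(pi):
    # Least-squares divergence D(q, M pi)
    r = q - M @ pi
    return r @ r

# Constraints: pi_k >= 0 and sum_k pi_k = 1 (the probability simplex)
result = minimize(
    loss,
    x0=np.full(2, 0.5),
    bounds=[(0.0, 1.0)] * 2,
    constraints=[{"type": "eq", "fun": lambda pi: pi.sum() - 1.0}],
)
pi_hat = result.x  # pi_hat is approximately [0.5, 0.5] for this toy M and q
```

Swapping the squared error for another divergence, or the representation behind `M` and `q`, changes the quantification algorithm without changing this optimization skeleton.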

This implementation wraps the qunfold backend while allowing the use of:

  • native qunfold representations and losses,

  • custom representations implemented in mlquantify,

  • arbitrary distance functions, such as Topsoe or Hellinger.
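For instance, a distance such as Topsoe can be supplied as a plain callable taking two distributions. The sketch below is illustrative only; mlquantify ships its own implementations (e.g. `topsoe_jax`, shown in the Examples).

```python
import numpy as np

def topsoe(p, q, eps=1e-12):
    # Topsoe distance: KL(p || m) + KL(q || m) with m the midpoint
    # distribution; eps avoids log(0) on empty histogram bins.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    m = (p + q) / 2.0
    return float(np.sum(p * np.log(p / m) + q * np.log(q / m)))
```

Any callable with this `(p, q) -> float` signature can play the role of \(D\) in the formulation above.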

Parameters:
representation : object

Representation object defining how to compute \(q\) and \(M\). Can be either a qunfold representation or a custom implementation.

loss : object or callable

Loss or distance function \(D(p, q)\). Can be either a qunfold loss object or a callable accepting two distributions.

solver : str or callable, optional

Solver used for the constrained optimization problem. If not provided, the default solver from qunfold is used.

seed : int, optional

Random seed used by the qunfold backend for reproducible optimization.

Notes

This formulation unifies several quantification methods:

  • AC / CC: class-based representations

  • PAC / PCC: probability-based representations

  • HDy / DyS: histogram-based representations with divergence measures

  • HDx: feature-based histogram representations

The method follows the constrained regression framework described in [1]:

\[q \approx M \hat{\pi}\]

where different choices of representation and loss correspond to different quantification algorithms.

References

[1]

Firat, A. (2016). Unified Framework for Quantification.

Examples

Using a class-based representation to implement an ACC-like quantifier:

>>> from sklearn.linear_model import LogisticRegression
>>> from qunfold.sklearn import CVClassifier
>>> from qunfold.methods.linear.losses import LeastSquaresLoss
>>> from qunfold.methods.linear.representations import ClassRepresentation
>>> from mlquantify.compose import ComposeQuantifier
>>>
>>> learner = LogisticRegression(max_iter=1000)
>>> quantifier = ComposeQuantifier(
...     representation=ClassRepresentation(
...         CVClassifier(learner),
...         is_probabilistic=False,
...     ),
...     loss=LeastSquaresLoss(),
... )
>>> quantifier.fit(X_train, y_train)
>>> prevalences = quantifier.predict(X_test)

Using a histogram-based representation with a custom Topsoe distance from mlquantify to implement a DyS-like quantifier:

>>> from sklearn.linear_model import LogisticRegression
>>> from qunfold.sklearn import CVClassifier
>>> from qunfold.methods.linear.representations import (
...     ClassRepresentation,
...     HistogramRepresentation,
... )
>>> from mlquantify.compose import ComposeQuantifier
>>> from mlquantify.metrics import topsoe_jax
>>>
>>> learner = LogisticRegression(max_iter=1000)
>>> representation = HistogramRepresentation(
...     n_bins=8,
...     preprocessor=ClassRepresentation(
...         CVClassifier(learner),
...         is_probabilistic=True,
...     ),
...     unit_scale=False,
... )
>>> quantifier = ComposeQuantifier(
...     representation=representation,
...     loss=topsoe_jax,
... )
>>> quantifier.fit(X_train, y_train)
>>> prevalences = quantifier.predict(X_test)
get_metadata_routing()[source]#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

save_quantifier(path: str | None = None) → None[source]#

Save the quantifier instance to a file.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
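A minimal illustration of the `<component>__<parameter>` convention, using a scikit-learn Pipeline as mentioned above (the same convention applies to any nested estimator parameters):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# "clf__C" addresses the C parameter of the nested LogisticRegression
pipe.set_params(clf__C=0.1)
assert pipe.get_params()["clf__C"] == 0.1
```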

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.