ComposeQuantifier#
- class mlquantify.compose.ComposeQuantifier(representation, loss, solver=None, seed=None)[source]#
Generic quantification method based on constrained regression using the QUnfold framework.
Mathematical formulation
This method estimates class prevalences by solving the following problem:
\[q \approx M \pi\]
where:
\(q\) is the representation of the unlabeled test data,
\(M\) is the class-conditional representation matrix estimated from training data,
\(\pi\) is the vector of class prevalences to be estimated.
The estimation is performed by minimizing a divergence or loss function between the observed representation \(q\) and the expected representation \(M \pi\):
\[\hat{\pi} = \arg\min_{\pi} \; D(q, M\pi)\]
subject to:
\[\pi_k \ge 0, \quad \sum_k \pi_k = 1\]
The behavior of the method is fully determined by:
a representation \(f(x)\) that maps data into a feature space,
a distance/loss function \(D(\cdot, \cdot)\),
an optimization procedure over the probability simplex.
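To make the constrained optimization concrete, here is a minimal sketch using plain NumPy/SciPy rather than the qunfold backend; `solve_prevalences` is a hypothetical helper written for illustration, not part of the mlquantify API. It solves the least-squares instance of the problem over the probability simplex:

```python
# Illustrative sketch (not the mlquantify/qunfold implementation) of
#   pi_hat = argmin_pi D(q, M pi)   s.t.   pi >= 0, sum_k pi_k = 1,
# with D chosen as the squared Euclidean distance.
import numpy as np
from scipy.optimize import minimize

def solve_prevalences(M, q):
    """Least-squares fit of class prevalences on the probability simplex."""
    n_classes = M.shape[1]
    loss = lambda pi: np.sum((q - M @ pi) ** 2)          # D(q, M pi)
    simplex = {"type": "eq", "fun": lambda pi: pi.sum() - 1.0}
    bounds = [(0.0, 1.0)] * n_classes                     # pi_k >= 0
    x0 = np.full(n_classes, 1.0 / n_classes)              # uniform start
    result = minimize(loss, x0, method="SLSQP",
                      bounds=bounds, constraints=simplex)
    return result.x

# Synthetic example: columns of M are class-conditional representations,
# and q is generated from a known prevalence vector pi_true.
M = np.array([[0.8, 0.1],
              [0.2, 0.9]])
pi_true = np.array([0.7, 0.3])
q = M @ pi_true
pi_hat = solve_prevalences(M, q)
print(np.round(pi_hat, 3))  # recovers approximately [0.7, 0.3]
```

Because the objective is a smooth quadratic and the true prevalences lie on the simplex, the solver recovers them exactly here; with real, noisy representations the estimate is only approximate.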
This implementation wraps the qunfold backend while allowing the use of:
native qunfold representations and losses,
custom representations implemented in mlquantify,
arbitrary distance functions, such as Topsoe or Hellinger.
- Parameters:
- representation : object
Representation object defining how to compute \(q\) and \(M\). Can be either a qunfold representation or a custom implementation.
- loss : object or callable
Loss or distance function \(D(p, q)\). Can be either a qunfold loss object or a callable accepting two distributions.
- solver : str or callable, optional
Solver used for the constrained optimization problem. If not provided, the default solver from qunfold is used.
- seed : int, optional
Random seed used by the qunfold backend for reproducible optimization.
Notes
This formulation unifies several quantification methods:
AC / CC: class-based representations
Prob / PCC: probability-based representations
HDy / DyS: histogram-based representations with divergence measures
HDx: feature-based histogram representations
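To illustrate how a choice of representation and divergence instantiates one of these methods, the following sketch (plain NumPy, not the mlquantify/qunfold API; all function names are hypothetical) combines a histogram representation of classifier scores with the Hellinger distance, in the spirit of HDy:

```python
# Illustrative HDy-style sketch: histogram representation + Hellinger
# distance, with a grid search over the binary simplex pi = (1 - a, a).
import numpy as np

def histogram_representation(scores, n_bins=8):
    """Map classifier scores in [0, 1] to a normalized histogram."""
    hist, _ = np.histogram(scores, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(0)
# Simulated classifier scores: class 0 skews low, class 1 skews high.
scores_neg = rng.beta(2, 5, size=1000)
scores_pos = rng.beta(5, 2, size=1000)
# Columns of M: class-conditional histograms estimated from training data.
M = np.column_stack([histogram_representation(scores_neg),
                     histogram_representation(scores_pos)])
# q: histogram of the unlabeled test scores (true positive prevalence 0.7).
test_scores = np.concatenate([rng.beta(2, 5, 300), rng.beta(5, 2, 700)])
q = histogram_representation(test_scores)

alphas = np.linspace(0.0, 1.0, 101)
best = min(alphas, key=lambda a: hellinger(q, M @ np.array([1 - a, a])))
print(best)  # expected to lie near the true prevalence 0.7
```

Swapping the Hellinger distance for Topsoe or another divergence, or the histogram for a class-based representation, moves this sketch between the DyS, HDy, and AC/CC families listed above.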
The method follows the constrained regression framework described in:
\[y = X \hat{\pi}_F\]where different choices of representation and loss correspond to different quantification algorithms.
References
[1] Firat, A. (2016). Unified Framework for Quantification.
Examples
Using a class-based representation to implement an ACC-like quantifier:
>>> from sklearn.linear_model import LogisticRegression
>>> from qunfold.sklearn import CVClassifier
>>> from qunfold.methods.linear.losses import LeastSquaresLoss
>>> from qunfold.methods.linear.representations import ClassRepresentation
>>> from mlquantify.compose import ComposeQuantifier
>>>
>>> learner = LogisticRegression(max_iter=1000)
>>> quantifier = ComposeQuantifier(
...     representation=ClassRepresentation(
...         CVClassifier(learner),
...         is_probabilistic=False,
...     ),
...     loss=LeastSquaresLoss(),
... )
>>> quantifier.fit(X_train, y_train)
>>> prevalences = quantifier.predict(X_test)
Using a histogram-based representation with a custom Topsoe distance from mlquantify to implement a DyS-like quantifier:

>>> from sklearn.linear_model import LogisticRegression
>>> from qunfold.sklearn import CVClassifier
>>> from qunfold.methods.linear.representations import (
...     ClassRepresentation,
...     HistogramRepresentation,
... )
>>> from mlquantify.compose import ComposeQuantifier
>>> from mlquantify.metrics import topsoe_jax
>>>
>>> learner = LogisticRegression(max_iter=1000)
>>> representation = HistogramRepresentation(
...     n_bins=8,
...     preprocessor=ClassRepresentation(
...         CVClassifier(learner),
...         is_probabilistic=True,
...     ),
...     unit_scale=False,
... )
>>> quantifier = ComposeQuantifier(
...     representation=representation,
...     loss=topsoe_jax,
... )
>>> quantifier.fit(X_train, y_train)
>>> prevalences = quantifier.predict(X_test)
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
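For a concrete picture of the <component>__<parameter> syntax, here is a self-contained scikit-learn example using a plain Pipeline (the same mechanism applies to nested parameters of a ComposeQuantifier; the step names below are arbitrary choices for this sketch):

```python
# Nested-parameter update via set_params: the key "clf__C" addresses
# parameter C of the pipeline step named "clf".
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.set_params(clf__C=0.5)           # <component>__<parameter>
print(pipe.get_params()["clf__C"])    # prints 0.5
```

set_params returns the estimator itself, so such updates can also be chained before fitting.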