mlquantify package#

Subpackages#

Submodules#

mlquantify.base module#

class mlquantify.base.BaseQuantifier[source]#

Bases: ABC, BaseEstimator

Base class for all quantifiers in mlquantify.

Inheriting from this class provides default implementations for

  • setting and getting parameters used by GridSearchQ and friends;

  • saving/loading quantifier instances;

  • parameter validation.

Read more in the User Guide.

Notes

All quantifiers should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments. (No *args or **kwargs allowed.)

Examples

>>> from mlquantify.base import BaseQuantifier
>>> import numpy as np
>>> class MyQuantifier(BaseQuantifier):
...     def __init__(self, param1=42, param2='default'):
...         self.param1 = param1
...         self.param2 = param2
...     def fit(self, X, y):
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         # Dummy logic: return a uniform prevalence over the observed classes
...         _, counts = np.unique(self.classes_, return_counts=True)
...         prevalence = counts / counts.sum()
...         return prevalence
>>> quantifier = MyQuantifier(param1=10, param2='custom')
>>> quantifier.get_params()
{'param1': 10, 'param2': 'custom'}
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
save_quantifier(path: str | None = None) → None[source]#

Save the quantifier instance to a file.
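A minimal usage sketch, reusing MyQuantifier from the example above (the file name is illustrative; the on-disk format and the behavior when path is None are not specified here):

>>> quantifier = MyQuantifier(param1=10).fit(X, y)
>>> quantifier.save_quantifier('my_quantifier.pkl')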

skip_validation: bool = False[source]#

If True, parameter validation is skipped.
class mlquantify.base.MetaquantifierMixin[source]#

Bases: object

Mixin class for meta-quantifiers.

This mixin is empty; it only exists to indicate that the quantifier is a meta-quantifier.

Examples

>>> import numpy as np
>>> from mlquantify.base import BaseQuantifier, MetaquantifierMixin
>>> from mlquantify.adjust_counting import CC
>>> class MyMetaQuantifier(MetaquantifierMixin, BaseQuantifier):
...     def __init__(self, quantifier=None):
...         self.quantifier = quantifier
...     def fit(self, X, y):
...         if self.quantifier is not None:
...             self.quantifier.fit(X, y)
...         else:
...             self.quantifier = CC()
...         return self
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> meta_qtf = MyMetaQuantifier().fit(X, y)
>>> meta_qtf.quantifier
CC()
class mlquantify.base.ProtocolMixin[source]#

Bases: object

Mixin class for protocol-based quantifiers.

This mixin indicates that the quantifier follows a specific protocol by setting the estimation_type tag to “sample” and the requires_fit tag to False.

Examples

>>> import numpy as np
>>> from mlquantify.base import BaseQuantifier, ProtocolMixin
>>> class MyProtocolQuantifier(ProtocolMixin, BaseQuantifier):
...     def __init__(self, param=None):
...         self.param = param
...     def sample_method(self, X):
...         indexes = np.random.choice(len(X), size=10, replace=False)
...         X_sample = X[indexes]
...         return X_sample
>>> X = np.random.rand(100, 10)
>>> protocol_qtf = MyProtocolQuantifier(param=5)
>>> X_sample = protocol_qtf.sample_method(X)
>>> X_sample.shape
(10, 10)

mlquantify.base_aggregative module#

class mlquantify.base_aggregative.AggregationMixin[source]#

Bases: object

Mixin class for all aggregative quantifiers.

An aggregative quantifier is a quantifier that relies on an underlying supervised learner to produce predictions on which the quantification is then performed.

Inheriting from this mixin provides learner validation and extends parameter setting to the underlying learner (used by GridSearchQ and friends).

This mixin also sets the has_estimator and requires_fit tags to True.

Notes

  • An aggregative quantifier must have a ‘learner’ attribute that is a supervised learning estimator.

  • Depending on the type of predictions required from the learner, the quantifier can be further classified as a ‘soft’ or ‘crisp’ aggregative quantifier.

Read more in the User Guide.

Examples

>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.base_aggregative import AggregationMixin
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> class MyAggregativeQuantifier(AggregationMixin, BaseQuantifier):
...     def __init__(self, learner=None):
...         self.learner = learner if learner is not None else LogisticRegression()
...     def fit(self, X, y):
...         self.learner.fit(X, y)
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         preds = self.learner.predict(X)
...         _, counts = np.unique(preds, return_counts=True)
...         prevalence = counts / counts.sum()
...         return prevalence
>>> quantifier = MyAggregativeQuantifier()
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
set_params(**params)[source]#

Set the parameters of this quantifier and, where applicable, of its underlying learner.
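A usage sketch, assuming the sklearn-style double-underscore convention for routing parameters to the nested learner (here C is a LogisticRegression parameter; the exact routing behavior of this override is not spelled out above):

>>> quantifier = MyAggregativeQuantifier(learner=LogisticRegression())
>>> _ = quantifier.set_params(learner__C=0.1)
>>> quantifier.learner.C
0.1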
class mlquantify.base_aggregative.CrispLearnerQMixin[source]#

Bases: object

Crisp predictions mixin for aggregative quantifiers.

This mixin provides the following tag changes:

  • estimator_function: “predict”

  • estimator_type: “crisp”

Notes

  • This mixin should be used alongside the AggregationMixin, placed to the left of it in the inheritance order.

Examples

>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.base_aggregative import AggregationMixin, CrispLearnerQMixin
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> class MyCrispAggregativeQuantifier(CrispLearnerQMixin, AggregationMixin, BaseQuantifier):
...     def __init__(self, learner=None):
...         self.learner = learner if learner is not None else LogisticRegression()
...     def fit(self, X, y):
...         self.learner.fit(X, y)
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         preds = self.learner.predict(X)
...         _, counts = np.unique(preds, return_counts=True)
...         prevalence = counts / counts.sum()
...         return prevalence
>>> quantifier = MyCrispAggregativeQuantifier()
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
class mlquantify.base_aggregative.SoftLearnerQMixin[source]#

Bases: object

Soft predictions mixin for aggregative quantifiers.

This mixin provides the following tag changes:

  • estimator_function: “predict_proba”

  • estimator_type: “soft”

Notes

  • This mixin should be used alongside the AggregationMixin, placed to the left of it in the inheritance order.

Examples

>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.base_aggregative import AggregationMixin, SoftLearnerQMixin
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> class MySoftAggregativeQuantifier(SoftLearnerQMixin, AggregationMixin, BaseQuantifier):
...     def __init__(self, learner=None):
...         self.learner = learner if learner is not None else LogisticRegression()
...     def fit(self, X, y):
...         self.learner.fit(X, y)
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         proba = self.learner.predict_proba(X)
...         return proba.mean(axis=0)
>>> quantifier = MySoftAggregativeQuantifier()
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
mlquantify.base_aggregative.get_aggregation_requirements(quantifier)[source]#

Get the prediction requirements for the aggregative quantifier.

mlquantify.base_aggregative.is_aggregative_quantifier(quantifier)[source]#

Check if the quantifier is aggregative.

mlquantify.base_aggregative.uses_crisp_predictions(quantifier)[source]#

Check if the quantifier uses crisp predictions.

mlquantify.base_aggregative.uses_soft_predictions(quantifier)[source]#

Check if the quantifier uses soft predictions.
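A usage sketch of these inspection helpers, reusing the example quantifiers defined above (assuming the helpers return plain booleans):

>>> from mlquantify.base_aggregative import (
...     is_aggregative_quantifier, uses_crisp_predictions, uses_soft_predictions)
>>> is_aggregative_quantifier(MySoftAggregativeQuantifier())
True
>>> uses_soft_predictions(MySoftAggregativeQuantifier())
True
>>> uses_crisp_predictions(MyCrispAggregativeQuantifier())
True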

mlquantify.calibration module#

mlquantify.confidence module#

class mlquantify.confidence.BaseConfidenceRegion(prev_estims, confidence_level=0.95)[source]#

Bases: object

Base class for confidence regions of prevalence estimates.

This class defines the interface and core structure for constructing confidence regions around class prevalence estimates obtained from quantification models.

Confidence regions capture the uncertainty associated with prevalence estimates, typically derived from bootstrap resampling as proposed in [1].

Parameters:
prev_estims : array-like of shape (m, n)

Collection of m bootstrap prevalence estimates for n classes.

confidence_level : float, default=0.95

Desired confidence level \(1 - \alpha\) of the region.

Attributes:
prev_estims : ndarray of shape (m, n)

Bootstrap prevalence estimates.

confidence_level : float

Confidence level associated with the region.

Notes

The confidence region \(CR_{\alpha}\) is defined such that

\[\mathbb{P}\left(\pi^{\ast} \in CR_{\alpha}\right) = 1 - \alpha\]

where \(\pi^{\ast}\) is the unknown true class-prevalence vector.

References

[1]

Moreo, A., & Salvati, N. (2025). An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Istituto di Scienza e Tecnologie dell’Informazione, CNR, Pisa.

Examples

>>> import numpy as np
>>> from mlquantify.confidence import BaseConfidenceRegion
>>> class DummyRegion(BaseConfidenceRegion):
...     def _compute_region(self):
...         self.mean_ = np.mean(self.prev_estims, axis=0)
...     def get_region(self):
...         return self.mean_
...     def get_point_estimate(self):
...         return self.mean_
...     def contains(self, point):
...         return np.allclose(point, self.mean_, atol=0.1)
>>> X = np.random.dirichlet(np.ones(3), size=100)
>>> region = DummyRegion(X, confidence_level=0.9)
>>> region.get_point_estimate().round(3)
array([0.33, 0.33, 0.34])
contains(point)[source]#

Check whether a prevalence vector lies within the region.

get_point_estimate()[source]#

Return the point estimate of prevalence (e.g., mean of bootstrap samples).

get_region()[source]#

Return the parameters defining the confidence region.

class mlquantify.confidence.ConfidenceEllipseCLR(prev_estims, confidence_level=0.95)[source]#

Bases: ConfidenceEllipseSimplex

Confidence ellipse for prevalence estimates in CLR-transformed space.

Applies the Centered Log-Ratio (CLR) transformation:

\[\begin{split}T(\pi) = [\log(\pi_1/g(\pi)), \ldots, \log(\pi_n/g(\pi))], \\ g(\pi) = \left(\prod_i \pi_i\right)^{1/n}\end{split}\]

A confidence ellipse is then built in the transformed space:

\[CT_{\alpha}(\pi) = \begin{cases} 1 & \text{if } (T(\pi) - \mu_{CLR})^{\top} \Sigma^{-1} (T(\pi) - \mu_{CLR}) \le \chi^2_{n-1}(1-\alpha) \\ 0 & \text{otherwise} \end{cases}\]
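For intuition, a minimal numpy sketch of the CLR transform \(T\) defined above (illustrative only, not the class's internal implementation):

>>> import numpy as np
>>> pi = np.array([0.2, 0.3, 0.5])
>>> g = np.exp(np.log(pi).mean())  # geometric mean g(pi)
>>> T = np.log(pi / g)             # centered log-ratio; components sum to 0
>>> T.round(3)
array([-0.441, -0.035,  0.476])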
Parameters:
prev_estims : array-like of shape (m, n)

Bootstrap prevalence estimates.

confidence_level : float, default=0.95

Confidence level.

Attributes:
mean_ : ndarray of shape (n,)

Mean vector in CLR space.

precision_matrix : ndarray of shape (n, n)

Inverse covariance matrix in CLR space.

chi2_val : float

Chi-squared threshold.

References

[1] Moreo, A., & Salvati, N. (2025).

An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Section 3.3, Equation (3).

Examples

>>> import numpy as np
>>> from mlquantify.confidence import ConfidenceEllipseCLR
>>> X = np.random.dirichlet(np.ones(3), size=200)
>>> clr = ConfidenceEllipseCLR(X, confidence_level=0.9)
>>> clr.get_point_estimate().round(3)
array([ 0.,  0., -0.])
>>> clr.contains(np.array([0.4, 0.4, 0.2]))
True
contains(point, eps=1e-06)[source]#

Check whether a prevalence vector lies within the region.

get_point_estimate()[source]#

Return the point estimate of prevalence (e.g., mean of bootstrap samples).

class mlquantify.confidence.ConfidenceEllipseSimplex(prev_estims, confidence_level=0.95)[source]#

Bases: BaseConfidenceRegion

Confidence ellipse for prevalence estimates in the simplex.

Defines a multivariate confidence region based on a chi-squared threshold:

\[CE_{\alpha}(\pi) = \begin{cases} 1 & \text{if } (\pi - \mu)^{\top} \Sigma^{-1} (\pi - \mu) \le \chi^2_{n-1}(1-\alpha) \\ 0 & \text{otherwise} \end{cases}\]
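For intuition, a minimal numpy/scipy sketch of the membership test above (illustrative only; the class itself may handle the rank-deficient covariance differently):

>>> import numpy as np
>>> from scipy.stats import chi2
>>> prevs = np.random.dirichlet(np.ones(3), size=200)
>>> mu = prevs.mean(axis=0)
>>> cov = np.cov(prevs, rowvar=False)
>>> point = np.array([0.4, 0.3, 0.3])
>>> # pinv: the covariance is singular because prevalence vectors sum to 1
>>> d2 = (point - mu) @ np.linalg.pinv(cov) @ (point - mu)
>>> bool(d2 <= chi2.ppf(0.95, df=2))
True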
Parameters:
prev_estims : array-like of shape (m, n)

Bootstrap prevalence estimates.

confidence_level : float, default=0.95

Confidence level.

Attributes:
mean_ : ndarray of shape (n,)

Mean prevalence estimate.

precision_matrix : ndarray of shape (n, n)

Inverse covariance matrix of estimates.

chi2_val : float

Chi-squared cutoff threshold defining the ellipse.

References

[1] Moreo, A., & Salvati, N. (2025).

An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Section 3.3, Equation (2).

Examples

>>> import numpy as np
>>> from mlquantify.confidence import ConfidenceEllipseSimplex
>>> X = np.random.dirichlet(np.ones(3), size=200)
>>> ce = ConfidenceEllipseSimplex(X, confidence_level=0.95)
>>> ce.get_point_estimate().round(3)
array([0.33, 0.34, 0.33])
>>> ce.contains(np.array([0.4, 0.3, 0.3]))
True
contains(point)[source]#

Check whether a prevalence vector lies within the region.

get_point_estimate()[source]#

Return the point estimate of prevalence (e.g., mean of bootstrap samples).

get_region()[source]#

Return the parameters defining the confidence region.

class mlquantify.confidence.ConfidenceInterval(prev_estims, confidence_level=0.95)[source]#

Bases: BaseConfidenceRegion

Bootstrap confidence intervals for each class prevalence.

Constructs independent percentile-based confidence intervals for each class dimension from bootstrap samples.

The confidence region is defined as:

\[CI_{\alpha}(\pi) = \begin{cases} 1 & \text{if } L_i \le \pi_i \le U_i, \; \forall i = 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}\]

where \(L_i\) and \(U_i\) are the empirical \(\alpha/2\) and \(1 - \alpha/2\) quantiles for class \(i\).
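For intuition, a minimal numpy sketch of the per-class percentile bounds (illustrative only, not the class's internal code):

>>> import numpy as np
>>> prevs = np.random.dirichlet(np.ones(3), size=200)
>>> alpha = 0.05
>>> L = np.quantile(prevs, alpha / 2, axis=0)      # lower bounds L_i
>>> U = np.quantile(prevs, 1 - alpha / 2, axis=0)  # upper bounds U_i
>>> point = np.array([0.3, 0.4, 0.3])
>>> bool(np.all((L <= point) & (point <= U)))
True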

Parameters:
prev_estims : array-like of shape (m, n)

Bootstrap prevalence estimates.

confidence_level : float, default=0.95

Desired confidence level.

Attributes:
I_low : ndarray of shape (n,)

Lower confidence bounds.

I_high : ndarray of shape (n,)

Upper confidence bounds.

References

[1] Moreo, A., & Salvati, N. (2025).

An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Section 3.3, Equation (1).

Examples

>>> import numpy as np
>>> from mlquantify.confidence import ConfidenceInterval
>>> X = np.random.dirichlet(np.ones(3), size=200)
>>> ci = ConfidenceInterval(X, confidence_level=0.9)
>>> ci.get_region()
(array([0.05, 0.06, 0.05]), array([0.48, 0.50, 0.48]))
>>> ci.contains([0.3, 0.4, 0.3])
array([[ True]])
contains(point)[source]#

Check whether a prevalence vector lies within the region.

get_point_estimate()[source]#

Return the point estimate of prevalence (e.g., mean of bootstrap samples).

get_region()[source]#

Return the parameters defining the confidence region.

mlquantify.confidence.construct_confidence_region(prev_estims, confidence_level=0.95, method='intervals')[source]#

Construct a confidence region from bootstrap prevalence estimates using the specified method.
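A usage sketch (assuming method='intervals' selects the ConfidenceInterval region described above; the other accepted method names are not listed here):

>>> import numpy as np
>>> from mlquantify.confidence import construct_confidence_region
>>> prevs = np.random.dirichlet(np.ones(3), size=200)
>>> region = construct_confidence_region(prevs, confidence_level=0.9, method='intervals')
>>> lower, upper = region.get_region()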

mlquantify.multiclass module#

class mlquantify.multiclass.BinaryQuantifier[source]#

Bases: MetaquantifierMixin, BaseQuantifier

Meta-quantifier enabling One-vs-Rest and One-vs-One strategies.

This class extends a base quantifier to handle multiclass problems by decomposing them into binary subproblems. It automatically delegates fitting, prediction, and aggregation operations to the appropriate binary quantifiers.

Attributes:
qtfs : dict

Dictionary mapping class labels or label pairs to fitted binary quantifiers.

strategy : {‘ovr’, ‘ovo’}

Defines how multiclass quantification is decomposed.

aggregate(*args)[source]#

Aggregate binary predictions to obtain multiclass prevalence estimates.

fit(X, y)[source]#

Fit the quantifier under a binary decomposition strategy.

predict(X)[source]#

Predict class prevalences using the trained binary quantifiers.
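For intuition, the One-vs-Rest decomposition this class performs can be sketched by hand with plain sklearn/numpy, using probabilistic classify-and-count on each binary subproblem as a simplified stand-in for the binary quantifiers the class manages through qtfs:

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> X = np.random.rand(100, 5)
>>> y = np.random.randint(0, 3, size=100)
>>> binary_prevs = []
>>> for c in np.unique(y):
...     clf = LogisticRegression().fit(X, (y == c).astype(int))  # class c vs rest
...     binary_prevs.append(clf.predict_proba(X)[:, 1].mean())   # positive-class prevalence estimate
>>> prevalences = np.array(binary_prevs) / np.sum(binary_prevs)  # renormalize onto the simplex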

mlquantify.multiclass.define_binary(cls)[source]#

Decorator to enable binary quantification extensions (One-vs-Rest or One-vs-One).

This decorator dynamically extends a quantifier class to handle multiclass quantification tasks by decomposing them into multiple binary subproblems, following either the One-vs-Rest (OvR) or One-vs-One (OvO) strategy.

It automatically replaces the class methods fit, predict, and aggregate with binary-aware versions from BinaryQuantifier, while preserving access to the original implementations via _original_fit, _original_predict, and _original_aggregate.

Parameters:
cls : class

A subclass of BaseQuantifier implementing standard binary quantification methods (fit, predict, and aggregate).

Returns:
class

The same class with binary quantification capabilities added.

Examples

>>> import numpy as np
>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.multiclass import define_binary
>>> @define_binary
... class MyQuantifier(BaseQuantifier):
...     def fit(self, X, y):
...         # Custom binary training logic
...         self.classes_ = np.unique(y)
...         return self
...
...     def predict(self, X):
...         # Return dummy prevalences
...         return np.array([0.4, 0.6])
...
...     def aggregate(self, preds, y_train):
...         # Example aggregation method
...         return np.mean(preds, axis=0)
>>> qtf = MyQuantifier()
>>> qtf.strategy = 'ovr'  # or 'ovo'
>>> X = np.random.randn(10, 5)
>>> y = np.random.randint(0, 3, 10)
>>> qtf.fit(X, y)
MyQuantifier(...)
>>> qtf.predict(X)
array([...])

Module contents#

mlquantify, a Python package for quantification