mlquantify package
Subpackages
Submodules
mlquantify.base module
- class mlquantify.base.BaseQuantifier
Bases: ABC, BaseEstimator
Base class for all quantifiers in mlquantify.
Inheriting from this class provides default implementations for:
- setting and getting parameters, used by GridSearchQ and friends;
- saving/loading quantifier instances;
- parameter validation.
Read more in the User Guide.
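For instance, because every quantifier exposes its constructor parameters explicitly, instances can always be persisted with standard Python tooling. A minimal sketch using pickle and the CC quantifier from mlquantify.adjust_counting (the package's own saving/loading helpers, mentioned above, may expose a different interface):

import pickle
from mlquantify.adjust_counting import CC

# Round-trip a quantifier through the standard library; mlquantify's
# built-in saving/loading helpers may differ from this generic approach.
quantifier = CC()
with open("quantifier.pkl", "wb") as f:
    pickle.dump(quantifier, f)
with open("quantifier.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored.get_params())  # parameters survive the round trip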
Notes
All quantifiers should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs allowed).
Examples
>>> from mlquantify.base import BaseQuantifier
>>> import numpy as np
>>> class MyQuantifier(BaseQuantifier):
...     def __init__(self, param1=42, param2='default'):
...         self.param1 = param1
...         self.param2 = param2
...     def fit(self, X, y):
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         _, counts = np.unique(self.classes_, return_counts=True)
...         prevalence = counts / counts.sum()
...         return prevalence
>>> quantifier = MyQuantifier(param1=10, param2='custom')
>>> quantifier.get_params()
{'param1': 10, 'param2': 'custom'}
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
- class mlquantify.base.MetaquantifierMixin
Bases: object
Mixin class for meta-quantifiers.
This mixin is empty; it only exists to indicate that the quantifier is a meta-quantifier.
Examples
>>> import numpy as np
>>> from mlquantify.base import BaseQuantifier, MetaquantifierMixin
>>> from mlquantify.adjust_counting import CC
>>> class MyMetaQuantifier(MetaquantifierMixin, BaseQuantifier):
...     def __init__(self, quantifier=None):
...         self.quantifier = quantifier
...     def fit(self, X, y):
...         if self.quantifier is not None:
...             self.quantifier.fit(X, y)
...         else:
...             self.quantifier = CC()
...         return self
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> meta_qtf = MyMetaQuantifier().fit(X, y)
>>> meta_qtf.quantifier
CC()
- class mlquantify.base.ProtocolMixin
Bases: object
Mixin class for protocol-based quantifiers.
This mixin indicates that the quantifier follows a specific protocol by setting the estimation_type tag to "sample" and the requires_fit tag to False.
Examples
>>> import numpy as np
>>> from mlquantify.base import BaseQuantifier, ProtocolMixin
>>> class MyProtocolQuantifier(ProtocolMixin, BaseQuantifier):
...     def __init__(self, param=None):
...         self.param = param
...     def sample_method(self, X):
...         indexes = np.random.choice(len(X), size=10, replace=False)
...         X_sample = X[indexes]
...         return X_sample
>>> X = np.random.rand(100, 10)
>>> protocol_qtf = MyProtocolQuantifier(param=5)
>>> X_sample = protocol_qtf.sample_method(X)
>>> X_sample.shape
(10, 10)
mlquantify.base_aggregative module
- class mlquantify.base_aggregative.AggregationMixin
Bases: object
Mixin class for all aggregative quantifiers.
An aggregative quantifier is a quantifier that relies on an underlying supervised learner to produce predictions on which the quantification is then performed.
Inheriting from this mixin provides learner validation and makes the learner's parameters settable as well (used by GridSearchQ and friends). This mixin also sets the has_estimator and requires_fit tags to True.
Notes
An aggregative quantifier must have a ‘learner’ attribute that is a supervised learning estimator.
Depending on the type of predictions required from the learner, the quantifier can be further classified as a ‘soft’ or ‘crisp’ aggregative quantifier.
Read more in the User Guide.
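To make the crisp/soft distinction concrete, the sketch below contrasts the two kinds of learner output an aggregative quantifier may consume; it uses plain scikit-learn and numpy rather than any mlquantify internals:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = rng.integers(0, 2, size=100)
learner = LogisticRegression().fit(X, y)

# Crisp output: hard labels, what a 'crisp' aggregative quantifier consumes.
crisp_preds = learner.predict(X)        # shape (100,), values in {0, 1}

# Soft output: class posteriors, what a 'soft' aggregative quantifier consumes.
soft_preds = learner.predict_proba(X)   # shape (100, 2), rows sum to 1

# The simplest aggregation of each output type into a prevalence vector:
crisp_prevalence = np.bincount(crisp_preds, minlength=2) / len(crisp_preds)
soft_prevalence = soft_preds.mean(axis=0)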
Examples
>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.base_aggregative import AggregationMixin
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> class MyAggregativeQuantifier(AggregationMixin, BaseQuantifier):
...     def __init__(self, learner=None):
...         self.learner = learner if learner is not None else LogisticRegression()
...     def fit(self, X, y):
...         self.learner.fit(X, y)
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         preds = self.learner.predict(X)
...         _, counts = np.unique(preds, return_counts=True)
...         prevalence = counts / counts.sum()
...         return prevalence
>>> quantifier = MyAggregativeQuantifier()
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
- class mlquantify.base_aggregative.CrispLearnerQMixin
Bases: object
Crisp predictions mixin for aggregative quantifiers.
This mixin provides the following tag changes:
- estimator_function: "predict"
- estimator_type: "crisp"
Notes
This mixin should be used alongside AggregationMixin and placed to its left in the inheritance order.
Examples
>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.base_aggregative import AggregationMixin, CrispLearnerQMixin
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> class MyCrispAggregativeQuantifier(CrispLearnerQMixin, AggregationMixin, BaseQuantifier):
...     def __init__(self, learner=None):
...         self.learner = learner if learner is not None else LogisticRegression()
...     def fit(self, X, y):
...         self.learner.fit(X, y)
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         preds = self.learner.predict(X)
...         _, counts = np.unique(preds, return_counts=True)
...         prevalence = counts / counts.sum()
...         return prevalence
>>> quantifier = MyCrispAggregativeQuantifier()
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
- class mlquantify.base_aggregative.SoftLearnerQMixin
Bases: object
Soft predictions mixin for aggregative quantifiers.
This mixin provides the following tag changes:
- estimator_function: "predict_proba"
- estimator_type: "soft"
Notes
This mixin should be used alongside AggregationMixin and placed to its left in the inheritance order.
Examples
>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.base_aggregative import AggregationMixin, SoftLearnerQMixin
>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> class MySoftAggregativeQuantifier(SoftLearnerQMixin, AggregationMixin, BaseQuantifier):
...     def __init__(self, learner=None):
...         self.learner = learner if learner is not None else LogisticRegression()
...     def fit(self, X, y):
...         self.learner.fit(X, y)
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         proba = self.learner.predict_proba(X)
...         return proba.mean(axis=0)
>>> quantifier = MySoftAggregativeQuantifier()
>>> X = np.random.rand(100, 10)
>>> y = np.random.randint(0, 2, size=100)
>>> quantifier.fit(X, y).predict(X)
array([0.5, 0.5])
- mlquantify.base_aggregative.get_aggregation_requirements(quantifier)
Get the prediction requirements for the aggregative quantifier.
- mlquantify.base_aggregative.is_aggregative_quantifier(quantifier)
Check if the quantifier is aggregative.
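A hedged usage sketch for these two helpers, assuming CC (Classify & Count, used elsewhere in these docs) counts as an aggregative quantifier; the exact structure returned by get_aggregation_requirements depends on the quantifier's tags:

from mlquantify.adjust_counting import CC
from mlquantify.base_aggregative import (
    get_aggregation_requirements,
    is_aggregative_quantifier,
)

qtf = CC()
print(is_aggregative_quantifier(qtf))      # expected: True, since CC wraps a learner
print(get_aggregation_requirements(qtf))   # e.g. which estimator function/type is needed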
mlquantify.calibration module
mlquantify.confidence module
- class mlquantify.confidence.BaseConfidenceRegion(prev_estims, confidence_level=0.95)
Bases: object
Base class for confidence regions of prevalence estimates.
This class defines the interface and core structure for constructing confidence regions around class prevalence estimates obtained from quantification models.
Confidence regions capture the uncertainty associated with prevalence estimates, typically derived from bootstrap resampling as proposed in [1].
- Parameters:
- prev_estims : array-like of shape (m, n)
Collection of m bootstrap prevalence estimates for n classes.
- confidence_level : float, default=0.95
Desired confidence level \(1 - \alpha\) of the region.
- Attributes:
- prev_estims : ndarray of shape (m, n)
Bootstrap prevalence estimates.
- confidence_level : float
Confidence level associated with the region.
Notes
The confidence region \(CR_{\alpha}\) is defined such that
\[\mathbb{P}\left(\pi^{\ast} \in CR_{\alpha}\right) = 1 - \alpha\]
where \(\pi^{\ast}\) is the unknown true class-prevalence vector.
References
[1] Moreo, A., & Salvati, N. (2025). An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Istituto di Scienza e Tecnologie dell'Informazione, CNR, Pisa.
Examples
>>> import numpy as np
>>> from mlquantify.confidence import BaseConfidenceRegion
>>> class DummyRegion(BaseConfidenceRegion):
...     def _compute_region(self):
...         self.mean_ = np.mean(self.prev_estims, axis=0)
...     def get_region(self):
...         return self.mean_
...     def get_point_estimate(self):
...         return self.mean_
...     def contains(self, point):
...         return np.allclose(point, self.mean_, atol=0.1)
>>> X = np.random.dirichlet(np.ones(3), size=100)
>>> region = DummyRegion(X, confidence_level=0.9)
>>> region.get_point_estimate().round(3)
array([0.33, 0.33, 0.34])
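In practice the prev_estims matrix comes from re-estimating prevalence on bootstrap resamples of the test set. The sketch below simulates those estimates directly (a Dirichlet draw stands in for real quantifier outputs) and feeds them to the ConfidenceInterval subclass documented further down:

import numpy as np
from mlquantify.confidence import ConfidenceInterval

rng = np.random.default_rng(0)

# Stand-in for m = 500 bootstrap prevalence estimates over n = 3 classes;
# in practice each row is the prevalence predicted on one bootstrap resample.
prev_estims = rng.dirichlet(alpha=[30, 50, 20], size=500)

ci = ConfidenceInterval(prev_estims, confidence_level=0.95)
lower, upper = ci.get_region()  # per-class percentile bounds
print(lower.round(3), upper.round(3))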
- class mlquantify.confidence.ConfidenceEllipseCLR(prev_estims, confidence_level=0.95)
Bases: ConfidenceEllipseSimplex
Confidence ellipse for prevalence estimates in CLR-transformed space.
Applies the Centered Log-Ratio (CLR) transformation:
\[T(\pi) = \left[\log\frac{\pi_1}{g(\pi)}, \dots, \log\frac{\pi_n}{g(\pi)}\right], \qquad g(\pi) = \left(\prod_{i=1}^{n} \pi_i\right)^{1/n}\]
A confidence ellipse is then built in the transformed space:
\[CT_{\alpha}(\pi) = \begin{cases} 1 & \text{if } (T(\pi) - \mu_{CLR})^{\top} \Sigma^{-1} (T(\pi) - \mu_{CLR}) \le \chi^2_{n-1}(1-\alpha) \\ 0 & \text{otherwise} \end{cases}\]
- Parameters:
- prev_estims : array-like of shape (m, n)
Bootstrap prevalence estimates.
- confidence_level : float, default=0.95
Confidence level.
- Attributes:
- mean_ : ndarray of shape (n,)
Mean vector in CLR space.
- precision_matrix : ndarray of shape (n, n)
Inverse covariance matrix in CLR space.
- chi2_val : float
Chi-squared threshold.
References
- [1] Moreo, A., & Salvati, N. (2025).
An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Section 3.3, Equation (3).
Examples
>>> import numpy as np
>>> from mlquantify.confidence import ConfidenceEllipseCLR
>>> X = np.random.dirichlet(np.ones(3), size=200)
>>> clr = ConfidenceEllipseCLR(X, confidence_level=0.9)
>>> clr.get_point_estimate().round(3)
array([ 0.,  0., -0.])
>>> clr.contains(np.array([0.4, 0.4, 0.2]))
True
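As a worked illustration of the CLR transformation above, here is a standalone numpy version (not the class's internal code):

import numpy as np

def clr_transform(p, eps=1e-12):
    """Centered log-ratio transform of a prevalence vector (or rows of a matrix)."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)  # guard against log(0)
    log_p = np.log(p)
    # Subtracting the mean log-value equals dividing by the geometric mean g(pi).
    return log_p - log_p.mean(axis=-1, keepdims=True)

print(clr_transform([0.2, 0.3, 0.5]).round(3))  # CLR components sum to zero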
- class mlquantify.confidence.ConfidenceEllipseSimplex(prev_estims, confidence_level=0.95)
Bases: BaseConfidenceRegion
Confidence ellipse for prevalence estimates in the simplex.
Defines a multivariate confidence region based on a chi-squared threshold:
\[CE_{\alpha}(\pi) = \begin{cases} 1 & \text{if } (\pi - \mu)^{\top} \Sigma^{-1} (\pi - \mu) \le \chi^2_{n-1}(1-\alpha) \\ 0 & \text{otherwise} \end{cases}\]
- Parameters:
- prev_estims : array-like of shape (m, n)
Bootstrap prevalence estimates.
- confidence_level : float, default=0.95
Confidence level.
- Attributes:
- mean_ : ndarray of shape (n,)
Mean prevalence estimate.
- precision_matrix : ndarray of shape (n, n)
Inverse covariance matrix of estimates.
- chi2_val : float
Chi-squared cutoff threshold defining the ellipse.
References
- [1] Moreo, A., & Salvati, N. (2025).
An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Section 3.3, Equation (2).
Examples
>>> import numpy as np
>>> from mlquantify.confidence import ConfidenceEllipseSimplex
>>> X = np.random.dirichlet(np.ones(3), size=200)
>>> ce = ConfidenceEllipseSimplex(X, confidence_level=0.95)
>>> ce.get_point_estimate().round(3)
array([0.33, 0.34, 0.33])
>>> ce.contains(np.array([0.4, 0.3, 0.3]))
True
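The membership test above is a squared Mahalanobis distance compared against a chi-squared quantile. Here is a standalone numpy/scipy sketch of the same check (illustrative; the class's internals may differ):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
prev_estims = rng.dirichlet(np.ones(3), size=200)

mu = prev_estims.mean(axis=0)
# Use the pseudo-inverse: simplex-valued vectors sum to 1, so their
# covariance matrix is singular and a plain inverse can fail.
precision = np.linalg.pinv(np.cov(prev_estims, rowvar=False))
threshold = chi2.ppf(0.95, df=prev_estims.shape[1] - 1)

point = np.array([0.4, 0.3, 0.3])
d2 = (point - mu) @ precision @ (point - mu)
print(bool(d2 <= threshold))  # True if the point lies inside the ellipse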
- class mlquantify.confidence.ConfidenceInterval(prev_estims, confidence_level=0.95)
Bases: BaseConfidenceRegion
Bootstrap confidence intervals for each class prevalence.
Constructs independent percentile-based confidence intervals for each class dimension from bootstrap samples.
The confidence region is defined as:
\[CI_{\alpha}(\pi) = \begin{cases} 1 & \text{if } L_i \le \pi_i \le U_i,\ \forall i = 1, \dots, n \\ 0 & \text{otherwise} \end{cases}\]
where \(L_i\) and \(U_i\) are the empirical \(\alpha/2\) and \(1 - \alpha/2\) quantiles for class \(i\).
- Parameters:
- prev_estims : array-like of shape (m, n)
Bootstrap prevalence estimates.
- confidence_level : float, default=0.95
Desired confidence level.
- Attributes:
- I_low : ndarray of shape (n,)
Lower confidence bounds.
- I_high : ndarray of shape (n,)
Upper confidence bounds.
References
- [1] Moreo, A., & Salvati, N. (2025).
An Efficient Method for Deriving Confidence Intervals in Aggregative Quantification. Section 3.3, Equation (1).
Examples
>>> import numpy as np
>>> from mlquantify.confidence import ConfidenceInterval
>>> X = np.random.dirichlet(np.ones(3), size=200)
>>> ci = ConfidenceInterval(X, confidence_level=0.9)
>>> ci.get_region()
(array([0.05, 0.06, 0.05]), array([0.48, 0.50, 0.48]))
>>> ci.contains([0.3, 0.4, 0.3])
array([[ True]])
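The bounds \(L_i\) and \(U_i\) are plain empirical quantiles; the following numpy sketch reproduces the idea (illustrative, not the class's internal code):

import numpy as np

rng = np.random.default_rng(0)
prev_estims = rng.dirichlet(np.ones(3), size=200)

alpha = 0.10  # corresponds to confidence_level = 0.90
# Column-wise empirical alpha/2 and 1 - alpha/2 quantiles.
lower = np.quantile(prev_estims, alpha / 2, axis=0)
upper = np.quantile(prev_estims, 1 - alpha / 2, axis=0)

point = np.array([0.3, 0.4, 0.3])
print(bool(np.all((lower <= point) & (point <= upper))))  # inside all intervals?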
mlquantify.multiclass module
- class mlquantify.multiclass.BinaryQuantifier
Bases: MetaquantifierMixin, BaseQuantifier
Meta-quantifier enabling One-vs-Rest and One-vs-One strategies.
This class extends a base quantifier to handle multiclass problems by decomposing them into binary subproblems. It automatically delegates fitting, prediction, and aggregation operations to the appropriate binary quantifiers.
- Attributes:
- qtfs_ : dict
Dictionary mapping class labels or label pairs to fitted binary quantifiers.
- strategy : {'ovr', 'ovo'}
Defines how multiclass quantification is decomposed.
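To see the OvR decomposition concretely, the sketch below fits one binary CC quantifier per class and renormalizes the per-class "positive" prevalences into a single vector; this mirrors the spirit of the delegation described above, though BinaryQuantifier's actual aggregation may differ:

import numpy as np
from mlquantify.adjust_counting import CC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 3, size=300)

positives = {}
for c in np.unique(y):
    # One binary subproblem per class: class c vs. the rest
    # (assumes CC's default learner suffices here).
    qtf = CC().fit(X, (y == c).astype(int))
    # Assumes predict returns prevalences ordered by class label,
    # so index 1 is the prevalence of the "positive" (c vs. rest) class.
    positives[c] = qtf.predict(X)[1]

# Renormalize the per-class estimates into one prevalence vector.
total = sum(positives.values())
prevalence = {c: p / total for c, p in positives.items()}
print(prevalence)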
- mlquantify.multiclass.define_binary(cls)
Decorator to enable binary quantification extensions (One-vs-Rest or One-vs-One).
This decorator dynamically extends a quantifier class to handle multiclass quantification tasks by decomposing them into multiple binary subproblems, following either the One-vs-Rest (OvR) or One-vs-One (OvO) strategy.
It automatically replaces the class methods fit, predict, and aggregate with binary-aware versions from BinaryQuantifier, while preserving access to the original implementations via _original_fit, _original_predict, and _original_aggregate.
- Parameters:
- cls : class
A subclass of BaseQuantifier implementing standard binary quantification methods (fit, predict, and aggregate).
- Returns:
- class
The same class with binary quantification capabilities added.
Examples
>>> import numpy as np
>>> from mlquantify.base import BaseQuantifier
>>> from mlquantify.multiclass import define_binary
>>> @define_binary
... class MyQuantifier(BaseQuantifier):
...     def fit(self, X, y):
...         # Custom binary training logic
...         self.classes_ = np.unique(y)
...         return self
...
...     def predict(self, X):
...         # Return dummy prevalences
...         return np.array([0.4, 0.6])
...
...     def aggregate(self, preds, y_train):
...         # Example aggregation method
...         return np.mean(preds, axis=0)
>>> qtf = MyQuantifier()
>>> qtf.strategy = 'ovr'  # or 'ovo'
>>> X = np.random.randn(10, 5)
>>> y = np.random.randint(0, 3, 10)
>>> qtf.fit(X, y)
MyQuantifier(...)
>>> qtf.predict(X)
array([...])
Module contents
mlquantify, a Python package for quantification.