API Reference

This is the class and function reference of mlquantify. Please refer to the full user guide for further details, as the raw specifications of classes and functions may not be enough to give full guidelines on their use. For reference on core concepts, see the Foundations guide.

Object

Description

get_config

Retrieve the current mlquantify configuration.

set_config

Set global mlquantify configuration.

config_context

Context manager to temporarily change the global mlquantify configuration.
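The get_config, set_config and config_context trio follows a common pattern for global settings. The sketch below only illustrates that pattern and assumes a plain module-level dictionary. The names demo_get_config, demo_set_config, demo_config_context and the verbose option are hypothetical; they are not mlquantify's internals.

```python
from contextlib import contextmanager

# Hypothetical module-level settings; real option names in mlquantify may differ.
_config = {"verbose": False}


def demo_get_config():
    """Return a copy so callers cannot mutate the global state directly."""
    return dict(_config)


def demo_set_config(**options):
    """Update known global options, rejecting unknown keys before changing anything."""
    for key in options:
        if key not in _config:
            raise KeyError(f"unknown option: {key}")
    _config.update(options)


@contextmanager
def demo_config_context(**options):
    """Temporarily apply options and restore the previous values on exit."""
    previous = demo_get_config()
    demo_set_config(**options)
    try:
        yield
    finally:
        # Runs even if the body of the with block raises.
        _config.clear()
        _config.update(previous)


with demo_config_context(verbose=True):
    print(demo_get_config()["verbose"])  # True inside the block
print(demo_get_config()["verbose"])      # False again afterwards
```

Restoring inside a finally clause means the previous settings come back even when the body of the with block raises an exception.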

BaseQuantifier

Base class for all quantifiers in mlquantify.

MetaquantifierMixin

Mixin class for meta-quantifiers.

ProtocolMixin

Mixin class for protocol-based quantifiers.

AggregationMixin

Mixin class for all aggregative quantifiers.

SoftPredictionMixin

Soft predictions mixin for aggregative quantifiers.

CrispPredictionMixin

Crisp predictions mixin for aggregative quantifiers.

Calibrator

Base class for calibrators.

ClassifierCalibrator

Post-hoc calibration of classifier posteriors by logit scaling.

QuantifierCalibrator

Post-hoc calibration of quantifier prevalence estimates.

BaseComposeQuantifier

Base class for compose-based quantifiers.

LinearComposeQuantifier

Compose quantifier for linear representation matching.

LikelihoodComposeQuantifier

Compose quantifier based on mixture negative log-likelihood.

ComposeQuantifier

Compose quantifier for linear representation matching.

BaseConfidenceRegion

Base class for confidence regions of prevalence estimates.

ConfidenceInterval

Bootstrap confidence intervals for each class prevalence.
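One common way to turn bootstrap replicates into per-class intervals is the percentile method. The following is a minimal sketch that assumes the bootstrap prevalence vectors are already available (for example from resampling the test bag). The helper names percentile and bootstrap_intervals are hypothetical, and mlquantify may build its intervals differently.

```python
def percentile(values, q):
    """Linear-interpolation percentile of values, with q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrap_intervals(boot_prevs, confidence=0.95):
    """Per-class percentile intervals (hypothetical helper, not mlquantify's API).

    boot_prevs: list of prevalence vectors, one per bootstrap replicate.
    Returns one (lower, upper) pair per class.
    """
    alpha = 1.0 - confidence
    n_classes = len(boot_prevs[0])
    intervals = []
    for c in range(n_classes):
        column = [p[c] for p in boot_prevs]
        intervals.append((percentile(column, 100 * alpha / 2),
                          percentile(column, 100 * (1 - alpha / 2))))
    return intervals
```

With confidence=1.0 the interval for each class is simply the range of its bootstrap values; lower confidence levels trim the tails symmetrically.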

ConfidenceEllipseSimplex

Confidence ellipse for prevalence estimates in the simplex.

ConfidenceEllipseCLR

Confidence ellipse for prevalence estimates in CLR-transformed space.

construct_confidence_region

Instantiate a confidence region from bootstrap prevalence estimates.

CC

Classify and Count (CC) quantifier.

PCC

Probabilistic Classify and Count (PCC) quantifier.

ACC

Adjusted Classify and Count (ACC) quantifier.
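CC, PCC and ACC are usually described as follows: CC counts crisp predictions, PCC averages posterior probabilities, and binary ACC inverts the relation p_cc = tpr * p + fpr * (1 - p), with TPR and FPR estimated on held-out data. A plain-Python sketch of those formulas follows. The function names are hypothetical, and falling back to the CC estimate when tpr equals fpr is an assumption of this sketch, not documented mlquantify behaviour.

```python
def classify_and_count(labels, n_classes):
    """CC: fraction of test items assigned to each class by a crisp classifier."""
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    return [c / len(labels) for c in counts]


def probabilistic_classify_and_count(posteriors):
    """PCC: average of the posterior probability vectors over the test set."""
    n = len(posteriors)
    return [sum(p[c] for p in posteriors) / n for c in range(len(posteriors[0]))]


def adjusted_classify_and_count(cc_pos, tpr, fpr):
    """Binary ACC: solve cc_pos = tpr * p + fpr * (1 - p) for p, then clip to [0, 1]."""
    if tpr == fpr:
        return cc_pos  # assumption: correction undefined, so keep the CC estimate
    p = (cc_pos - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))
```

For instance, with tpr = 0.8, fpr = 0.2 and a true positive prevalence of 0.25, the expected CC estimate is 0.8 * 0.25 + 0.2 * 0.75 = 0.35, and adjusted_classify_and_count(0.35, 0.8, 0.2) recovers 0.25 up to floating-point rounding.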

ThresholdAdjustment

Abstract base class for ROC-threshold adjustment quantifiers.

TAC

Threshold Adjusted Count (TAC) quantifier.

TX

Threshold X (TX) quantifier.

TMAX

Threshold MAX (TMAX) quantifier.

T50

Threshold 50 (T50) quantifier.

MS

Median Sweep (MS) quantifier.

MS2

Median Sweep 2 (MS2) quantifier.
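The threshold-based quantifiers differ mainly in how they pick the decision threshold from the training ROC curve before applying the ACC correction. As these policies are commonly described, TX picks the threshold where fpr = 1 - tpr, TMAX maximises tpr - fpr, T50 picks tpr = 0.5, and Median Sweep takes the median of ACC estimates over all thresholds. Below is a minimal sketch for binary labels in {0, 1}; the helper names and the convention that score >= threshold means "positive" are assumptions, and this is not mlquantify's implementation.

```python
import statistics


def rates_at(threshold, scores, labels):
    """TPR and FPR when items with score >= threshold count as positive (assumed convention)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


def pick_threshold(scores, labels, policy):
    """Return the observed score that best satisfies a ROC-based policy (hypothetical helper)."""

    def cost(t):
        tpr, fpr = rates_at(t, scores, labels)
        if policy == "max":  # TMAX: maximise tpr - fpr
            return -(tpr - fpr)
        if policy == "x":  # TX: fpr as close as possible to 1 - tpr
            return abs(fpr - (1 - tpr))
        if policy == "t50":  # T50: tpr as close as possible to 0.5
            return abs(tpr - 0.5)
        raise ValueError(f"unknown policy: {policy}")

    return min(sorted(set(scores)), key=cost)


def median_sweep(train_scores, train_labels, test_scores):
    """MS: median of clipped ACC estimates over every usable training threshold."""
    estimates = []
    for t in sorted(set(train_scores)):
        tpr, fpr = rates_at(t, train_scores, train_labels)
        if tpr == fpr:
            continue  # ACC correction is undefined at this threshold
        cc = sum(s >= t for s in test_scores) / len(test_scores)
        estimates.append(min(1.0, max(0.0, (cc - fpr) / (tpr - fpr))))
    return statistics.median(estimates)
```

Taking the median over many thresholds makes MS less sensitive to a single badly estimated (tpr, fpr) pair than committing to one threshold.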

FM

Friedman Method (FM) quantifier.

GACC

Generalized Adjusted Classify and Count (GACC).

GPACC

Generalized Probabilistic Adjusted Classify and Count (GPACC).

evaluate_thresholds

Evaluate a range of classification thresholds to compute the corresponding True Positive Rate (TPR) and False Positive Rate (FPR) for a binary quantification task.

compute_tpr

Compute the True Positive Rate (Recall) for a binary classification task.

compute_fpr

Compute the False Positive Rate for a binary classification task.

compute_table

Compute the confusion matrix table for a binary classification task.
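TPR and FPR follow directly from the binary confusion counts: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch is shown below. The function names, the (TP, FP, FN, TN) tuple order and returning 0.0 for an empty denominator are hypothetical choices, not the signatures of compute_table, compute_tpr or compute_fpr.

```python
def confusion_table(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) counts for a binary task (hypothetical tuple order)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def tpr_fpr(y_true, y_pred, positive=1):
    """TPR = TP / (TP + FN) (recall) and FPR = FP / (FP + TN); 0.0 if a denominator is 0."""
    tp, fp, fn, tn = confusion_table(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```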

make_quantification

Generate synthetic quantification bags under prior-probability shift.

fetch_mushroom

Mushroom: edible vs. poisonous (binary, all-categorical).

fetch_banknote_authentication

Banknote authentication from wavelet image features (binary).

fetch_haberman_survival

Haberman survival after breast-cancer surgery (binary, hard).

fetch_miniboone

MiniBooNE particle identification: signal vs. background (binary, large).

fetch_digits_optical_penbased

Optical / Pen-based handwritten digits (10-class, easy).

fetch_dry_bean

Dry Bean: seven bean varieties from grain morphology (multiclass).

fetch_covertype

Forest Covertype: 7 cover types from cartographic variables (multiclass, large).

fetch_yeast

Yeast protein localization site (10-class, hard, imbalanced).

fetch_sensorless_drive

Sensorless drive diagnosis from motor current signals (11-class, balanced).

fetch_statlog_shuttle

Statlog (Shuttle): space-shuttle radiator states (multiclass, extreme imbalance).

fetch_wine_quality

Wine Quality: sensory score 3-9 from physicochemistry (ordinal).

fetch_online_news_popularity

Online News Popularity: will an article be popular? (binary, temporal).

fetch_pima_diabetes

Pima Indians Diabetes (binary, hard, noisy medical).

fetch_electricity_elec2

Electricity (Elec2): NSW market price up/down stream (binary, drift).

fetch_airlines

Airlines: flight-delay stream (binary, large, temporal).

fetch_newsgroups20

20 Newsgroups: Usenet posts in 20 topics (text, multiclass).

fetch_imdb

IMDB Large Movie Review sentiment (text, binary, balanced).

fetch_multidomain_sentiment

Multi-Domain (Blitzer) Amazon review sentiment (text, covariate shift).

fetch_sentiment140

Sentiment140: 1.6M timestamped tweets (text, binary, temporal).

fetch_rcv1_v2

RCV1-v2: Reuters news topics (text, sparse TF-IDF, multilabel).

fetch_mnist_usps

MNIST -> USPS handwritten digits (image, covariate shift).

fetch_cifar10

CIFAR-10 natural images (image, 10-class, balanced).

fetch_planetoid_cora_citeseer_pubmed

Planetoid citation graphs: Cora / CiteSeer / PubMed (graph nodes).

fetch_sea_concepts

SEA Concepts: synthetic stream with abrupt concept drift (binary).

fetch_lequa2024

LeQua 2024 competition vectors; all tasks are available via the task parameter (text/ordinal).

Bunch

get_data_home

fetch_remote

Download a file with urllib, using a local cache and retries (similar to scikit-learn). On TLS errors, retries once without certificate verification.

CDE

CDE-Iterate quantifier.

EMQ

Expectation-Maximization Quantifier (EMQ / SLD).
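EMQ (SLD) is usually presented as an EM loop: rescale each test posterior by the ratio of the current prevalence estimate to the training prevalence, renormalise, and average the rescaled posteriors to obtain the next estimate. A plain-Python sketch of that loop follows; the function name, the tolerance and the iteration cap are assumptions, and mlquantify's version may add calibration or other refinements.

```python
def emq(posteriors, train_prevs, tol=1e-6, max_iter=1000):
    """EM prior adjustment (hypothetical helper, not mlquantify's API).

    posteriors: per-item probability vectors from a classifier trained under
    class priors train_prevs. Returns the estimated test prevalences.
    """
    k = len(train_prevs)
    prevs = list(train_prevs)
    for _ in range(max_iter):
        # E-step: reweight each posterior by current / training prior, then renormalise.
        totals = [0.0] * k
        for p in posteriors:
            w = [p[c] * prevs[c] / train_prevs[c] for c in range(k)]
            z = sum(w)
            for c in range(k):
                totals[c] += w[c] / z
        # M-step: the new prevalence is the mean of the adjusted posteriors.
        new_prevs = [t / len(posteriors) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_prevs, prevs)) < tol
        prevs = new_prevs
        if converged:
            break
    return prevs
```

The first iteration returns exactly the PCC estimate; later iterations move further from the training priors when the posteriors indicate a shift.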

MLPE

Maximum Likelihood Prevalence Estimation (MLPE) quantifier.

BaseLoss

Base class for optimization losses.

DistanceLoss

Generic distance-based loss between two probability distributions.

LeastSquaresLoss

Squared Euclidean (least-squares) loss.

HellingerSurrogateLoss

Optimization surrogate for the squared Hellinger distance.

EnergyLoss

Energy-distance loss for distribution matching.

NegativeLogLikelihoodLoss

Negative log-likelihood loss for mixture likelihoods.

MixtureNegativeLogLikelihoodLoss

Negative log-likelihood for class likelihood mixtures.

RegularizedMixtureNLLLoss

Mixture NLL with optional ordinal-smoothness regularization.

normalize_distribution

Normalize an array to a valid probability distribution.

get_loss

Instantiate a loss object from a string identifier or return a callable.

BaseMatchingQuantifier

Base class for distribution matching quantifiers.

MatchingHistogramQuantifier

Abstract base class for histogram-based distribution matching.

DyS

Distribution y-Similarity (DyS) quantifier.

HDy

Hellinger Distance y (HDy) quantifier.
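HDy searches for the mixture of the positive and negative validation score histograms that is closest, in Hellinger distance, to the test score histogram. Here is a minimal binary sketch, assuming scores in [0, 1], a single bin count and a grid search over the mixture weight. The helper names are hypothetical, and some descriptions of HDy additionally repeat the search for several bin counts and take the median.

```python
import math


def histogram(scores, n_bins):
    """Normalised histogram of scores assumed to lie in [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return [c / len(scores) for c in counts]


def hellinger(p, q):
    """Hellinger-type distance between two discrete distributions."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def hdy(pos_scores, neg_scores, test_scores, n_bins=10, grid=101):
    """Binary HDy sketch (hypothetical helper): grid search for the positive-class weight."""
    h_pos = histogram(pos_scores, n_bins)
    h_neg = histogram(neg_scores, n_bins)
    h_test = histogram(test_scores, n_bins)
    best_alpha, best_dist = 0.0, float("inf")
    for i in range(grid):
        alpha = i / (grid - 1)
        mix = [alpha * a + (1 - alpha) * b for a, b in zip(h_pos, h_neg)]
        d = hellinger(mix, h_test)
        if d < best_dist:
            best_alpha, best_dist = alpha, d
    return best_alpha
```

The constant factor sometimes placed in front of the Hellinger distance does not change which mixture weight minimises it, so the sketch omits it.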

HDx

Hellinger Distance x (HDx) quantifier.

SORD

Sample Ordinal Distance (SORD) quantifier.

MatchingKernelQuantifier

Abstract base class for kernel mean matching quantifiers.

MMD_RKHS

Maximum Mean Discrepancy in RKHS (MMD-RKHS) quantifier.

KDEyQuantifier

Abstract base class for KDE-based density matching quantifiers.

KDEyML

KDEy Maximum Likelihood (KDEy-ML) quantifier.

KDEyHD

KDEy Hellinger Distance (KDEy-HD) quantifier.

KDEyCS

KDEy Cauchy-Schwarz (KDEy-CS) quantifier.

GKDEyML

Generalized KDEy Maximum Likelihood (GKDEyML) quantifier.

GHDx

Generalized HDx (GHDx) quantifier.

GHDy

Generalized HDy (GHDy) quantifier.

SMM

Sample Mean Matching (SMM) quantifier.
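Sample Mean Matching reduces each distribution to a single number, the mean classifier score. In the binary case the test mean is modelled as p * m_pos + (1 - p) * m_neg, which can be solved for p in closed form. A sketch under that assumption; the function name is hypothetical and clipping to [0, 1] is a choice of this sketch.

```python
def smm(pos_scores, neg_scores, test_scores):
    """Binary SMM sketch (hypothetical helper): match the test mean score."""
    m_pos = sum(pos_scores) / len(pos_scores)
    m_neg = sum(neg_scores) / len(neg_scores)
    m_test = sum(test_scores) / len(test_scores)
    if m_pos == m_neg:
        raise ValueError("class-conditional means coincide; prevalence is not identifiable")
    # Solve m_test = p * m_pos + (1 - p) * m_neg for p, then clip to [0, 1].
    p = (m_test - m_neg) / (m_pos - m_neg)
    return min(1.0, max(0.0, p))
```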

EDy

Energy Distance y (EDy) quantifier.

EDx

Energy Distance x (EDx) quantifier.

EnsembleQ

Ensemble Quantifier with prevalence-controlled diversity.

QuaDapt

QuaDapt: drift-resilient quantification via parameter adaptation.

AggregativeBootstrap

Aggregative Bootstrap quantifier for prevalence confidence regions.

AE

Compute the absolute error for each class, or a dictionary of errors if the input is a dictionary.

SE

Compute the mean squared error between the real and predicted prevalences.

MAE

Compute the mean absolute error between the real and predicted prevalences.

MSE

Compute the mean squared error (MSE) between the real and predicted prevalences.

KLD

Compute the Kullback-Leibler divergence between the real and predicted prevalences.

RAE

Compute the relative absolute error between the real and predicted prevalences.

NAE

Compute the normalized absolute error between the real and predicted prevalences.

NRAE

Compute the normalized relative absolute error between the real and predicted prevalences.

NKLD

Compute the normalized Kullback-Leibler divergence between the real and predicted prevalences.
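Several of these metrics are short formulas over two prevalence vectors. The plain-Python sketch below covers AE, MAE, KLD and NKLD, assuming KLD is taken as KL(true || predicted) with additive smoothing so that zero prevalences do not produce infinite values. The function names and the default eps are hypothetical, and mlquantify's smoothing constant may differ. NKLD maps KLD into [0, 1) as 2 * e^KLD / (1 + e^KLD) - 1.

```python
import math


def absolute_errors(true, pred):
    """Per-class absolute error |p_c - p_hat_c|."""
    return [abs(t - p) for t, p in zip(true, pred)]


def mae(true, pred):
    """Mean of the per-class absolute errors."""
    errs = absolute_errors(true, pred)
    return sum(errs) / len(errs)


def _smooth(prevs, eps):
    """Additive smoothing (assumed form) so that no prevalence is exactly zero."""
    k = len(prevs)
    return [(p + eps) / (1 + eps * k) for p in prevs]


def kld(true, pred, eps=1e-8):
    """KL(true || pred) on smoothed prevalences; eps is a hypothetical default."""
    t, p = _smooth(true, eps), _smooth(pred, eps)
    return sum(a * math.log(a / b) for a, b in zip(t, p))


def nkld(true, pred, eps=1e-8):
    """Map KLD from [0, inf) into [0, 1) via 2 * e^KLD / (1 + e^KLD) - 1."""
    e = math.exp(kld(true, pred, eps))
    return 2 * e / (1 + e) - 1
```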

NMD

Compute the Normalized Match Distance (NMD), also known as Earth Mover’s Distance (EMD), for ordinal quantification evaluation.
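For ordinal classes listed in rank order, NMD is the sum of absolute differences between the two cumulative distributions, divided by n - 1 so that the result lies in [0, 1]. A minimal sketch; the function name nmd is hypothetical.

```python
def nmd(true, pred):
    """Normalised Match Distance for ordinal classes given in rank order (hypothetical helper)."""
    n = len(true)
    cum_t = cum_p = 0.0
    total = 0.0
    for c in range(n - 1):  # the last cumulative difference is always 0
        cum_t += true[c]
        cum_p += pred[c]
        total += abs(cum_t - cum_p)
    return total / (n - 1)
```

On three classes, moving all mass from the first class to the adjacent one costs 0.5, while moving it to the opposite end costs 1.0; plain MAE gives both mistakes the same score.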

RNOD

Compute the Root Normalised Order-aware Divergence (RNOD) for ordinal quantification evaluation.

VSE

Compute the Variance-normalised Squared Error (VSE).

CvM_L1

Compute the L1 version of the Cramér–von Mises statistic (Xiao et al., 2006) between two cumulative distributions, as suggested by Bella et al. (2014).

GridSearchQ

Grid search over quantifier hyperparameters with evaluation protocols.

BaseProtocol

Abstract base class for evaluation protocols.

APP

Artificial Prevalence Protocol (APP).
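APP is usually built on a regular grid of prevalence vectors, with several bags drawn from the evaluation pool at each grid point. The sketch below enumerates only the grid, assuming entries are multiples of 1 / (n_points - 1); the function name and the n_points parameter are hypothetical, and the bag-sampling step is left out.

```python
from itertools import product


def app_grid(n_classes, n_points=11):
    """All prevalence vectors whose entries are multiples of 1 / (n_points - 1)."""
    steps = n_points - 1
    grid = []
    # Choose the first n_classes - 1 entries; the last one is whatever remains.
    for head in product(range(steps + 1), repeat=n_classes - 1):
        rest = steps - sum(head)
        if rest >= 0:
            grid.append([h / steps for h in head] + [rest / steps])
    return grid
```

The grid grows quickly with the number of classes: 11 vectors for two classes and 66 for three at the default spacing.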

NPP

Natural Prevalence Protocol (NPP).

UPP

Uniform Prevalence Protocol (UPP).

PPP

Personalized Prevalence Protocol (PPP).

apply_protocol

Evaluate a quantifier across an evaluation protocol.

binary_quantifier

Decorator to enable binary quantification extensions (One-vs-Rest or One-vs-One).

BinaryQuantifier

Meta-quantifier enabling One-vs-Rest and One-vs-One strategies.

MulticlassStrategy

Base class for multiclass decomposition strategies.

register_strategy

Register a MulticlassStrategy subclass under the given name.

get_strategy

Return the registered MulticlassStrategy instance for the given name.

available_strategies

Return the sorted names of all registered multiclass strategies.

PWK

Probabilistic Weighted k-Nearest Neighbour (PWK) quantifier.

QuaNet

QuaNet: deep neural quantification with an LSTM architecture.

BaseRepresentation

Base class for quantification representations.

HistogramRepresentation

Histogram-based representation.

KDERepresentation

Kernel density estimation representation.

DistanceRepresentation

Distance-based representation for quantification.

KernelMeanRepresentation

Kernel mean embedding representation.

PredictionRepresentation

Representation based on classifier predictions.

HardPredictionRepresentation

Hard-prediction representation convenience class.

SoftPredictionRepresentation

Soft-prediction representation convenience class.

solve_binary

Minimize a scalar objective over the binary prevalence space [0, 1].

ternary_search

Find the minimum of a unimodal function via ternary search.
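Ternary search keeps a bracket [lo, hi] around the minimum of a unimodal function and discards one third of it per step by comparing the function at two interior points. A minimal sketch; the tolerance and the default bracket [0, 1] (the binary prevalence range) are assumptions, not the signature of mlquantify's ternary_search.

```python
def ternary_search(f, lo=0.0, hi=1.0, tol=1e-9):
    """Minimise a unimodal function f on [lo, hi] by shrinking the bracket."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # the minimum cannot lie in (m2, hi]
        else:
            lo = m1  # the minimum cannot lie in [lo, m1)
    return (lo + hi) / 2
```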

solve_simplex

Minimize a function over the probability simplex using SLSQP.

minimize_prevalence

Minimize an objective function over the probability simplex.

minimize_prevalence_blocks

Minimize a loss over multiple representation blocks and aggregate results.

get_prev_from_labels

Get the real prevalence of each class in the target array.

normalize_prevalence

Normalize the prevalence of each class to sum to 1.

load_quantifier

Load a quantifier from a file.

make_prevs

Generate a list of n_dim values uniformly distributed between 0 and 1 that sum exactly to 1.

apply_cross_validation

Perform cross-validation and return predictions with true labels for each fold.

simplex_uniform_kraemer

Generate n_prev prevalence vectors of n_dim classes uniformly distributed on the simplex, with optional lower and upper bounds.
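Uniform sampling on the simplex is commonly done with the Kraemer construction: draw n_dim - 1 uniform numbers, sort them, add 0 and 1 at the ends, and take consecutive differences. The sketch below adds bounds by rejection sampling, which keeps the draws uniform over the bounded region but can be slow when the bounds are tight. The function names, the seed argument and the retry cap are hypothetical, and mlquantify may enforce bounds differently.

```python
import random


def kraemer_sample(n_dim, rng):
    """Draw one prevalence vector uniformly from the probability simplex."""
    cuts = sorted(rng.random() for _ in range(n_dim - 1))
    points = [0.0] + cuts + [1.0]
    # Gaps between consecutive sorted points are uniformly distributed on the simplex.
    return [b - a for a, b in zip(points, points[1:])]


def kraemer_bounded(n_prev, n_dim, min_val=0.0, max_val=1.0, seed=0, max_tries=100_000):
    """Keep uniform draws with min_val <= p_i <= max_val (rejection sampling).

    Hypothetical helper; the bounded variant in mlquantify may work differently.
    """
    rng = random.Random(seed)
    samples, tries = [], 0
    while len(samples) < n_prev:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("bounds too tight for rejection sampling")
        p = kraemer_sample(n_dim, rng)
        if all(min_val <= x <= max_val for x in p):
            samples.append(p)
    return samples
```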

simplex_grid_sampling

Efficiently generate artificial prevalence vectors that sum to 1 and satisfy min_val ≤ p_i ≤ max_val for all i.

simplex_uniform_sampling

Generate uniformly distributed prevalence vectors within the simplex, constrained by min_val ≤ p_i ≤ max_val.

get_indexes_with_prevalence

Get indexes for a stratified sample based on the prevalence of each class.
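Drawing a sample at a target prevalence amounts to fixing how many items each class contributes and then sampling indexes within each class. Here is a minimal sketch that assumes the prevalence vector is ordered like the sorted class labels, samples with replacement, and uses largest-remainder rounding so the counts add up to sample_size. All of these choices, and the function name, are hypothetical.

```python
import random


def indexes_with_prevalence(labels, prevalence, sample_size, seed=0):
    """Sample indexes (with replacement) so each class appears at a target prevalence.

    Hypothetical helper: assumes prevalence[i] refers to the i-th sorted class label.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in classes}
    # Largest-remainder rounding so that the per-class counts add up to sample_size.
    raw = [p * sample_size for p in prevalence]
    counts = [int(r) for r in raw]
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[: sample_size - sum(counts)]:
        counts[i] += 1
    indexes = []
    for c, n in zip(classes, counts):
        indexes.extend(rng.choices(by_class[c], k=n))
    rng.shuffle(indexes)  # avoid class-ordered blocks in the output
    return indexes
```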

DiagonalDisplay

True vs. predicted prevalence diagonal plot.

BiasDisplay

Boxplots of signed prevalence-estimation error.

ErrorByShiftDisplay

Estimation error as a function of prior-probability shift.

PrevalenceDisplay

Bar chart of a single sample’s predicted class prevalence.

ConfidenceRegionDisplay

Confidence region around a single prevalence prediction.