minimize_prevalence_blocks

mlquantify.solvers.minimize_prevalence_blocks(objective_factory, test_representation, train_representations, block_slices, n_classes, solver='grid', aggregate='median', grid_size=101)

Minimize a loss over multiple representation blocks and aggregate results.

For each sub-vector (block) of the test and training representations, builds a block-specific objective via objective_factory and minimizes it independently. The per-block prevalence estimates are then aggregated (median or mean) into a single prevalence vector.

This is the core routine of the Histogram Distribution Matching (HDy) family of quantifiers, where each block corresponds to one histogram bin interval.
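The per-block procedure described above can be sketched in plain NumPy. This is a minimal illustration for the binary case only, assuming an exhaustive grid search over the positive-class prevalence; the helper name `sketch_minimize_blocks` is hypothetical and this is not mlquantify's implementation.

```python
import numpy as np


def sketch_minimize_blocks(objective_factory, test_rep, train_reps,
                           block_slices, aggregate="median", grid_size=101):
    """Binary-only sketch of block-wise minimization (hypothetical helper)."""
    test_rep = np.asarray(test_rep, dtype=float)
    train_reps = np.asarray(train_reps, dtype=float)
    # Candidate prevalences of class 1, evenly spaced on [0, 1].
    alphas = np.linspace(0.0, 1.0, grid_size)

    best_alphas, best_losses = [], []
    for sl in block_slices:
        # Build the objective for this block only.
        objective = objective_factory(test_block=test_rep[sl],
                                      train_block=train_reps[:, sl])
        losses = np.array([objective(a) for a in alphas])
        i = int(np.argmin(losses))
        best_alphas.append(alphas[i])
        best_losses.append(losses[i])

    if aggregate == "median":
        reduce = np.median
    elif aggregate == "mean":
        reduce = np.mean
    else:
        raise ValueError("aggregate must be 'median' or 'mean'")

    # Aggregate the per-block estimates into one prevalence vector.
    alpha = float(reduce(best_alphas))
    return np.array([1.0 - alpha, alpha]), float(reduce(best_losses))


def factory(test_block, train_block):
    # Squared error between the test block and the class mixture.
    return lambda a: np.sum(
        (test_block - ((1 - a) * train_block[0] + a * train_block[1])) ** 2
    )


test_rep = np.array([0.4, 0.6, 0.3, 0.7])
train_reps = np.array([[0.5, 0.5, 0.5, 0.5],
                       [0.2, 0.8, 0.2, 0.8]])
slices = [slice(0, 2), slice(2, 4)]
prevalence, loss = sketch_minimize_blocks(factory, test_rep, train_reps, slices)
```

Each block is solved independently, so one poorly matching block affects only its own estimate before aggregation.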

Parameters:

objective_factory (callable): Factory that receives test_block and train_block keyword arguments and returns a scalar objective function suitable for minimize_prevalence.
test_representation (ndarray of shape (n_components,)): Full representation vector of the test sample.
train_representations (array-like of shape (n_classes, n_components)): Full representation vectors for each training class.
block_slices (list of slice): Ordered list of slices that partition n_components into blocks.
n_classes (int): Number of classes.
solver (str, default='grid'): Solver passed to minimize_prevalence for each block.
aggregate ({'median', 'mean'}, default='median'): How to combine the per-block prevalence estimates.
grid_size (int, default=101): Number of grid points used by the 'grid' solver for binary problems.
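As a rough illustration of how block_slices and objective_factory might be prepared, the sketch below builds equal-width slices and a named factory that accepts the test_block and train_block keyword arguments. The layout (`n_blocks`, `block_width`) and the name `squared_error_factory` are assumptions made for this example, and the squared-error objective covers the binary case only.

```python
import numpy as np

# Hypothetical layout: 3 blocks of 4 components each.
n_blocks, block_width = 3, 4
n_components = n_blocks * block_width

# Ordered, contiguous, non-overlapping slices covering every component.
block_slices = [slice(i * block_width, (i + 1) * block_width)
                for i in range(n_blocks)]

# Sanity check: the slices partition range(n_components).
covered = np.concatenate([np.arange(n_components)[s] for s in block_slices])
assert np.array_equal(covered, np.arange(n_components))


def squared_error_factory(test_block, train_block):
    """Return a binary objective for one block (illustrative only)."""
    def objective(a):
        # Mixture of the two class blocks; `a` is the class-1 prevalence.
        mixture = (1.0 - a) * train_block[0] + a * train_block[1]
        return float(np.sum((test_block - mixture) ** 2))
    return objective


# The factory is called with keyword arguments, one block at a time.
objective = squared_error_factory(
    test_block=np.array([0.2, 0.8]),
    train_block=np.array([[0.5, 0.5], [0.2, 0.8]]),
)
```

A named function is used here instead of a lambda so that the keyword names are visible in the signature.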

Returns:

prevalence (ndarray of shape (n_classes,)): Aggregated prevalence vector summing to 1.
loss (float): Aggregated objective value across blocks.
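The two aggregation modes behave differently with more than two classes. The sketch below uses made-up per-block estimates to show that an element-wise median need not sum to 1, while a mean of normalized vectors does. Dividing by the sum is one possible way to restore the constraint; whether the library renormalizes in this way is an assumption.

```python
import numpy as np

# Made-up per-block estimates, shape (n_blocks, n_classes); each row sums to 1.
per_block = np.array([[0.2, 0.3, 0.5],
                      [0.1, 0.6, 0.3],
                      [0.6, 0.2, 0.2]])

# Element-wise aggregation across blocks.
median_estimate = np.median(per_block, axis=0)
mean_estimate = per_block.mean(axis=0)

# The element-wise median can leave the simplex, so rescale it (assumed step).
median_normalized = median_estimate / median_estimate.sum()
```

The median is less sensitive to a single outlying block, which is one reason to prefer it as a default.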

Raises:

ValueError: If aggregate is not 'median' or 'mean'.

Examples

>>> import numpy as np
>>> from mlquantify.solvers._blocks import minimize_prevalence_blocks
>>> # Toy binary example with two histogram blocks
>>> test_rep = np.array([0.4, 0.6, 0.3, 0.7])
>>> train_reps = np.array([[0.5, 0.5, 0.5, 0.5],
...                        [0.2, 0.8, 0.2, 0.8]])
>>> slices = [slice(0, 2), slice(2, 4)]
>>> factory = lambda test_block, train_block: (
...     lambda a: np.sum((test_block - ((1-a)*train_block[0] + a*train_block[1]))**2)
... )
>>> prevalence, loss = minimize_prevalence_blocks(
...     factory, test_rep, train_reps, slices, n_classes=2
... )
>>> prevalence.shape
(2,)