minimize_prevalence_blocks

mlquantify.solvers.minimize_prevalence_blocks(objective_factory, test_representation, train_representations, block_slices, n_classes, solver='grid', aggregate='median', grid_size=101)

Minimize a loss over multiple representation blocks and aggregate results.

For each sub-vector (block) of the test and training representations, it builds a block-specific objective via objective_factory and minimizes that objective independently. The per-block prevalence estimates are then aggregated (by median or mean) into a single prevalence vector.
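The per-block loop can be sketched in plain NumPy. This is a minimal sketch, not mlquantify's implementation: it assumes a binary problem in which each block objective takes the class-1 prevalence as a scalar (as the factory in the Examples section does), and it uses a simple grid search in place of minimize_prevalence. The name minimize_blocks_sketch is hypothetical.

```python
import numpy as np


def minimize_blocks_sketch(objective_factory, test_rep, train_reps, block_slices,
                           aggregate="median", grid_size=101):
    # Hypothetical binary-only sketch of the block loop; not the library's code.
    if aggregate not in ("median", "mean"):
        raise ValueError("aggregate must be 'median' or 'mean'")
    test_rep = np.asarray(test_rep, dtype=float)
    train_reps = np.asarray(train_reps, dtype=float)
    grid = np.linspace(0.0, 1.0, grid_size)  # candidate class-1 prevalences
    alphas, losses = [], []
    for sl in block_slices:
        # Build an objective from the matching sub-vectors of test and train.
        objective = objective_factory(test_block=test_rep[sl],
                                      train_block=train_reps[:, sl])
        values = np.array([objective(a) for a in grid])
        best = int(np.argmin(values))  # grid search stands in for minimize_prevalence
        alphas.append(grid[best])
        losses.append(values[best])
    reduce = np.median if aggregate == "median" else np.mean
    alpha = float(reduce(alphas))
    return np.array([1.0 - alpha, alpha]), float(reduce(losses))
```

With the toy data from the Examples section, the two blocks give estimates near 1/3 and 2/3, and their median is about 0.5.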

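This is the core routine of the HDy (Hellinger Distance y) family of histogram-based distribution-matching quantifiers, in which each block holds one normalized histogram of the classifier scores (for example, the same scores binned with a different number of bins).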
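A representation of this shape can be assembled by concatenating normalized score histograms built with several bin counts and recording one slice per histogram. The helper below, build_histogram_blocks, is a hypothetical sketch that assumes scores lie in [0, 1]; it is not part of mlquantify's documented API. Calling it on the scores of each training class with the same bin counts would give the rows of train_representations.

```python
import numpy as np


def build_histogram_blocks(scores, bin_counts):
    # Hypothetical helper: one normalized histogram (block) per bin count.
    pieces, slices, start = [], [], 0
    for n_bins in bin_counts:
        counts, _ = np.histogram(scores, bins=n_bins, range=(0.0, 1.0))
        pieces.append(counts / counts.sum())  # each block sums to 1
        slices.append(slice(start, start + n_bins))  # where this block lives
        start += n_bins
    return np.concatenate(pieces), slices


rep, block_slices = build_histogram_blocks(np.array([0.1, 0.4, 0.6, 0.9]), [2, 4])
# rep holds a 2-bin histogram followed by a 4-bin histogram;
# block_slices is [slice(0, 2), slice(2, 6)].
```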

Parameters:
objective_factory : callable

Factory that receives test_block and train_block keyword arguments and returns an objective function, mapping a candidate prevalence to a scalar loss, suitable for minimize_prevalence.

test_representation : ndarray of shape (n_components,)

Full representation vector of the test sample.

train_representations : array-like of shape (n_classes, n_components)

Full representation vectors for each training class.

block_slices : list of slice

Ordered list of slices that partition the n_components entries into blocks; each slice is applied to both the test and the training representations.

n_classes : int

Number of classes.

solver : str, default='grid'

Solver passed to minimize_prevalence for each block.

aggregate : {'median', 'mean'}, default='median'

How to combine the per-block prevalence estimates.

grid_size : int, default=101

Number of grid points used by the 'grid' solver for binary problems.

Returns:
prevalence : ndarray of shape (n_classes,)

Aggregated prevalence vector summing to 1.

loss : float

Aggregated objective value across blocks.

Raises:
ValueError

If aggregate is not 'median' or 'mean'.
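The choice of aggregate matters when one block disagrees with the rest. The NumPy snippet below uses made-up per-block estimates purely for illustration; it shows that the median resists an outlying block while the mean is pulled toward it. It also shows that, with more than two classes, per-class medians need not sum to 1, so the final renormalization step here is an assumption about how a valid prevalence vector could be obtained, not a description of the library's behavior.

```python
import numpy as np

# Illustrative (made-up) class-1 estimates from four blocks; the last is an outlier.
block_alphas = np.array([0.30, 0.32, 0.31, 0.90])
median_alpha = np.median(block_alphas)  # about 0.315, barely affected by the outlier
mean_alpha = np.mean(block_alphas)      # about 0.4575, pulled toward the outlier

# Multiclass case: column-wise medians of per-block prevalence vectors.
per_block = np.array([[0.2, 0.3, 0.5],
                      [0.1, 0.6, 0.3],
                      [0.4, 0.4, 0.2]])
class_medians = np.median(per_block, axis=0)  # [0.2, 0.4, 0.3], sums to 0.9
# Assumed renormalization so the result sums to 1.
prevalence = class_medians / class_medians.sum()
```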

Examples

>>> import numpy as np
>>> from mlquantify.solvers import minimize_prevalence_blocks
>>> # Toy binary example with two histogram blocks
>>> test_rep = np.array([0.4, 0.6, 0.3, 0.7])
>>> train_reps = np.array([[0.5, 0.5, 0.5, 0.5],
...                        [0.2, 0.8, 0.2, 0.8]])
>>> slices = [slice(0, 2), slice(2, 4)]
>>> factory = lambda test_block, train_block: (
...     lambda a: np.sum((test_block - ((1-a)*train_block[0] + a*train_block[1]))**2)
... )
>>> prevalence, loss = minimize_prevalence_blocks(
...     factory, test_rep, train_reps, slices, n_classes=2
... )
>>> prevalence.shape
(2,)
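The squared-error factory in the example could be replaced by the Hellinger distance that gives HDy its name. The factory below, hellinger_factory, is a hypothetical sketch that follows the keyword interface described for objective_factory (test_block, train_block). Like the example, it assumes a binary problem whose objective receives the class-1 prevalence as a scalar. The grid search at the end is only a stand-in to show where the minimum falls; it is not minimize_prevalence.

```python
import numpy as np


def hellinger_factory(test_block, train_block):
    # Hypothetical HDy-style factory: returns alpha -> Hellinger distance.
    test_block = np.asarray(test_block, dtype=float)
    neg, pos = np.asarray(train_block, dtype=float)  # class-0 and class-1 histograms

    def objective(alpha):
        # Mixture histogram for a candidate class-1 prevalence alpha.
        mixture = (1.0 - alpha) * neg + alpha * pos
        # One common form of the Hellinger distance (no 1/sqrt(2) scaling).
        return float(np.sqrt(np.sum((np.sqrt(test_block) - np.sqrt(mixture)) ** 2)))

    return objective


objective = hellinger_factory(test_block=np.array([0.35, 0.65]),
                              train_block=np.array([[0.5, 0.5], [0.2, 0.8]]))
grid = np.linspace(0.0, 1.0, 101)
best_alpha = grid[np.argmin([objective(a) for a in grid])]  # close to 0.5
```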