minimize_prevalence_blocks
- mlquantify.solvers.minimize_prevalence_blocks(objective_factory, test_representation, train_representations, block_slices, n_classes, solver='grid', aggregate='median', grid_size=101)
Minimize a loss over multiple representation blocks and aggregate results.
For each sub-vector (block) of the test and training representations, builds a block-specific objective via objective_factory and minimizes it independently. The per-block prevalence estimates are then aggregated (median or mean) into a single prevalence vector.
This is the core routine of the Histogram Distribution Matching (HDy) family of quantifiers, where each block corresponds to one histogram bin interval.
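The per-block procedure described above can be sketched in plain NumPy for the binary case. This is an illustration only, not mlquantify's implementation: the function name blockwise_grid_sketch is hypothetical, and the sketch assumes each objective takes a scalar positive-class prevalence and is minimized by an exhaustive search over an evenly spaced grid.

```python
import numpy as np


def blockwise_grid_sketch(objective_factory, test_rep, train_reps, block_slices,
                          aggregate="median", grid_size=101):
    """Binary-only illustration of block-wise minimization (hypothetical helper)."""
    if aggregate not in ("median", "mean"):
        raise ValueError("aggregate must be 'median' or 'mean'")

    test_rep = np.asarray(test_rep, dtype=float)
    train_reps = np.asarray(train_reps, dtype=float)
    # Candidate positive-class prevalences (assumed grid-search strategy).
    grid = np.linspace(0.0, 1.0, grid_size)

    alphas, losses = [], []
    for block in block_slices:
        # Build the objective for this block from the matching sub-vectors.
        objective = objective_factory(test_block=test_rep[block],
                                      train_block=train_reps[:, block])
        values = np.array([objective(a) for a in grid])
        best = int(np.argmin(values))
        alphas.append(grid[best])
        losses.append(values[best])

    # Combine the per-block minimizers and their losses.
    reduce = np.median if aggregate == "median" else np.mean
    alpha = float(reduce(alphas))
    return np.array([1.0 - alpha, alpha]), float(reduce(losses))


# Toy data: two blocks of two components each.
test_rep = np.array([0.4, 0.6, 0.3, 0.7])
train_reps = np.array([[0.5, 0.5, 0.5, 0.5],
                       [0.2, 0.8, 0.2, 0.8]])
slices = [slice(0, 2), slice(2, 4)]


def factory(test_block, train_block):
    # Squared error between the test block and the class mixture.
    return lambda a: np.sum(
        (test_block - ((1 - a) * train_block[0] + a * train_block[1])) ** 2
    )


prevalence, loss = blockwise_grid_sketch(factory, test_rep, train_reps, slices)
```

On this toy data the first block is best matched near a positive-class prevalence of 1/3 and the second near 2/3, so the median across the two blocks lands at about 0.5.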
- Parameters:
- objective_factory : callable
Factory that receives test_block and train_block keyword arguments and returns a scalar objective function suitable for minimize_prevalence.
- test_representation : ndarray of shape (n_components,)
Full representation vector of the test sample.
- train_representations : array-like of shape (n_classes, n_components)
Full representation vectors for each training class.
- block_slices : list of slice
Ordered list of slices that partition n_components into blocks.
- n_classes : int
Number of classes.
- solver : str, default='grid'
Solver passed to minimize_prevalence for each block.
- aggregate : {'median', 'mean'}, default='median'
How to combine the per-block prevalence estimates.
- grid_size : int, default=101
Number of grid points used by the 'grid' solver for binary problems.
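Because block_slices must be an ordered partition of n_components, it can be convenient to derive the slices from the block lengths instead of writing them by hand. The helper below is a hypothetical convenience, not part of mlquantify; it assumes the blocks are laid out contiguously, one after another, in the representation vector.

```python
from itertools import accumulate


def slices_from_lengths(lengths):
    """Turn block lengths into contiguous, non-overlapping slices (hypothetical helper)."""
    stops = list(accumulate(lengths))   # running end index of each block
    starts = [0] + stops[:-1]           # each block starts where the previous one ends
    return [slice(start, stop) for start, stop in zip(starts, stops)]


# Two blocks of two components each, matching a 4-component representation.
even_slices = slices_from_lengths([2, 2])

# Blocks may also have different widths.
uneven_slices = slices_from_lengths([3, 1, 2])
```

The last slice's stop equals the sum of the lengths, which should match n_components of the representation passed in.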
- Returns:
- prevalence : ndarray of shape (n_classes,)
Aggregated prevalence vector summing to 1.
- loss : float
Aggregated objective value across blocks.
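To see how the choice of aggregate affects the returned prevalence, the sketch below combines three made-up per-block estimates, one of which is an outlier. The values are illustrative only, and the final renormalization step is an assumption about how a sum-to-1 result could be enforced, not a statement about what the library does internally.

```python
import numpy as np

# Hypothetical per-block prevalence estimates, shape (n_blocks, n_classes).
per_block = np.array([[0.70, 0.30],
                      [0.68, 0.32],
                      [0.10, 0.90]])   # third block disagrees strongly


def combine(estimates, aggregate="median"):
    """Aggregate per-block estimates class by class, then renormalize (assumed step)."""
    if aggregate not in ("median", "mean"):
        raise ValueError("aggregate must be 'median' or 'mean'")
    reduce = np.median if aggregate == "median" else np.mean
    combined = reduce(estimates, axis=0)
    return combined / combined.sum()


median_prev = combine(per_block, "median")
mean_prev = combine(per_block, "mean")
```

Here the median stays close to the two agreeing blocks, while the mean is pulled toward the outlying block, which is one reason a median default can be the more robust choice.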
- Raises:
- ValueError
If aggregate is not 'median' or 'mean'.
Examples
>>> import numpy as np
>>> from mlquantify.solvers._blocks import minimize_prevalence_blocks
>>> # Toy binary example with two histogram blocks
>>> test_rep = np.array([0.4, 0.6, 0.3, 0.7])
>>> train_reps = np.array([[0.5, 0.5, 0.5, 0.5],
...                        [0.2, 0.8, 0.2, 0.8]])
>>> slices = [slice(0, 2), slice(2, 4)]
>>> factory = lambda test_block, train_block: (
...     lambda a: np.sum((test_block - ((1-a)*train_block[0] + a*train_block[1]))**2)
... )
>>> prevalence, loss = minimize_prevalence_blocks(
...     factory, test_rep, train_reps, slices, n_classes=2
... )
>>> prevalence.shape
(2,)