HistogramRepresentation
- class mlquantify.representations.HistogramRepresentation(bins=(10,), range=(0.0, 1.0), mode='histogram', features=None, partition_blocks=False, bin_edges='fixed', laplace_smoothing=False)
Histogram-based representation.
This representation computes a histogram for each feature (or a selected subset of features) independently and concatenates the bin frequencies into a single vector. When partition_blocks=True, the attribute block_slices_ is populated with the slice objects that identify each bin group in the concatenated output.
- Parameters:
- bins : int or array-like of shape (n_features,), default=(10,)
The number of bins for each feature. If an integer is given, the same number of bins is used for all features.
- range : tuple of shape (2,), default=(0.0, 1.0)
The lower and upper bounds for the histogram range.
- mode : str, default="histogram"
The mode of the histogram. Options are:
  - "histogram": Compute the histogram counts.
  - "onehot": Compute a one-hot encoding of the histogram. In this mode, the output is a binary vector indicating which bin each value falls into, averaged over all samples.
- features : array-like of shape (n_features,), default=None
The indices of the features to use. If None, all features are used to compute the histogram representation.
- partition_blocks : bool, default=False
Whether to partition the output into contiguous blocks corresponding to each feature's bins. When partition_blocks=True, the returned representation is still a single 1-D concatenated vector, but it is organized in feature-wise blocks: all bins for the first selected feature appear first, then all bins for the second selected feature, and so on. In this case the attribute block_slices_ is populated with a tuple of slice objects giving the start/stop indices of each block, which makes it easy to extract per-feature bin groups from the concatenated vector. When partition_blocks=False, the same concatenation is returned, but no block_slices_ metadata is stored.
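The feature-wise block layout described for partition_blocks can be pictured with plain NumPy. The sketch below is a hypothetical illustration, not the mlquantify implementation: resolve_bins and make_block_slices are names introduced here, and it assumes that an integer bins value is repeated for every selected feature while a sequence must give exactly one bin count per feature.

```python
import numpy as np


def resolve_bins(bins, n_features):
    """Hypothetical helper: turn `bins` into one bin count per feature.

    Assumption: an int is reused for every feature; a sequence must
    provide exactly one count per feature.
    """
    if isinstance(bins, (int, np.integer)):
        return (int(bins),) * n_features
    bins = tuple(int(b) for b in bins)
    if len(bins) != n_features:
        raise ValueError(f"expected {n_features} bin counts, got {len(bins)}")
    return bins


def make_block_slices(bins_per_feature):
    """Hypothetical helper: one slice per feature into the concatenated vector."""
    # Cumulative offsets: block k spans [offsets[k], offsets[k + 1]).
    offsets = np.concatenate(([0], np.cumsum(bins_per_feature)))
    return tuple(
        slice(int(start), int(stop)) for start, stop in zip(offsets[:-1], offsets[1:])
    )


bins_per_feature = resolve_bins((4, 3), n_features=2)
block_slices = make_block_slices(bins_per_feature)
print(block_slices)  # (slice(0, 4, None), slice(4, 7, None))

# A made-up 7-element vector: 4 bins for feature 0, then 3 bins for feature 1.
vector = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.25, 0.25])
for k, block in enumerate(block_slices):
    # Each slice recovers exactly one feature's group of bins.
    print(k, vector[block])
```

Storing slices rather than raw index arrays keeps the metadata small and lets per-feature blocks be read as views of the concatenated vector.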
Examples
>>> from mlquantify.representations import HistogramRepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> scores = rng.uniform(0, 1, (200, 1))
>>> y = (scores[:, 0] > 0.5).astype(int)
>>> rep = HistogramRepresentation(bins=(8,)).fit(scores, y)
>>> rep.transform(scores[:10]).shape
(8,)
- fit(X, y, classes=None, sample_weight=None)
Fit the representation to labelled training data.
Validates the input shapes, stores the class labels, delegates the actual fitting to _fit, and verifies that the subclass set class_representations_ during that call.
- Parameters:
- X : array-like of shape (n_samples, n_features) or (n_samples,)
Feature matrix or pre-computed score array for the training instances.
- y : array-like of shape (n_samples,)
Class labels for each training instance.
- classes : array-like of shape (n_classes,) or None, default=None
Explicit list of class labels. If None, the unique values in y are used.
- sample_weight : array-like of shape (n_samples,) or None, default=None
Per-sample weights forwarded to _fit.
- Returns:
- self : BaseRepresentation
The fitted representation object.
- Raises:
- ValueError
If X and y have inconsistent lengths, or if X is zero-dimensional.
- AttributeError
If the subclass did not define class_representations_ inside _fit.
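The control flow that fit describes (validate, store the classes, delegate to _fit, then check that class_representations_ exists) follows the template-method pattern. A minimal sketch under that reading is shown below. SketchBase, MeanRepresentation, ForgetfulRepresentation, and the classes_ attribute are hypothetical names, and the exact validation rules (for example, reshaping a 1-D score array into one column) are assumptions rather than mlquantify's code.

```python
import numpy as np


class SketchBase:
    """Hypothetical base class mirroring the documented `fit` contract."""

    def fit(self, X, y, classes=None, sample_weight=None):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        if X.ndim == 0:
            raise ValueError("X must be at least one-dimensional.")
        if X.ndim == 1:
            # Assumption: a 1-D score array is treated as a single feature.
            X = X.reshape(-1, 1)
        if X.shape[0] != y.shape[0]:
            raise ValueError("X and y have inconsistent lengths.")
        # Explicit labels win; otherwise fall back to the unique values in y.
        self.classes_ = np.unique(y) if classes is None else np.asarray(classes)
        self._fit(X, y, sample_weight=sample_weight)
        # Enforce the subclass contract after delegation.
        if not hasattr(self, "class_representations_"):
            raise AttributeError("_fit must set class_representations_.")
        return self

    def _fit(self, X, y, sample_weight=None):
        raise NotImplementedError


class MeanRepresentation(SketchBase):
    """Toy subclass: each class is represented by its feature means.

    sample_weight is accepted but ignored in this toy example.
    """

    def _fit(self, X, y, sample_weight=None):
        self.class_representations_ = np.vstack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )


class ForgetfulRepresentation(SketchBase):
    """Toy subclass that breaks the contract by never setting the attribute."""

    def _fit(self, X, y, sample_weight=None):
        pass


rep = MeanRepresentation().fit([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
print(rep.class_representations_.shape)  # (2, 1)

try:
    ForgetfulRepresentation().fit([0.1, 0.9], [0, 1])
except AttributeError as exc:
    print("AttributeError:", exc)
```

Putting the checks in the base class means every subclass only has to implement _fit, while the shape validation and the post-condition on class_representations_ live in one place.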
Examples
>>> from mlquantify.representations import HistogramRepresentation
>>> import numpy as np
>>> X = np.random.default_rng(0).uniform(0, 1, (100, 1))
>>> y = (X[:, 0] > 0.5).astype(int)
>>> rep = HistogramRepresentation(bins=(5,)).fit(X, y)
>>> rep.class_representations_.shape
(2, 5)
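The (2, 5) shape in the example above is consistent with one 5-bin histogram per class. The sketch below builds such a matrix with plain NumPy, assuming (not confirmed by this page) that each row of class_representations_ is the normalized histogram of that class's training scores; per_class_histograms is a hypothetical name.

```python
import numpy as np


def per_class_histograms(scores, y, bins=5, value_range=(0.0, 1.0)):
    """Hypothetical sketch: one normalized histogram row per class label."""
    scores = np.asarray(scores, dtype=float).ravel()
    y = np.asarray(y)
    classes = np.unique(y)
    rows = []
    for c in classes:
        counts, _ = np.histogram(scores[y == c], bins=bins, range=value_range)
        # Normalize so each class row sums to 1 (assumes every class has samples).
        rows.append(counts / counts.sum())
    return classes, np.vstack(rows)


X = np.random.default_rng(0).uniform(0, 1, (100, 1))
y = (X[:, 0] > 0.5).astype(int)
classes, reps = per_class_histograms(X[:, 0], y)
print(classes, reps.shape)  # [0 1] (2, 5)
print(reps.round(2))
```

Because the labels here split the scores at 0.5, the class-0 row has mass only in the lower bins and the class-1 row only in the upper bins, with the middle bin [0.4, 0.6) shared.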
- transform(X)
Compute the histogram representation for a set of instances.
Each feature (or selected subset) is binned independently, and the bin frequencies are concatenated into a single vector. When partition_blocks=True, the attribute block_slices_ is populated with the slice objects that identify each bin group in the concatenated output.
- Parameters:
- X : array-like of shape (n_samples, n_features) or (n_samples,)
Feature matrix or 1-D score array.
- Returns:
- representation : ndarray of shape (n_bins_total,)
Normalized histogram vector: the bins belonging to each feature's block sum to 1.
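As a rough picture of the returned vector, the sketch below bins each column with numpy.histogram, normalizes each feature's counts so its block sums to 1, and concatenates the blocks. It is a hypothetical re-implementation for illustration only: histogram_vector is not part of mlquantify, and it ignores mode, features, bin_edges, and laplace_smoothing.

```python
import numpy as np


def histogram_vector(X, bins=(8,), value_range=(0.0, 1.0)):
    """Hypothetical sketch of a per-feature, normalized, concatenated histogram."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # Treat a 1-D score array as a single feature column.
        X = X.reshape(-1, 1)
    if isinstance(bins, int):
        # Assumption: an int means the same bin count for every feature.
        bins = (bins,) * X.shape[1]
    if len(bins) != X.shape[1]:
        raise ValueError("bins must give one count per feature")
    blocks = []
    for j, n_bins in enumerate(bins):
        counts, _ = np.histogram(X[:, j], bins=n_bins, range=value_range)
        # Normalize per feature so each block sums to 1 (guard against empty input).
        blocks.append(counts / max(counts.sum(), 1))
    return np.concatenate(blocks)


rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, (200, 1))
vec = histogram_vector(scores[:10])
print(vec.shape)  # (8,)
print(vec.sum())
```

Normalizing per block rather than over the whole vector keeps each feature's histogram a probability distribution, independently of how many features are concatenated.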
Examples
>>> from mlquantify.representations import HistogramRepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> scores = rng.uniform(0, 1, (200, 1))
>>> y = (scores[:, 0] > 0.5).astype(int)
>>> rep = HistogramRepresentation(bins=(8,)).fit(scores, y)
>>> rep.transform(scores[:10]).shape
(8,)