KDERepresentation
- class mlquantify.representations.KDERepresentation(bandwidth=0.1, kernel='gaussian')
Kernel density estimation representation.
This representation fits one kernel density estimator (KDE) to the training data of each class, using the specified bandwidth and kernel. At test time the representation is the raw feature vector itself; per-class likelihoods are obtained by evaluating the fitted KDEs on the test samples. Because the estimate is non-parametric, it can follow complex class distributions, such as multimodal or skewed ones, without assuming a specific parametric form.
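The claim that a KDE can follow class distributions a single parametric fit would miss can be checked with a small NumPy sketch. This is not the library's implementation: it hand-codes an isotropic Gaussian KDE (the hypothetical helper gaussian_kde_density) and compares it with one fitted Gaussian (the hypothetical single_gaussian_density) on a bimodal class. The KDE should put almost no density in the empty gap between the two clusters and high density on a cluster, while the single Gaussian does the reverse.

```python
import numpy as np


def gaussian_kde_density(X_train, X_test, bandwidth):
    """Isotropic Gaussian KDE fitted on X_train, evaluated at X_test (hypothetical helper)."""
    d = X_train.shape[1]
    # Squared distances between every test and training point: (n_test, n_train).
    sq_dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth**2))
    # Average the kernels and apply the d-dimensional Gaussian normalization.
    return kernel.mean(axis=1) / (2.0 * np.pi * bandwidth**2) ** (d / 2)


def single_gaussian_density(X_train, X_test):
    """1-D parametric baseline: one Gaussian with the sample mean and std."""
    mu, sigma = X_train.mean(), X_train.std()
    z = (X_test[:, 0] - mu) / sigma
    return np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma)


rng = np.random.default_rng(0)
# A bimodal class: two tight clusters around -2 and +2, shape (400, 1).
X_class = np.concatenate([rng.normal(-2, 0.3, 200), rng.normal(2, 0.3, 200)])[:, None]
X_query = np.array([[0.0], [2.0]])  # the gap between the modes, and one mode

kde = gaussian_kde_density(X_class, X_query, bandwidth=0.2)
gauss = single_gaussian_density(X_class, X_query)
print("KDE      :", kde)
print("Gaussian :", gauss)
```

The Gaussian baseline centres itself in the gap near 0, which is exactly where this class has no data; the KDE needs no such assumption.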
- Parameters:
- bandwidth : float, default=0.1
The bandwidth parameter for the KDE, controlling the smoothness of the density estimate. Smaller values lead to a more flexible fit that can capture finer details of the data distribution, while larger values produce a smoother estimate that may overlook small-scale structure.
- kernel : str, default="gaussian"
The kernel to use for the KDE. Options are:
  - "gaussian": Gaussian kernel (the default), which produces a smooth density estimate.
  - "tophat": uniform kernel, which gives equal weight to all points within the bandwidth radius and zero weight to points outside it.
  - "epanechnikov": parabolic kernel, which gives more weight to points close to the kernel center and less to points farther away, with zero weight beyond the bandwidth radius.
  - "exponential": exponential kernel, whose weight decays exponentially with distance from the center, without a hard cutoff.
  - "linear": linear kernel, whose weight decreases linearly with distance from the center, with zero weight beyond the bandwidth radius.
  - "cosine": cosine kernel, whose weight follows a cosine function of the distance from the center, with zero weight beyond the bandwidth radius.
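The kernel options differ only in the shape of the weight profile K(r; h) as a function of the distance r from a sample, and the bandwidth h acts as the length scale of that profile. The NumPy sketch below writes the profiles out in their common textbook form (the same shapes scikit-learn's KernelDensity documents); the KERNELS dictionary is a hypothetical helper, not part of this class, and normalizing constants are deliberately left out, so values are comparable within one kernel but not across kernels.

```python
import numpy as np

# Unnormalized kernel profiles K(r; h) for a distance r >= 0 from the kernel
# centre and bandwidth h. Normalizing constants are omitted on purpose.
KERNELS = {
    "gaussian": lambda r, h: np.exp(-(r**2) / (2 * h**2)),
    "tophat": lambda r, h: (r < h).astype(float),
    "epanechnikov": lambda r, h: np.clip(1 - (r / h) ** 2, 0, None),
    "exponential": lambda r, h: np.exp(-r / h),
    "linear": lambda r, h: np.clip(1 - r / h, 0, None),
    "cosine": lambda r, h: np.where(r < h, np.cos(np.pi * r / (2 * h)), 0.0),
}

r = np.array([0.0, 0.05, 0.1, 0.2])  # distances from the centre
h = 0.1                              # the class's default bandwidth
profiles = {name: k(r, h) for name, k in KERNELS.items()}
for name, values in profiles.items():
    print(f"{name:>12}: {np.round(values, 3)}")
```

At r = 2h the four compactly supported kernels are exactly zero, while the Gaussian and exponential kernels still assign positive weight; doubling h would stretch every profile to twice the distance, which is why larger bandwidths give smoother estimates.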
Examples
>>> from mlquantify.representations._density import KDERepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 2))
>>> y = (X[:, 0] > 0).astype(int)
>>> rep = KDERepresentation(bandwidth=0.2, kernel="gaussian").fit(X, y)
>>> rep.class_representations_[0]
KernelDensity(bandwidth=0.2, kernel='gaussian')
- class_likelihoods(X)
Evaluate per-class kernel density likelihoods for test instances.
For each class KDE fitted during fit, scores every test sample and exponentiates the log-density to obtain raw likelihood values.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test feature matrix.
- Returns:
- likelihoods : ndarray of shape (n_classes, n_samples)
Per-class likelihood for each test instance, where likelihoods[c, i] is the density of class c at sample i.
Examples
>>> from mlquantify.representations._density import KDERepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 2))
>>> y = (X[:, 0] > 0).astype(int)
>>> rep = KDERepresentation().fit(X, y)
>>> lkl = rep.class_likelihoods(X[:5])
>>> lkl.shape
(2, 5)
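The steps described for class_likelihoods (one KDE per class, score each sample, exponentiate, stack to (n_classes, n_samples)) can be reproduced outside the library with scikit-learn's KernelDensity. That the fitted per-class objects are KernelDensity instances is suggested by the KDERepresentation example above, but this is a sketch under that assumption, not the library's code; fit_per_class_kdes and per_class_likelihoods are hypothetical names.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def fit_per_class_kdes(X, y, bandwidth=0.1, kernel="gaussian"):
    """Fit one KernelDensity per class label, in sorted label order (hypothetical helper)."""
    classes = np.unique(y)
    kdes = [KernelDensity(bandwidth=bandwidth, kernel=kernel).fit(X[y == c]) for c in classes]
    return classes, kdes


def per_class_likelihoods(kdes, X):
    """Stack exp(log-density) per class into an array of shape (n_classes, n_samples)."""
    X = np.asarray(X, dtype=np.float64)
    # score_samples returns log-densities; exp turns them into raw likelihoods.
    return np.vstack([np.exp(kde.score_samples(X)) for kde in kdes])


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = (X[:, 0] > 0).astype(int)
classes, kdes = fit_per_class_kdes(X, y, bandwidth=0.2)
lkl = per_class_likelihoods(kdes, X[:5])
print(lkl.shape)  # (2, 5)
```

Because the final step exponentiates log-densities, likelihoods for points far from every training sample of a class, or in high dimensions, can underflow to 0.0; keeping the score_samples output in log space avoids this when only ratios between classes are needed.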
- fit(X, y, classes=None, sample_weight=None)
Fit the representation to labelled training data.
Validates shapes, stores the class labels, delegates internal fitting to _fit, and verifies that the subclass set class_representations_ during that call.
- Parameters:
- X : array-like of shape (n_samples, n_features) or (n_samples,)
Feature matrix or pre-computed score array for the training instances.
- y : array-like of shape (n_samples,)
Class labels for each training instance.
- classes : array-like of shape (n_classes,) or None, default=None
Explicit list of class labels. If None, the unique values in y are used.
- sample_weight : array-like of shape (n_samples,) or None, default=None
Per-sample weights forwarded to _fit.
- Returns:
- self : BaseRepresentation
The fitted representation object.
- Raises:
- ValueError
If X and y have inconsistent lengths or X is zero-dimensional.
- AttributeError
If the subclass did not define class_representations_ inside _fit.
Examples
>>> from mlquantify.representations import HistogramRepresentation
>>> import numpy as np
>>> X = np.random.default_rng(0).uniform(0, 1, (100, 1))
>>> y = (X[:, 0] > 0.5).astype(int)
>>> rep = HistogramRepresentation(bins=(5,)).fit(X, y)
>>> rep.class_representations_.shape
(2, 5)
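The fit contract described above (validate shapes, store classes, delegate to _fit, then check class_representations_) follows the template-method pattern. The sketch below re-creates that contract for illustration only: BaseRepresentationSketch, ClassMeans and Forgetful are invented names, and the real BaseRepresentation may validate inputs or report errors differently.

```python
import numpy as np


class BaseRepresentationSketch:
    """Hypothetical base class mirroring the documented fit() contract."""

    def fit(self, X, y, classes=None, sample_weight=None):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        # Shape checks corresponding to the documented ValueError cases.
        if X.ndim == 0:
            raise ValueError("X must be at least one-dimensional.")
        if X.shape[0] != y.shape[0]:
            raise ValueError("X and y have inconsistent lengths.")
        # Store class labels: the explicit list, or the unique values of y.
        self.classes_ = np.unique(y) if classes is None else np.asarray(classes)
        self._fit(X, y, sample_weight=sample_weight)
        # Subclasses must set class_representations_ inside _fit.
        if not hasattr(self, "class_representations_"):
            raise AttributeError("_fit did not set class_representations_.")
        return self

    def _fit(self, X, y, sample_weight=None):
        raise NotImplementedError


class ClassMeans(BaseRepresentationSketch):
    """Toy subclass: represent each class by its (weighted) feature mean."""

    def _fit(self, X, y, sample_weight=None):
        X2 = X.reshape(len(X), -1)  # treat a 1-D score array as one feature
        w = np.ones(len(X2)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
        self.class_representations_ = np.array(
            [np.average(X2[y == c], axis=0, weights=w[y == c]) for c in self.classes_]
        )


class Forgetful(BaseRepresentationSketch):
    """Broken subclass that never sets class_representations_."""

    def _fit(self, X, y, sample_weight=None):
        pass


X = np.arange(6, dtype=float).reshape(3, 2)
y = np.array([0, 1, 1])
rep = ClassMeans().fit(X, y)
print(rep.class_representations_)

caught = None
try:
    Forgetful().fit(X, y)
except AttributeError as err:
    caught = err
print("AttributeError:", caught)
```

Keeping validation in the base class means a subclass implements only _fit, and the post-call check surfaces a forgotten attribute at fit time instead of as a less obvious failure later on.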
- transform(X)
Return the input as a float array (identity transform).
The KDE representation uses raw feature vectors as the test-time representation; density evaluation happens in class_likelihoods.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test feature matrix.
- Returns:
- X_transformed : ndarray of shape (n_samples, n_features)
Input cast to float64.
Examples
>>> from mlquantify.representations._density import KDERepresentation
>>> import numpy as np
>>> rep = KDERepresentation()
>>> X = np.array([[0.1, 0.2], [0.3, 0.4]])
>>> rep.transform(X).shape
(2, 2)
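transform is documented only as an identity transform that casts the input to float64. A minimal sketch of that behaviour, assuming the cast is done with np.asarray (the documentation does not say whether a copy is made, and identity_transform is a hypothetical name):

```python
import numpy as np


def identity_transform(X):
    """Cast X to a float64 ndarray without changing its values or shape."""
    # np.asarray skips the copy when X is already a float64 ndarray.
    return np.asarray(X, dtype=np.float64)


X_int = [[1, 2], [3, 4]]                     # Python ints become float64
X_f64 = np.array([[0.1, 0.2], [0.3, 0.4]])   # already float64

out_int = identity_transform(X_int)
out_f64 = identity_transform(X_f64)
print(out_int.dtype, out_int.shape)
print(np.shares_memory(out_f64, X_f64))
```

Under this assumption a float64 input is returned without copying, so mutating the result would also mutate the caller's array; np.array(X, dtype=np.float64) would always copy instead.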