DistanceRepresentation
- class mlquantify.representations.DistanceRepresentation(metric='euclidean')
Distance-based representation for quantification.
Summarises a collection of instances as the vector of mean pairwise distances to each training class. The representation of a set of test instances is the column-wise mean of the per-instance distance vectors, yielding a single (n_classes,) descriptor.
This is used by the Energy Distance Quantifier (EDy), where the distance between the test set and each class centroid forms the basis of the prevalence estimation objective.
- Parameters:
- metric : str, default='euclidean'
Distance metric passed to scipy.spatial.distance.cdist.
- Attributes:
- X_train_ : ndarray of shape (n_samples, n_features)
Training feature matrix stored at fit time.
- y_train_ : ndarray of shape (n_samples,)
Training labels stored at fit time.
- class_representations_ : ndarray of shape (n_classes,)
Mean pairwise distance from each training class to the full training set.
- classes_ : ndarray of shape (n_classes,)
Unique class labels seen during fit.
Examples
>>> from mlquantify.representations._distance import DistanceRepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 4))
>>> y = (X[:, 0] > 0).astype(int)
>>> rep = DistanceRepresentation().fit(X, y)
>>> rep.transform(X[:10]).shape
(2,)
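To make the metric parameter and the class_representations_ attribute more concrete, the following is a minimal sketch that uses only NumPy and SciPy. The helper name class_to_train_distances is hypothetical, and the assumption that each entry is the plain mean of the class-to-training-set distance matrix is read off the attribute description above; it is not taken from the mlquantify source.

```python
import numpy as np
from scipy.spatial.distance import cdist


def class_to_train_distances(X, y, metric="euclidean"):
    """Hypothetical helper: mean distance from each class to all training rows.

    Assumes the attribute is an unweighted mean over the
    (n_class_samples, n_samples) distance matrix returned by cdist.
    """
    classes = np.unique(y)
    return np.array(
        [cdist(X[y == c], X, metric=metric).mean() for c in classes]
    )


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
y = (X[:, 0] > 0).astype(int)

# Any metric name accepted by cdist can be used in place of the default.
results = {
    metric: class_to_train_distances(X, y, metric=metric)
    for metric in ("euclidean", "cityblock", "cosine")
}
for metric, values in results.items():
    print(metric, values.shape)
```

Each metric yields one value per class, so the descriptor keeps the shape (n_classes,) whichever metric string is chosen.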
- fit(X, y, classes=None, sample_weight=None)
Fit the representation to labelled training data.
Validates shapes, stores the class labels, delegates internal fitting to _fit, and verifies that the subclass set class_representations_ during that call.
- Parameters:
- X : array-like of shape (n_samples, n_features) or (n_samples,)
Feature matrix or pre-computed score array for the training instances.
- y : array-like of shape (n_samples,)
Class labels for each training instance.
- classes : array-like of shape (n_classes,) or None, default=None
Explicit list of class labels. If None, the unique values in y are used.
- sample_weight : array-like of shape (n_samples,) or None, default=None
Per-sample weights forwarded to _fit.
- Returns:
- self : BaseRepresentation
The fitted representation object.
- Raises:
- ValueError
If X and y have inconsistent lengths or X is zero-dimensional.
- AttributeError
If the subclass did not define class_representations_ inside _fit.
Examples
>>> from mlquantify.representations import HistogramRepresentation
>>> import numpy as np
>>> X = np.random.default_rng(0).uniform(0, 1, (100, 1))
>>> y = (X[:, 0] > 0.5).astype(int)
>>> rep = HistogramRepresentation(bins=(5,)).fit(X, y)
>>> rep.class_representations_.shape
(2, 5)
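The validate, delegate, verify sequence that fit describes can be sketched as a small template-method pattern. Everything below is illustrative: SketchBaseRepresentation, MeanRepresentation and ForgetfulRepresentation are hypothetical names, and the checks are a guess at the documented behaviour rather than the library's actual code.

```python
import numpy as np


class SketchBaseRepresentation:
    """Hypothetical stand-in for the base class; not mlquantify's code."""

    def fit(self, X, y, classes=None, sample_weight=None):
        X = np.asarray(X)
        y = np.asarray(y)
        # Validate shapes (documented ValueError cases).
        if X.ndim == 0:
            raise ValueError("X must be at least one-dimensional.")
        if X.shape[0] != y.shape[0]:
            raise ValueError("X and y have inconsistent lengths.")
        # Store class labels, defaulting to the unique values in y.
        self.classes_ = np.unique(y) if classes is None else np.asarray(classes)
        # Delegate the actual fitting to the subclass hook.
        self._fit(X, y, sample_weight=sample_weight)
        # Verify that the hook produced the required attribute.
        if not hasattr(self, "class_representations_"):
            raise AttributeError("_fit must set class_representations_.")
        return self

    def _fit(self, X, y, sample_weight=None):
        raise NotImplementedError


class MeanRepresentation(SketchBaseRepresentation):
    """Toy subclass: represents each class by its feature mean."""

    def _fit(self, X, y, sample_weight=None):
        self.class_representations_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )


class ForgetfulRepresentation(SketchBaseRepresentation):
    """Toy subclass whose hook deliberately sets nothing."""

    def _fit(self, X, y, sample_weight=None):
        pass


X = np.random.default_rng(0).uniform(0, 1, (100, 1))
y = (X[:, 0] > 0.5).astype(int)
rep = MeanRepresentation().fit(X, y)
print(rep.class_representations_.shape)
```

In this sketch a subclass only implements _fit, while the public fit keeps input validation and the post-condition check in one place; ForgetfulRepresentation().fit(X, y) would raise AttributeError.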
- transform(X)
Compute mean pairwise distances to each training class.
For every test instance the mean distance to all training samples of each class is computed. The returned vector is the column-wise mean across all test instances.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test feature matrix.
- Returns:
- representation : ndarray of shape (n_classes,)
Mean distance from the test set to each training class.
Examples
>>> from mlquantify.representations._distance import DistanceRepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(1)
>>> X = rng.standard_normal((80, 2))
>>> y = (X[:, 0] > 0).astype(int)
>>> rep = DistanceRepresentation().fit(X, y)
>>> dist = rep.transform(X[:5])
>>> dist.shape
(2,)
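The computation described for transform can be approximated outside the library with scipy.spatial.distance.cdist. This is a sketch under stated assumptions: the function name mean_distance_to_classes is hypothetical, and the unweighted means follow the prose description above rather than the actual implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist


def mean_distance_to_classes(X_test, X_train, y_train, metric="euclidean"):
    """Hypothetical re-implementation of the described transform step."""
    classes = np.unique(y_train)
    # Full (n_test, n_train) distance matrix.
    D = cdist(X_test, X_train, metric=metric)
    # Per-instance mean distance to each class: shape (n_test, n_classes).
    per_instance = np.column_stack(
        [D[:, y_train == c].mean(axis=1) for c in classes]
    )
    # Column-wise mean across test instances: shape (n_classes,).
    return per_instance.mean(axis=0)


rng = np.random.default_rng(1)
X = rng.standard_normal((80, 2))
y = (X[:, 0] > 0).astype(int)
rep = mean_distance_to_classes(X[:5], X, y)
print(rep.shape)
```

Because every test instance is compared with the same number of training samples per class, averaging per instance and then across instances gives the same value as averaging the whole test-by-class distance block at once.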