DistanceRepresentation

class mlquantify.representations.DistanceRepresentation(metric='euclidean')

Distance-based representation for quantification.

Summarises a collection of instances as the vector of mean pairwise distances to each training class. The representation of a set of test instances is the column-wise mean of the per-instance distance vectors, yielding a single (n_classes,) descriptor.

This representation is used by the Energy Distance Quantifier (EDy), where the mean distance between the test set and each training class forms the basis of the prevalence estimation objective.

Parameters:
metric : str, default='euclidean'

Distance metric passed to scipy.spatial.distance.cdist.
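As a rough illustration of what the metric string controls, the sketch below calls scipy.spatial.distance.cdist directly with two different metric names. The arrays A and B are made-up values for demonstration only and are not part of mlquantify.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative points only (not taken from the library).
A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[3.0, 4.0]])

# Each call returns a (len(A), len(B)) matrix of pairwise distances.
d_euclid = cdist(A, B, metric="euclidean")
d_city = cdist(A, B, metric="cityblock")

print(d_euclid)  # first row is 5.0, second row is sqrt(13)
print(d_city)    # first row is 7.0, second row is 5.0
```

Any metric name accepted by cdist should be usable here, assuming the class forwards the string unchanged as described above.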

Attributes:
X_train_ : ndarray of shape (n_samples, n_features)

Training feature matrix stored at fit time.

y_train_ : ndarray of shape (n_samples,)

Training labels stored at fit time.

class_representations_ : ndarray of shape (n_classes,)

Mean pairwise distance from each training class to the full training set.

classes_ : ndarray of shape (n_classes,)

Unique class labels seen during fit.

Examples

>>> from mlquantify.representations import DistanceRepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 4))
>>> y = (X[:, 0] > 0).astype(int)
>>> rep = DistanceRepresentation().fit(X, y)
>>> rep.transform(X[:10]).shape
(2,)

fit(X, y, classes=None, sample_weight=None)

Fit the representation to labelled training data.

Validates shapes, stores the class labels, delegates internal fitting to _fit, and verifies that the subclass set class_representations_ during that call.

Parameters:
X : array-like of shape (n_samples, n_features) or (n_samples,)

Feature matrix or pre-computed score array for the training instances.

y : array-like of shape (n_samples,)

Class labels for each training instance.

classes : array-like of shape (n_classes,) or None, default=None

Explicit list of class labels. If None, the unique values in y are used.

sample_weight : array-like of shape (n_samples,) or None, default=None

Per-sample weights forwarded to _fit.

Returns:
self : BaseRepresentation

The fitted representation object.

Raises:
ValueError

If X and y have inconsistent lengths or X is zero-dimensional.

AttributeError

If the subclass did not define class_representations_ inside _fit.

Examples

>>> from mlquantify.representations import HistogramRepresentation
>>> import numpy as np
>>> X = np.random.default_rng(0).uniform(0, 1, (100, 1))
>>> y = (X[:, 0] > 0.5).astype(int)
>>> rep = HistogramRepresentation(bins=(5,)).fit(X, y)
>>> rep.class_representations_.shape
(2, 5)

transform(X)

Compute mean pairwise distances to each training class.

For every test instance the mean distance to all training samples of each class is computed. The returned vector is the column-wise mean across all test instances.
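The computation described above can be sketched with NumPy and SciPy alone. This is a minimal approximation based only on the description in this page, not the library's actual implementation; the helper name mean_distance_to_classes is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def mean_distance_to_classes(X_test, X_train, y_train, metric="euclidean"):
    """Hypothetical helper: mean pairwise distance from X_test to each class."""
    classes = np.unique(y_train)
    # One row per test instance, one column per training class:
    # entry (i, k) is the mean distance from test instance i to class k.
    per_instance = np.column_stack([
        cdist(X_test, X_train[y_train == c], metric=metric).mean(axis=1)
        for c in classes
    ])
    # The column-wise mean collapses the test set into one (n_classes,) vector.
    return per_instance.mean(axis=0)


rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 4))
y_train = (X_train[:, 0] > 0).astype(int)

descriptor = mean_distance_to_classes(X_train[:10], X_train, y_train)
print(descriptor.shape)  # (2,)
```

The sketch ignores sample weights and input validation, which the real class may handle differently.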

Parameters:
X : array-like of shape (n_samples, n_features)

Test feature matrix.

Returns:
representation : ndarray of shape (n_classes,)

Mean distance from the test set to each training class.

Examples

>>> from mlquantify.representations import DistanceRepresentation
>>> import numpy as np
>>> rng = np.random.default_rng(1)
>>> X = rng.standard_normal((80, 2))
>>> y = (X[:, 0] > 0).astype(int)
>>> rep = DistanceRepresentation().fit(X, y)
>>> dist = rep.transform(X[:5])
>>> dist.shape
(2,)