PWK
- class mlquantify.neighbors.PWK(alpha=1, n_neighbors=10, algorithm='auto', metric='euclidean', leaf_size=30, p=2, metric_params=None, n_jobs=None)
Probabilistic Weighted k-Nearest Neighbour (PWK) quantifier.
Targets prior probability shift. PWK is an aggregative Classify-and-Count quantifier: it shares the standard fit / predict / aggregate interface of CC, but its classifier is a k-nearest-neighbour rule modified for quantification (PWKCLF). Each neighbour’s vote is re-weighted by a class-specific factor (controlled by alpha) so that the count is not dominated by the majority class. Unlike the other aggregative quantifiers, PWK therefore takes no external estimator parameter: the modified k-NN is intrinsic to the method.
- Parameters:
- alpha : float, default=1
Imbalance-correction exponent.
1 applies the standard inverse-size weighting; higher values further amplify minority-class neighbours.
- n_neighbors : int, default=10
Number of nearest neighbours considered for each test instance.
- algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
Neighbour-search algorithm.
'auto': choose among the options below based on the fitted data (default).
'ball_tree': ball-tree index; good in higher dimensions.
'kd_tree': k-d tree index; fast in low dimensions.
'brute': exhaustive search; exact, best for small data.
- metric : str, default='euclidean'
Distance metric for the neighbour search.
- leaf_size : int, default=30
Leaf size for the tree-based algorithms (speed/memory trade-off).
- p : int, default=2
Power parameter for the Minkowski metric (1 = Manhattan, 2 = Euclidean).
- metric_params : dict or None, default=None
Additional keyword arguments for the metric function.
- n_jobs : int or None, default=None
Number of parallel jobs for the neighbour search.
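To make the re-weighting idea concrete, the following is a minimal sketch of a class-weighted k-NN vote for the alpha = 1 case only, where each neighbour counts as 1 / (size of its class). It is not the PWKCLF implementation: the function name `knn_vote`, its `weighted` flag and the toy data are assumptions made for illustration, and how alpha enters the weights is deliberately left out.

```python
import math
from collections import Counter


def knn_vote(train_X, train_y, query, n_neighbors=3, weighted=True):
    """Hypothetical stand-in for a k-NN vote, optionally weighted by inverse class size."""
    class_sizes = Counter(train_y)
    # Rank training points by Euclidean distance to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter()
    for _, label in ranked[:n_neighbors]:
        # Assumption (alpha = 1 only): a neighbour counts 1 / (size of its class).
        votes[label] += (1.0 / class_sizes[label]) if weighted else 1.0
    return max(votes, key=votes.get)


# Six majority-class points (label 0) and two minority-class points (label 1).
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (3, 3), (4, 4)]
train_y = [0, 0, 0, 0, 0, 0, 1, 1]
query = (2, 2)

plain_label = knn_vote(train_X, train_y, query, weighted=False)
weighted_label = knn_vote(train_X, train_y, query, weighted=True)
```

In this toy set the three nearest neighbours are two majority points and one minority point, so the plain vote picks class 0, while the weighted vote (2 × 1/6 against 1 × 1/2) picks class 1.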
- Attributes:
- estimator : PWKCLF
The underlying weighted k-NN classifier (built from the parameters above; not an argument).
- estimator_ : PWKCLF
The fitted classifier.
- classes_ : ndarray of shape (n_classes,)
Class labels seen during fit.
See also
CC : Plain classify-and-count baseline.
ACC : Adjusted count for binary prior shift.
Notes
PWK is a classify-and-count method whose only quantification-specific ingredient is the imbalance re-weighting. It needs no separate scorer and handles multiclass problems directly, but it inherits k-NN’s sensitivity to feature scaling and dimensionality. Because it subclasses CC, aggregate (with its optional classes argument) is also available.
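The sensitivity to feature scaling can be illustrated without the library. The sketch below uses only the standard library, and its helper names (`standardize`, `nearest_index`) and data are hypothetical. It shows that the nearest neighbour of a point under the Euclidean metric can change once each column is standardized.

```python
import math
import statistics


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    others = (i for i in range(len(rows)) if i != query_index)
    return min(others, key=lambda i: math.dist(rows[i], rows[query_index]))


# Column 0 is in the thousands, column 1 is in single digits.
rows = [(1000.0, 1.0), (1010.0, 9.0), (1100.0, 1.2), (2000.0, 5.0)]

raw_neighbour = nearest_index(rows, 0)                    # dominated by column 0
scaled_neighbour = nearest_index(standardize(rows), 0)    # both columns contribute
```

On the raw data, row 1 is nearest to row 0 because the large-valued column dominates the distance; after standardization, row 2 is nearest. Whether to standardize before fitting is a modelling decision left to the user.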
References
[1] Barranquero, J., Díez, J., & del Coz, J. J. (2015). Quantification-Oriented Learning Based on Reliable Classifiers. Pattern Recognition, 48(2), 591–604.
Examples
>>> from mlquantify.neighbors import PWK
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=200, random_state=42)
>>> q = PWK(alpha=1.5, n_neighbors=5).fit(X, y)
>>> q.predict(X)
{0: ..., 1: ...}
- aggregate(predictions, classes=None)
Aggregate predictions into class prevalence estimates.
- Parameters:
- predictions : ndarray of shape (n_samples,) or (n_samples, n_classes)
Estimator predictions on test data. Can be probabilities (n_samples, n_classes) or class labels (n_samples,).
- classes : array-like of shape (n_classes,) or None, default=None
Class labels the output must report, in order. When given, every class appears in the result even if absent from predictions (with prevalence 0). When None, the classes seen during fit are used; if the quantifier is unfitted, they are inferred from the predictions.
- Returns:
- ndarray of shape (n_classes,)
Class prevalence estimates.
Examples
>>> from mlquantify.counting import CC
>>> import numpy as np
>>> q = CC()
>>> predictions = np.random.randint(0, 2, size=200)  # hard class labels
>>> q.aggregate(predictions)
{0: ..., 1: ...}
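The counting step described above can be sketched in plain Python. This is an illustrative stand-in, not the library's code: the function name `count_prevalences` is hypothetical, it handles hard labels only, and it returns a dict purely for readability.

```python
from collections import Counter


def count_prevalences(labels, classes=None):
    """Hypothetical classify-and-count step: fraction of predictions per class."""
    if len(labels) == 0:
        raise ValueError("need at least one prediction")
    counts = Counter(labels)
    if classes is None:
        # No class list given: infer the classes from the predictions themselves.
        classes = sorted(counts)
    total = len(labels)
    # Classes absent from the predictions are reported with prevalence 0.
    return {c: counts.get(c, 0) / total for c in classes}


inferred = count_prevalences([0, 1, 1, 0, 1])
explicit = count_prevalences([0, 1, 1, 0, 1], classes=[0, 1, 2])
```

With `classes=[0, 1, 2]`, class 2 never appears among the labels but is still reported, with prevalence 0, which mirrors the behaviour documented for the classes argument.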
- classify(X)
Classify test instances using the underlying weighted k-NN estimator.
Returns hard class labels produced by PWKCLF without any prevalence-level aggregation.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test feature matrix.
- Returns:
- labels : ndarray of shape (n_samples,)
Predicted class label for each test instance.
Examples
>>> from mlquantify.neighbors import PWK
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=200, random_state=42)
>>> q = PWK(alpha=1.5, n_neighbors=5).fit(X, y)
>>> labels = q.classify(X[:5])
- fit(X, y, estimator_fitted=False, *args, **kwargs)
Fit the quantifier using the provided data and estimator.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_request(*, estimator_fitted: bool | None | str = '$UNCHANGED$') -> PWK
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- estimator_fitted : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for estimator_fitted parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
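The <component>__<parameter> naming convention amounts to splitting a key at its first double underscore. The helper below is a hypothetical illustration of that rule (the name `split_param` is not part of any library); it only shows how such a key decomposes, not how parameters are actually applied.

```python
def split_param(name):
    """Split '<component>__<parameter>' at the first double underscore.

    Returns (component, parameter); component is None for a plain parameter.
    """
    component, sep, parameter = name.partition("__")
    return (component, parameter) if sep else (None, component)


plain = split_param("alpha")
nested = split_param("quantifier__n_neighbors")   # 'quantifier' is a made-up step name
deeper = split_param("step__sub__param")          # remainder is resolved by the component
```

Only the first separator is consumed, so a deeper key such as `step__sub__param` hands `sub__param` on to the `step` component.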