2.8. Nearest Neighbours#

Nearest-neighbour quantifiers estimate class prevalences by leveraging the local structure of the feature space. They are non-parametric, require no distributional assumptions, and are naturally robust to non-linear decision boundaries.

2.8.1. PWK — Probabilistic Weighted k-Nearest Neighbours #

PWK (Barranquero et al., 2013) wraps a k-NN classifier with a class-imbalance-aware weighting scheme and then applies Classify and Count (CC) on its predictions. Each neighbour’s vote is multiplied by a class-specific weight that corrects for the size difference between classes, so that the minority class is not drowned out by the majority.

The weight for class \(c\) is:

\[w_c(\alpha) = \left(\frac{M}{m_c}\right)^{1/\alpha}\]

where \(M = \min_c m_c\) is the smallest class size and \(\alpha\) is the imbalance-correction exponent.

Special cases:

\(\alpha = 1\) → standard PWK (weights proportional to inverse class size).
\(\alpha \to \infty\) → all weights equal to 1 (standard k-NN).

Why it exists: A standard k-NN classifier biases predictions towards the majority class. PWK’s weighting neutralises this bias, producing more accurate CC estimates under class imbalance. Barranquero et al. (2013) showed PWK outperforms standard CC + k-NN by a wide margin on imbalanced datasets.

2.8.1.1. Parameters #

Parameter	Default	Explanation
`alpha`	`1`	Imbalance-correction exponent. Higher values reduce the penalty on larger classes: `alpha=1` — standard PWK weighting (most aggressive correction). `alpha=2` — gentler correction. Very large `alpha` (e.g. 100) approaches standard k-NN. Tune by cross-validation if you are unsure; `alpha=1` is a good starting point.
`n_neighbors`	`10`	Number of neighbours \(k\). Larger \(k\) gives smoother (lower variance) prevalence estimates. Use larger values on bigger datasets. Common choices: 5, 10, 20.
`algorithm`	`'auto'`	k-NN search algorithm. `'auto'` selects the fastest available (ball_tree/kd_tree for low-dim data, brute-force for high-dim).
`metric`	`'euclidean'`	Distance metric for neighbour search. Use `'euclidean'` for continuous normalised features; `'cosine'` for sparse text/embedding vectors; `'manhattan'` for count data.
`leaf_size`	`30`	Leaf size for tree-based algorithms. Affects speed, not accuracy.
`p`	`2`	Minkowski distance parameter. `p=2` → Euclidean, `p=1` → Manhattan.
`metric_params`	`None`	Extra keyword arguments for custom metric functions.
`n_jobs`	`None`	Parallel jobs for neighbour search. `-1` uses all cores.

2.8.1.2. Examples #

Basic usage:

from mlquantify.neighbors import PWK
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

q = PWK(alpha=1, n_neighbors=10)
q.fit(X_train, y_train)
print(q.predict(X_test))
# {0: 0.80, 1: 0.20}

Tuning alpha and k:

from mlquantify.model_selection import GridSearchQ, APP
from mlquantify.metrics import MAE
from mlquantify.neighbors import PWK

protocol = APP(batch_size=100, n_prevalences=21, repeats=5)
gs = GridSearchQ(
    quantifier=PWK(),
    param_grid={
        'alpha': [1, 2, 5],
        'n_neighbors': [5, 10, 20],
    },
    protocol=protocol,
    error=MAE,
)
gs.fit(X_train, y_train)
print(gs.best_params_)

Getting per-instance classifications:

q = PWK(alpha=1, n_neighbors=10)
q.fit(X_train, y_train)
labels = q.classify(X_test)   # hard labels from the weighted k-NN
print(labels[:10])

2.8.1.3. When to Use PWK #

When you want a simple, non-parametric quantifier without tuning a full probabilistic classifier.
When feature space distances are meaningful (normalised continuous features).
For imbalanced binary problems where a standard classifier would be biased.

Limitation: PWK applies CC on top of the k-NN classifier; it inherits CC’s bias under distributional shift. It does not apply any adjustment like ACC or EMQ. For strong shift correction, use EMQ or DyS instead.

2.8.2. Nearest Neighbors #

2.8.3. PWK: Pair-wise Weighted K-Nearest Neighbors #

PWK and PWK\(\alpha\) are neighbor-based algorithms (k-Nearest Neighbors - NN) that apply class weighting techniques to estimate prevalence in binary problems [2]. They can be seen as aggregative quantification methods that adjust a k-NN classifier to account for class imbalance.

PWK methods focus on simplicity and stability, offering competitive quantification that balances effectiveness with interpretability. By exploring the topology of the data through proximity-based analysis, these algorithms are robust to distribution changes and retain local structure better than global classifiers [2]. The methods complement k-NN with class weighting policies designed to neutralize the bias toward majority classes inherent in imbalanced datasets [2].

2.8.3.1. PWK (Proportion-Weighted K-Nearest Neighbor)#

This is the simplest version of the algorithm.

Mechanism: The weight \(w_c\) is defined to be inversely proportional to the size of class \(c\) relative to the total training set size [2].
Simplicity: The weights are easily interpretable, and the method only requires the calibration of the number of neighbors \(k\).

2.8.3.2. PWK\(\alpha\) (Alpha-Adjusted PWK)#

PWK\(\alpha\) is a general structure that encompasses both standard KNN and PWK as special cases [2].

Mechanism: It introduces a tunable parameter \(\alpha\) into the weighting formula.
Flexibility: \(\alpha\) is a free parameter that allows the model to adapt to specific datasets. As \(\alpha\) increases, the penalty for larger classes is progressively smoothed [2].
Relationship with KNN and PWK:
- When \(\alpha \to \infty\), the class weights converge to 1, and the algorithm behaves like a traditional KNN.
- When \(\alpha = 1\), the algorithm is equivalent to standard PWK.
Performance: Empirically, there is often no significant statistical difference in Absolute Error between PWK and PWK\(\alpha\). However, PWK\(\alpha\) tends to be more conservative and robust at lower prevalences, while PWK may be more competitive at higher prevalences [2].

Example

from mlquantify.neighbors import PWK

# PWK operates directly on the feature space using k-NN
q = PWK(n_neighbors=10)
q.fit(X_train, y_train)
q.predict(X_test)