.. _nearest_neighbors:

.. currentmodule:: mlquantify.neighbors

=================
Nearest Neighbors
=================

PWK: Proportion-Weighted K-Nearest Neighbors
=============================================

:class:`PWK` and **PWK**\ :math:`\alpha` are neighbor-based algorithms (k-nearest
neighbors, k-NN) that apply class-weighting techniques to estimate prevalence in
binary problems [2]_. They can be seen as aggregative quantification methods that
adjust a k-NN classifier to account for class imbalance.

PWK methods focus on simplicity and stability, offering competitive quantification
that balances effectiveness with interpretability. By exploiting the local topology
of the data through proximity-based analysis, these algorithms are robust to
distribution changes and retain local structure better than global classifiers [2]_.

The methods complement k-NN with class-weighting policies designed to neutralize the
bias toward majority classes that is inherent in imbalanced datasets [2]_.

PWK (Proportion-Weighted K-Nearest Neighbor)
---------------------------------------------

This is the simplest version of the algorithm.

* **Mechanism:** the weight :math:`w_c` of class :math:`c` is **inversely
  proportional** to the size of the class relative to the total training set
  size [2]_.
* **Simplicity:** the weights are easily interpretable, and the method only
  requires calibrating the number of neighbors :math:`k`.

.. dropdown:: Mathematical details - PWK Weights

    The weight for a class :math:`\gamma_j` is calculated as:

    .. math::

        w_{\gamma_j} = \frac{1 - \frac{m_{\gamma_j}}{m}}{m_{\gamma_j}}

    where :math:`m` is the total training size and :math:`m_{\gamma_j}` is the
    number of training samples in class :math:`\gamma_j`.

PWK\ :math:`\alpha` (Alpha-Adjusted PWK)
----------------------------------------

**PWK**\ :math:`\alpha` is a general framework that encompasses both standard KNN
and PWK as special cases [2]_.

* **Mechanism:** it introduces a tunable parameter :math:`\alpha` into the
  weighting formula.
* **Flexibility:** :math:`\alpha` is a free parameter that allows the model to
  adapt to a specific dataset. As :math:`\alpha` increases, the penalty applied to
  larger classes is progressively smoothed [2]_.
* **Relationship with KNN and PWK:**

  * When :math:`\alpha \to \infty`, the class weights converge to 1 and the
    algorithm behaves like a traditional **KNN**.
  * When :math:`\alpha = 1`, the algorithm is equivalent to standard **PWK**.

* **Performance:** empirically, there is often no statistically significant
  difference in absolute error between PWK and PWK\ :math:`\alpha`. However,
  PWK\ :math:`\alpha` tends to be more **conservative and robust** at lower
  prevalences, while PWK may be more competitive at higher prevalences [2]_.

.. dropdown:: Mathematical details - PWK-Alpha Weights

    The weight for class :math:`\gamma_j` is the ratio between the class
    cardinality and the minority-class cardinality :math:`M`, raised to the power
    :math:`-1/\alpha`:

    .. math::

        w_{\gamma_j}(\alpha) = \left( \frac{m_{\gamma_j}}{M} \right)^{-1/\alpha},
        \quad \text{with } \alpha \ge 1

    With :math:`\alpha = 1`, the minority class receives weight 1 and class
    :math:`\gamma_j` receives :math:`M / m_{\gamma_j}`, so larger classes are
    penalized; as :math:`\alpha \to \infty`, all weights tend to 1 and unweighted
    k-NN is recovered. Both weighting schemes are computed by hand in the worked
    example at the end of this page.

**Example**

.. code-block:: python

    from mlquantify.neighbors import PWK

    # PWK operates directly on the feature space using k-NN
    q = PWK(n_neighbors=10)
    q.fit(X_train, y_train)
    prevalences = q.predict(X_test)  # estimated class prevalences

.. dropdown:: References

    .. [2] Barranquero, J., González, P., Díez, J., & del Coz, J. J. (2013).
       On the study of nearest neighbor algorithms for prevalence estimation in
       binary problems. Pattern Recognition, 46(2), 472-482.
       https://doi.org/10.1016/j.patcog.2012.07.022
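**Worked example: class weights**

To make the two weighting schemes concrete, the sketch below computes PWK and
PWK\ :math:`\alpha` weights by hand, following the formulas in the dropdowns above.
The helper functions and the class counts are illustrative only; they are not part
of :mod:`mlquantify`.

.. code-block:: python

    import numpy as np

    def pwk_weights(counts):
        """PWK: w_j = (1 - m_j / m) / m_j, inversely related to class size."""
        counts = np.asarray(counts, dtype=float)
        m = counts.sum()
        return (1 - counts / m) / counts

    def pwk_alpha_weights(counts, alpha=1.0):
        """PWK-alpha: w_j = (m_j / M) ** (-1 / alpha), with M the minority count."""
        counts = np.asarray(counts, dtype=float)
        return (counts / counts.min()) ** (-1.0 / alpha)

    counts = [80, 20]  # hypothetical binary sample: 80 negatives, 20 positives

    print(pwk_weights(counts))             # [0.0025 0.04]: minority weighted higher
    print(pwk_alpha_weights(counts, 1))    # [0.25 1.]: alpha = 1 penalizes the majority
    print(pwk_alpha_weights(counts, 100))  # ~[0.986 1.]: large alpha approaches plain k-NN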
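**Sketch: the PWK mechanism from scratch**

Since PWK is aggregative, it amounts to a class-weighted k-NN classifier followed by
classify-and-count aggregation. The sketch below reproduces that pipeline with
scikit-learn's :class:`~sklearn.neighbors.NearestNeighbors` on synthetic data; it is
a simplified illustration under those assumptions, not :class:`PWK`'s actual
implementation.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import NearestNeighbors

    # Imbalanced synthetic binary data, for illustration only
    X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
    X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

    k = 10
    classes, counts = np.unique(y_train, return_counts=True)
    weights = (1 - counts / counts.sum()) / counts  # PWK class weights

    # Weighted k-NN vote: each neighbor contributes the weight of its class
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    neighbor_idx = nn.kneighbors(X_test, return_distance=False)
    neighbor_labels = y_train[neighbor_idx]  # shape (n_test, k)

    votes = np.zeros((len(X_test), len(classes)))
    for j, c in enumerate(classes):
        votes[:, j] = (neighbor_labels == c).sum(axis=1) * weights[j]
    y_pred = classes[votes.argmax(axis=1)]

    # Classify-and-count aggregation: label proportions = prevalence estimate
    prevalence = np.bincount(y_pred, minlength=len(classes)) / len(y_pred)
    print(prevalence)

Swapping in ``pwk_alpha_weights`` from the previous sketch turns the same pipeline
into a PWK\ :math:`\alpha` variant.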