1.2. Adjust Counting#

Adjusted Counting methods improve upon simple “counting” quantifiers by correcting bias using what is known about the classifier’s errors on the training set. They aim to produce better estimates of class prevalence (how frequent each class is in a dataset) even when training and test distributions differ.

See Counters For Quantification for an overview of the base counting quantifiers.

Currently, there are two types of adjustment methods implemented:

  1. Threshold Adjustment Methods: These methods adjust the classifier's decision threshold to stabilize prevalence estimation. Examples include Adjusted Classify and Count (ACC), its probabilistic counterpart PACC, and threshold-selection variants such as T50, MAX, and Median Sweep (MS).

  2. Matrix Adjustment Methods: These methods use a confusion matrix (or a generalized rate matrix) derived from the classifier's performance on a validation set to adjust the estimated prevalences. Examples covered here include GAC, GPAC, and Friedman's Method (FM).

1.2.1. Threshold Adjustment#

Threshold-based adjustment methods correct the bias of Classify and Count (CC) by using the classifier's True Positive Rate (TPR) and False Positive Rate (FPR). They are mainly used for binary quantification tasks.

Adjusted Classify and Count (ACC) Equation

\[\hat{p}^U_{ACC}(\oplus) = \frac{\hat{p}^U_{CC}(\oplus) - FPR_L}{TPR_L - FPR_L}\]

Corrected prevalence estimate using classifier error rates.

The main idea is that by adjusting the observed rate of positive predictions, we can better approximate the real class distribution.
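
To make the formula concrete, here is a tiny worked example with made-up numbers: an observed CC estimate of 0.40, and error rates TPR = 0.80 and FPR = 0.10 measured on held-out data.

# Illustrative ACC correction (all values are made up for this example)
p_cc = 0.40            # raw Classify and Count estimate on U
tpr, fpr = 0.80, 0.10  # error rates from held-out training data

p_acc = (p_cc - fpr) / (tpr - fpr)
print(round(p_acc, 4))  # -> 0.4286

# In practice the estimate is clipped to [0, 1], since noisy
# TPR/FPR estimates can push it outside the valid range.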

[Figure: Comparison of different threshold selection policies, showing FPR and 1-TPR curves with the optimal threshold for each method. Adapted from Forman (2008).]

Different threshold methods vary in how they choose the classifier cutoff \(\tau\) for scores \(s(x)\).

| Method | Threshold Choice | Goal |
|---|---|---|
| ACC | Fixed threshold \(\tau = 0.5\) | Simple baseline adjustment |
| X_method | Threshold where \(\text{FPR} = 1 - \text{TPR}\) | Avoids unstable prediction tails |
| MAX | Threshold maximizing \(\text{TPR} - \text{FPR}\) | Improves numerical stability |
| T50 | Threshold where \(\text{TPR} = 0.5\) | Uses central part of ROC curve |
| MS (Median Sweep) | Median of the ACC estimates over all thresholds | Reduces effect of threshold outliers |
| MS2 | Median Sweep restricted to thresholds with \(\lvert\text{TPR} - \text{FPR}\rvert > 0.25\) | Reduces effect of threshold outliers |
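
To illustrate how these policies differ, here is a minimal sketch, assuming the sweep has already produced arrays of candidate thresholds with their TPR and FPR (the numbers below are synthetic; mlquantify computes real ones via evaluate_thresholds):

import numpy as np

# Synthetic sweep: candidate thresholds with their TPR/FPR
thresholds = np.linspace(0.1, 0.9, 9)
tprs = np.array([0.99, 0.97, 0.93, 0.88, 0.80, 0.70, 0.55, 0.35, 0.15])
fprs = np.array([0.60, 0.45, 0.33, 0.22, 0.14, 0.08, 0.04, 0.02, 0.01])

# MAX: maximize TPR - FPR (largest denominator in the ACC formula,
# hence the most numerically stable adjustment)
i_max = np.argmax(tprs - fprs)

# T50: threshold where TPR is closest to 0.5
i_t50 = np.argmin(np.abs(tprs - 0.5))

# X_method: threshold where FPR is closest to 1 - TPR
i_x = np.argmin(np.abs(fprs - (1 - tprs)))

print(thresholds[i_max], thresholds[i_t50], thresholds[i_x])
# -> 0.4 0.7 0.5 for this synthetic sweep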

All these methods provide fit, predict, and aggregate methods, like other aggregative quantifiers. In addition, they expose a specialized method, get_best_thresholds, which identifies the optimal threshold given the true labels y and the predicted probabilities. Here is an example of how to use the T50 method:

from mlquantify.adjust_counting import T50, evaluate_thresholds
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X_train, y_train)  # the classifier must be fitted first

# Positive-class scores on a held-out validation set (binary proba)
probabilities = clf.predict_proba(X_val)[:, 1]

# Sweep candidate thresholds, returning the TPR/FPR at each one
thresholds, tprs, fprs = evaluate_thresholds(y=y_val, probabilities=probabilities)

q = T50()
best_thr, best_tpr, best_fpr = q.get_best_thresholds(y_val, probabilities)
print(f"Best threshold: {best_thr}, TPR: {best_tpr}, FPR: {best_fpr}")

Note

Threshold adjustment methods like ACC are primarily designed for binary classification tasks. For multi-class problems, matrix adjustment methods are generally preferred.

1.2.2. Matrix Adjustment#

Matrix-based adjustment methods use a confusion matrix or generalized rate matrix to adjust predictions for multi-class quantification. They treat quantification as solving a small linear system.

Matrix Equation

\[\mathbf{y} = \mathbf{X}\,\hat{\pi}_F + \epsilon, \quad \text{subject to } \hat{\pi}_F \ge 0,\ \sum \hat{\pi}_F = 1\]

General linear system linking observed and true prevalences.

Here:

  • \(\mathbf{y}\): average observed predictions in \(U\)

  • \(\mathbf{X}\): classifier behavior from training (mean conditional rates)

  • \(\hat{\pi}_F\): corrected class prevalences in \(U\)

[Plot Idea: Matrix illustration showing how confusion corrections map to estimated prevalences]
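
As a minimal sketch of this solving step, the system can be handed to an off-the-shelf constrained optimizer. The toy \(\mathbf{X}\) and \(\mathbf{y}\) below are made up, and this is not mlquantify's internal solver:

import numpy as np
from scipy.optimize import minimize

# Toy adjustment matrix: entry [j, i] is the rate of observing
# outcome j given true class i (columns sum to 1)
X = np.array([[0.9, 0.2],
              [0.1, 0.8]])
y = np.array([0.55, 0.45])  # average observed predictions on U

def objective(p):
    # constrained least squares: minimize ||X p - y||^2 / 2
    return 0.5 * np.sum((X @ p - y) ** 2)

cons = ({"type": "eq", "fun": lambda p: p.sum() - 1.0},)  # sum to 1
bounds = [(0.0, 1.0), (0.0, 1.0)]                         # non-negative

res = minimize(objective, np.array([0.5, 0.5]), bounds=bounds, constraints=cons)
print(res.x)  # -> [0.5, 0.5]: the corrected prevalences for this toy system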

1.2.2.1. Generalized Adjusted Classify and Count (GAC) and Generalized Probabilistic Adjusted Classify and Count (GPAC)#

from mlquantify.adjust_counting import GAC, GPAC
from sklearn.linear_model import LogisticRegression

q = GAC(learner=LogisticRegression())  # GPAC is constructed the same way
q.fit(X_train, y_train)                # fits the learner and the adjustment matrix
prevalences = q.predict(X_test)
# -> {0: 0.48, 1: 0.52}

Both GAC and GPAC are solved using the linear system above; they differ only in how the matrix \(\mathbf{X}\) is built (sketched below):

  • GAC uses hard classifier decisions (a confusion matrix).

  • GPAC uses soft posterior probabilities \(P(y=l \mid x)\).
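
A minimal numpy sketch of that difference, assuming validation posteriors probs and true labels y_true (illustrative only, not the library's internals):

import numpy as np

# Toy validation data: posteriors and true labels
probs = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]])
y_true = np.array([0, 0, 1, 1])
n_classes = 2

X_gac = np.zeros((n_classes, n_classes))
X_gpac = np.zeros((n_classes, n_classes))
for i in range(n_classes):
    mask = y_true == i
    # GAC: hard decisions -> row i is the confusion-matrix row for class i
    hard = probs[mask].argmax(axis=1)
    X_gac[i] = np.bincount(hard, minlength=n_classes) / mask.sum()
    # GPAC: soft decisions -> row i averages the posteriors for class i
    X_gpac[i] = probs[mask].mean(axis=0)

print(X_gac)   # [[1.  0. ] [0.  1. ]]
print(X_gpac)  # [[0.8 0.2] [0.3 0.7]]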

1.2.2.2. Friedman’s Method (FM)#

FM constructs its adjustment matrix \(\mathbf{X}\) from a feature transformation \(f_l(x)\) that indicates whether the predicted probability of class \(l\) for an item exceeds that class's proportion in the training data \(\pi_l^T\). This transformation is chosen because it theoretically minimizes the variance of the resulting prevalence estimates.

Mathematical details - Friedman’s Method

To improve stability, Friedman's Method (FM) generates the adjustment matrix \(\mathbf{X}\) using a special transformation function applied to each class \(l\) and training sample \(x\):

\[f_l(x) = I \left[ \hat{P}_T(y = l \mid x) > \pi_l^T \right]\]

where:

  • \(I[\cdot]\) is the indicator function, equal to 1 if the condition inside is true, 0 otherwise.

  • \(\hat{P}_T(y = l \mid x)\) is the classifier’s estimated posterior probability for class \(l\) on training sample \(x\).

  • \(\pi_l^T\) is the prevalence of class \(l\) in the training set.

The entry \(X_{i,l}\) of the matrix \(\mathbf{X}\) is computed as the average of \(f_l(x)\) over all \(x\) in class \(i\) of the training data:

\[X_{i,l} = \frac{1}{|L_i|} \sum_{x \in L_i} f_l(x)\]

where:

  • \(L_i\) is the subset of training samples with true class \(i\).

  • \(|L_i|\) is the number of these samples.

This matrix is then used in the constrained least squares optimization:

\[\min_{\hat{\pi}_F} \frac{1}{2} \hat{\pi}_F^\top D \hat{\pi}_F + d^\top \hat{\pi}_F \quad \text{subject to} \quad \hat{\pi}_F \ge 0, \quad \sum \hat{\pi}_F = 1\]

where \(D = \mathbf{X}^\top \mathbf{X}\) and \(d = -\mathbf{X}^\top \mathbf{y}\) (the least-squares objective \(\tfrac{1}{2}\lVert \mathbf{X}\hat{\pi}_F - \mathbf{y} \rVert^2\) written in quadratic-program form), to estimate the corrected prevalences \(\hat{\pi}_F\) on the test set [3].

This thresholding on posterior probabilities ensures that the matrix \(\mathbf{X}\) highlights regions where the classifier consistently predicts a class more confidently than its baseline prevalence, improving statistical stability and reducing estimation variance [3].
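
Putting the definitions together, here is a minimal sketch of the FM matrix construction on toy data (illustrative only; mlquantify implements FM internally). The resulting \(\mathbf{X}\) then feeds the constrained least-squares solve shown earlier:

import numpy as np

# Toy training posteriors and true labels
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
y_true = np.array([0, 0, 1, 1])
n_classes = 2

# pi_l^T: training prevalence of each class
pi_T = np.bincount(y_true, minlength=n_classes) / len(y_true)  # [0.5, 0.5]

# f_l(x) = I[ P(y=l|x) > pi_l^T ], one column per class l
f = (probs > pi_T).astype(float)

# X[i, l] = mean of f_l(x) over training samples with true class i
X = np.vstack([f[y_true == i].mean(axis=0) for i in range(n_classes)])
print(X)  # -> [[1. 0.] [0. 1.]] for this well-separated toy data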

References