2.1. Mixture Models for Non-Aggregative Quantification#

Currently, the only Mixture Model method specifically designed for non-aggregative quantification is HDx (Hellinger Distance x-Similarity), described below.

2.1.1. HDx: Hellinger Distance x-Similarity#

HDx is a non-aggregative quantification method based on HDy [1]. While HDy operates on posterior probabilities (y-space), HDx works directly in the feature space (x-space), without aggregating predictions.

The goal of HDx is to estimate the prevalence parameter \(\alpha\) that minimizes the average Hellinger Distance between the empirical feature distribution of the test set and a convex mixture of the class-conditional feature distributions estimated from the training data.

Mathematical Definition
\[V_\alpha(x) = \alpha \cdot p(x|+) + (1 - \alpha) \cdot p(x|-)\]
\[\alpha^* = \underset{0 \leq \alpha \leq 1}{\arg\min}\; \frac{1}{n_f} \sum_{f=1}^{n_f} HD_f(V_\alpha, U)\]
\[\frac{|V_{f,i}|}{|V|} = \frac{|S^+_{f,i}|}{|S^+|} \cdot \alpha + \frac{|S^-_{f,i}|}{|S^-|} \cdot (1 - \alpha)\]

where:

  • \(V_\alpha(x)\): mixture distribution for feature \(x\) parameterized by \(\alpha\),

  • \(p(x|+), p(x|-)\): class-conditional feature distributions from training,

  • \(HD_f\): Hellinger distance for each feature \(f\),

  • \(U\): empirical test distribution,

  • \(|S^+_{f,i}|\), \(|S^-_{f,i}|\): counts of positive/negative training samples in bin \(i\) of feature \(f\),

  • \(n_f\): number of features.
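The optimization above can be illustrated with a minimal NumPy sketch (not the library's implementation): build per-feature histograms of the positive and negative training samples, then grid-search for the \(\alpha\) whose mixture minimizes the average Hellinger distance to the test histograms. The function name `hdx_estimate`, the single bin count, and the grid resolution are illustrative choices, not part of the mlquantify API.

```python
import numpy as np

def hellinger(p, q):
    # Hellinger distance between two discrete distributions (histograms).
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def hdx_estimate(X_train, y_train, X_test, n_bins=10, grid=101):
    # Grid search over alpha in [0, 1] minimizing the mean per-feature
    # Hellinger distance between the mixture and the test histograms.
    n_features = X_train.shape[1]
    alphas = np.linspace(0.0, 1.0, grid)
    avg_hd = np.zeros(grid)
    for f in range(n_features):
        # Shared bin edges per feature, covering train and test values.
        edges = np.histogram_bin_edges(
            np.concatenate([X_train[:, f], X_test[:, f]]), bins=n_bins)
        pos, _ = np.histogram(X_train[y_train == 1, f], bins=edges)
        neg, _ = np.histogram(X_train[y_train == 0, f], bins=edges)
        test, _ = np.histogram(X_test[:, f], bins=edges)
        pos = pos / pos.sum()   # p(x|+) for this feature
        neg = neg / neg.sum()   # p(x|-) for this feature
        test = test / test.sum()  # empirical test distribution U
        for i, a in enumerate(alphas):
            mix = a * pos + (1 - a) * neg  # V_alpha for this feature
            avg_hd[i] += hellinger(mix, test) / n_features
    return alphas[np.argmin(avg_hd)]
```

mlquantify's HDx averages over several bin sizes (see `bins_size` in the usage example below); the sketch uses a single bin count to keep the search loop easy to follow.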

Unlike HDy, HDx does not require a learner to estimate posterior probabilities, since it operates directly in the feature space; consequently, it does not provide an aggregate method.

from mlquantify.mixture import HDx

# No underlying classifier is needed: HDx works on the features directly
q = HDx(bins_size=[10, 20, 30])
q.fit(X_train, y_train)
prevalences = q.predict(X_test)
References#