.. _neural_quantifiers:

.. currentmodule:: mlquantify.neural

==================
Neural Quantifiers
==================

Neural quantifiers learn a direct mapping from a *bag of instances* to a
prevalence vector, without relying on a hand-crafted aggregation formula.
They are trained end-to-end to minimise a quantification loss and can exploit
deep feature representations that are inaccessible to analytical methods.

.. admonition:: PyTorch required

   Neural quantifiers depend on ``torch``. Install it with::

       pip install torch

.. contents:: Contents
   :local:
   :depth: 2

----

QuaNet — Quantification Network
=================================

:class:`QuaNet` (Esuli et al., 2018) is a recurrent neural network that reads a
**set** of instance embeddings produced by a base classifier and predicts the
prevalence vector for that set.

**Architecture:**

1. The base classifier (``estimator``) produces a fixed-size embedding for each
   test instance (via its ``transform`` or ``predict_proba`` output).
2. An LSTM reads the sequence of embeddings (in random order) to produce a
   context vector summarising the set.
3. The context vector is concatenated with *auxiliary quantification
   statistics* (CC, PCC, and ACC estimates) computed on the current batch.
4. A feed-forward head maps the concatenated vector to a prevalence vector
   with a softmax output.

**Why it exists:** QuaNet learns to exploit patterns in instance embeddings
that rule-based aggregation methods cannot capture. On large text datasets
where embeddings carry rich distributional information, it has shown
competitive or superior performance to DyS and EMQ.

Parameters
----------

.. list-table::
   :widths: 22 15 63
   :header-rows: 1

   * - Parameter
     - Default
     - Explanation
   * - ``estimator``
     - required
     - A classifier that (a) produces posterior probabilities via
       ``predict_proba`` and (b) optionally exposes a ``transform`` method for
       dense embeddings. The predictions are used as LSTM inputs.
   * - ``device``
     - ``'cpu'``
     - PyTorch device. Set to ``'cuda'`` to use a GPU if available. Training
       is significantly faster on GPU for large datasets.
   * - ``hidden_size``
     - ``64``
     - Size of the LSTM hidden state. Larger values give more capacity but
       require more data. Try 32, 64, or 128 depending on dataset size.
   * - ``n_hidden_layers``
     - ``1``
     - Number of LSTM layers. More layers capture longer-range dependencies in
       the embedding sequence but are slower to train.
   * - ``lstm_hidden_size``
     - ``32``
     - Hidden size per LSTM layer.
   * - ``drop_p``
     - ``0.5``
     - Dropout probability in the feed-forward head. Reduce to 0.2–0.3 if
       training data is large; increase to 0.6–0.7 to combat overfitting on
       small datasets.
   * - ``batch_size``
     - ``64``
     - Number of instances per training mini-batch. Larger batches are faster
       on GPU; smaller batches provide more gradient-update steps per epoch.
   * - ``max_epoch``
     - ``100``
     - Maximum number of training epochs. Training stops earlier if the
       validation loss stops improving (see ``patience``).
   * - ``patience``
     - ``10``
     - Early-stopping patience: the number of epochs without improvement in
       validation loss before training stops.
   * - ``lr``
     - ``1e-3``
     - Adam learning rate. Reduce to ``1e-4`` if training is unstable.
   * - ``val_split``
     - ``0.3``
     - Fraction of training data held out as a validation set for early
       stopping.

Examples
--------

.. code-block:: python

   # Requires PyTorch
   from mlquantify.neural import QuaNet
   from sklearn.linear_model import LogisticRegression
   from sklearn.datasets import make_classification
   from sklearn.model_selection import train_test_split

   X, y = make_classification(n_samples=2000, n_features=20,
                              weights=[0.7, 0.3], random_state=42)
   X_train, X_test, y_train, y_test = train_test_split(
       X, y, test_size=0.3, random_state=42)

   # QuaNet uses the classifier's predict_proba as embedding
   q = QuaNet(
       estimator=LogisticRegression(),
       device='cpu',
       hidden_size=64,
       max_epoch=50,
       patience=5,
   )
   q.fit(X_train, y_train)
   print(q.predict(X_test))

.. note::

   If you pass ``estimator_fitted=True``, the estimator must already be fitted
   before you call ``QuaNet.fit``. Otherwise, QuaNet fits the estimator
   internally as part of the training pipeline.

When to Use QuaNet
-------------------

- **Large text datasets** where the base classifier produces rich embeddings
  (e.g. transformer-based models with ``transform``).
- **When EMQ / DyS plateau** and you have enough data and computation to train
  end-to-end.
- **Not recommended** for small datasets (< 1,000 instances) or when
  computation is constrained; analytical methods (EMQ, DyS) will be faster and
  likely more accurate.

.. seealso::

   :ref:`likelihood` for EMQ, which is faster and often competitive.

   :ref:`distribution_matching` for DyS / KDEyML.
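
The auxiliary quantification statistics named in the QuaNet architecture (CC, PCC, and ACC) can be illustrated with a small standalone sketch for the binary case. This is not mlquantify's implementation: the function names, the 0.5 decision threshold, and the ``tpr`` / ``fpr`` values are illustrative assumptions, and in practice the rates would be estimated on held-out data.

```python
# Minimal sketch (hypothetical helpers, not part of mlquantify):
# binary CC, PCC and ACC computed from positive-class posteriors.

def cc(posteriors, threshold=0.5):
    """Classify and Count: fraction of instances predicted positive."""
    return sum(p >= threshold for p in posteriors) / len(posteriors)


def pcc(posteriors):
    """Probabilistic Classify and Count: mean positive-class posterior."""
    return sum(posteriors) / len(posteriors)


def acc(posteriors, tpr, fpr, threshold=0.5):
    """Adjusted Classify and Count: correct the CC estimate using the
    classifier's true/false positive rates, clipped to [0, 1]."""
    if tpr == fpr:
        # The adjustment is undefined; fall back to the raw CC estimate.
        return cc(posteriors, threshold)
    adjusted = (cc(posteriors, threshold) - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))


# Toy positive-class posteriors for one bag of eight instances.
posteriors = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.7]

# Assumed rates for illustration only.
stats = [cc(posteriors), pcc(posteriors), acc(posteriors, tpr=0.8, fpr=0.2)]
print(stats)
```

In the architecture described above, a vector of statistics like ``stats`` is what gets concatenated with the LSTM context vector before the feed-forward head.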