Protocol

class mlquantify.evaluation.protocol.Protocol(models: List[str | Quantifier] | str | Quantifier, learner: BaseEstimator | None = None, n_jobs: int = 1, random_state: int = 32, verbose: bool = False, return_type: str = 'predictions', measures: List[str] | None = None, columns: List[str] = ['ITERATION', 'QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'BATCH_SIZE'])

Base class for evaluation protocols.

Parameters:

models (Union[List[Union[str, Quantifier]], str, Quantifier]): List of quantification models or model names, a single model or model name, or ‘all’ for all models.
learner (BaseEstimator, optional): Machine learning model to be used with the quantifiers. Required when ‘models’ is given as model names or as ‘all’. Default is None.
n_jobs (int, optional): Number of jobs to run in parallel. Default is 1.
random_state (int, optional): Seed for random number generation. Default is 32.
verbose (bool, optional): Whether to print progress messages. Default is False.
return_type (str, optional): Type of return value (‘predictions’ or ‘table’). Default is ‘predictions’.
measures (List[str], optional): List of error measures to calculate. Each entry must be in MEASURES. Default is None.
columns (List[str], optional): Columns to be included in the table. Default is [‘ITERATION’, ‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’, ‘BATCH_SIZE’].

Attributes:

models (List[Quantifier]): List of quantification models.
learner (BaseEstimator): Machine learning model to be used with the quantifiers.
n_jobs (int): Number of jobs to run in parallel.
random_state (int): Seed for random number generation.
verbose (bool): Whether to print progress messages.
return_type (str): Type of return value (‘predictions’ or ‘table’).
measures (List[str]): List of error measures to calculate.
columns (List[str]): Columns to be included in the table.

Raises:

AssertionError: If ‘measures’ contains an invalid error measure, if ‘return_type’ is invalid, or if ‘columns’ does not contain [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
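
The three conditions above can be sketched as plain assertions. This is only an illustration of the described behaviour, not the library's code: validate_protocol_args is a hypothetical helper, and KNOWN_MEASURES is a placeholder for MEASURES, whose actual contents are not listed in this section.

```python
# Placeholder for the library's MEASURES collection; these names are made up.
KNOWN_MEASURES = {"measure_a", "measure_b"}
REQUIRED_COLUMNS = ("QUANTIFIER", "REAL_PREVS", "PRED_PREVS")
RETURN_TYPES = ("predictions", "table")


def validate_protocol_args(return_type, measures, columns):
    """Hypothetical helper: raise AssertionError for the three invalid cases."""
    assert return_type in RETURN_TYPES, f"invalid return_type: {return_type!r}"
    assert measures is None or all(m in KNOWN_MEASURES for m in measures), (
        f"invalid measures: {measures!r}"
    )
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    assert not missing, f"columns is missing {missing}"


# A custom column list is accepted as long as the three required names are present.
validate_protocol_args("table", None, ["QUANTIFIER", "REAL_PREVS", "PRED_PREVS", "TIME"])

# Dropping required columns triggers the AssertionError.
try:
    validate_protocol_args("table", None, ["QUANTIFIER", "TIME"])
except AssertionError as err:
    error_message = str(err)
```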

See also

APP: Artificial Prevalence Protocol.
NPP: Natural Prevalence Protocol.
Quantifier: Base class for quantification methods.

Notes

The ‘models’ parameter can be a list of Quantifiers, a single Quantifier, a list of model names, a single model name, or ‘all’.
If ‘models’ is a list of model names or ‘all’, ‘learner’ must be provided.
The ‘all’ option for ‘models’ will use all quantification models available in the library.
If ‘models’ is a Quantifier or a list of Quantifiers, ‘learner’ is not required, but the models must already be initialized.
You can pass your own model by passing a Quantifier object.
Columns must contain [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
If ‘return_type’ is ‘table’, the table will contain the columns specified in ‘columns’ and the error measures in ‘measures’.
To create your own protocol, the subclass must have the attributes ‘models’, ‘learner’, ‘n_jobs’, ‘random_state’, ‘verbose’, ‘return_type’, ‘measures’, and ‘columns’. The contents of ‘columns’ can be changed, as long as it still contains [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
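
The accepted forms of ‘models’ can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: StubCC, StubEMQ, REGISTRY, and resolve_models are hypothetical stand-ins, and the library may resolve names differently (for example, by cloning the learner for each quantifier).

```python
class StubQuantifier:
    """Stand-in for a Quantifier; it only stores the learner it was given."""

    def __init__(self, learner):
        self.learner = learner


class StubCC(StubQuantifier):
    pass


class StubEMQ(StubQuantifier):
    pass


# Hypothetical name-to-class registry; the real library has its own model list.
REGISTRY = {"CC": StubCC, "EMQ": StubEMQ}


def resolve_models(models, learner=None):
    """Normalize the accepted forms of 'models' into a list of instances."""
    if isinstance(models, str) and models == "all":
        models = list(REGISTRY)           # every registered model name
    elif not isinstance(models, (list, tuple)):
        models = [models]                 # a single name or a single instance

    resolved = []
    for model in models:
        if isinstance(model, str):
            # Names must be instantiated, so a learner is mandatory here.
            assert learner is not None, "learner is required for model names"
            resolved.append(REGISTRY[model](learner))
        else:
            # Already-initialized quantifiers are used as they are.
            resolved.append(model)
    return resolved
```

With this sketch, resolve_models("all", learner=clf) and resolve_models(["CC", "EMQ"], learner=clf) build the same list, while resolve_models(StubCC(clf)) needs no learner argument.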

Examples

>>> import numpy as np
>>> from mlquantify.evaluation.protocol import Protocol
>>> from mlquantify.utils import get_real_prev
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>> import time as t
>>>
>>> class MyProtocol(Protocol):
...     def __init__(self,
...                  models,
...                  learner,
...                  n_jobs,
...                  random_state,
...                  verbose,
...                  return_type,
...                  measures,
...                  sample_size,
...                  iterations=10):
...         super().__init__(models,
...                          learner,
...                          n_jobs,
...                          random_state,
...                          verbose,
...                          return_type,
...                          measures,
...                          columns=['QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'TIME'])
...         self.sample_size = sample_size
...         self.iterations = iterations
...
...     def predict_protocol(self, X_test, y_test):
...         predictions = []
...
...         X_sample, y_sample = self._new_sample(X_test, y_test)
...
...         for _ in range(self.iterations):
...             for model in self.models:
...                 quantifier = model.__class__.__name__
...
...                 real_prev = get_real_prev(y_sample)
...
...                 start_time = t.time()
...                 pred_prev = model.predict(X_sample)
...                 end_time = t.time()
...                 time = end_time - start_time
...
...                 predictions.append([quantifier, real_prev, pred_prev, time])
...
...         return predictions
...
...     def _new_sample(self, X_test, y_test):
...         indexes = np.random.choice(len(X_test), size=self.sample_size, replace=False)
...         return X_test[indexes], y_test[indexes]
>>>
>>>
>>> features, target = load_breast_cancer(return_X_y=True)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.5, random_state=42)
>>>
>>> protocol = MyProtocol(models=["CC", "EMQ", "DyS"],  # or [CC(learner), EMQ(learner), DyS(learner)]
...                       learner=RandomForestClassifier(),
...                       n_jobs=1,
...                       random_state=42,
...                       verbose=True,
...                       return_type="table",
...                       measures=None,
...                       sample_size=100)
>>>
>>> protocol.fit(X_train, y_train)
>>> table = protocol.predict(X_test, y_test)
>>> print(table)

fit(X_train, y_train)

Fits the models with the training data.

Parameters:

X_train (np.ndarray): Features of the training set.
y_train (np.ndarray): Labels of the training set.

Returns:

Protocol: Fitted protocol.

predict(X_test: ndarray, y_test: ndarray) → Any

Predicts the prevalence for the test set.

Parameters:

X_test (np.ndarray): Features of the test set.
y_test (np.ndarray): Labels of the test set.

Returns:

Any: Results for the test set, in the form selected by ‘return_type’: either a table or a tuple with the quantifier names, real prevalences, and predicted prevalences.
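
The two result shapes can be pictured with plain Python containers. This is a sketch under assumptions: to_table and to_predictions are hypothetical helpers, the rows are made-up values, and the container types the library actually returns are not specified in this section.

```python
def to_table(rows, columns):
    """Pair each row with the column names, giving one dict per row."""
    return [dict(zip(columns, row)) for row in rows]


def to_predictions(rows, columns):
    """Extract (quantifier names, real prevalences, predicted prevalences)."""
    table = to_table(rows, columns)
    return (
        [record["QUANTIFIER"] for record in table],
        [record["REAL_PREVS"] for record in table],
        [record["PRED_PREVS"] for record in table],
    )


columns = ["QUANTIFIER", "REAL_PREVS", "PRED_PREVS", "TIME"]
# Illustrative rows only; these numbers are not real results.
rows = [
    ["CC", [0.4, 0.6], [0.5, 0.5], 0.01],
    ["EMQ", [0.4, 0.6], [0.45, 0.55], 0.02],
]

table = to_table(rows, columns)
names, real_prevs, pred_prevs = to_predictions(rows, columns)
```

Only the three required columns are needed to build the tuple, which is consistent with the rule that ‘columns’ must always contain them.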

abstract predict_protocol(X_test: ndarray, y_test: ndarray) → ndarray

Abstract method that every protocol must implement.

Parameters:

X_test (np.ndarray): Features of the test set.
y_test (np.ndarray): Labels of the test set.

Returns:

np.ndarray: Predictions for the test set, in the same format as the ‘columns’ attribute (one value per column name, in the same order).
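
The relationship between predict and the abstract predict_protocol can be sketched with Python's abc module. This is a simplified stand-in, not the library's implementation: SketchProtocol, WholeSetProtocol, TrainPrevalenceQuantifier, and prevalence are hypothetical names, and the real predict also handles ‘return_type’ and ‘measures’.

```python
from abc import ABC, abstractmethod
from collections import Counter


def prevalence(labels):
    """Class proportions of a label sequence, as a dict sorted by class."""
    counts = Counter(labels)
    return {cls: counts[cls] / len(labels) for cls in sorted(counts)}


class TrainPrevalenceQuantifier:
    """Toy quantifier: always predicts the prevalence seen during fit."""

    def fit(self, X, y):
        self.prev_ = prevalence(y)
        return self

    def predict(self, X):
        return self.prev_


class SketchProtocol(ABC):
    """Stand-in base class: fit and predict are shared, sampling is not."""

    def __init__(self, models):
        self.models = models

    def fit(self, X_train, y_train):
        for model in self.models:
            model.fit(X_train, y_train)
        return self  # returning self allows protocol.fit(...).predict(...)

    def predict(self, X_test, y_test):
        # Delegates to the subclass hook; no post-processing in this sketch.
        return self.predict_protocol(X_test, y_test)

    @abstractmethod
    def predict_protocol(self, X_test, y_test):
        """Return one row per evaluation, ordered like the column names."""


class WholeSetProtocol(SketchProtocol):
    """Evaluates every model once on the full test set."""

    def predict_protocol(self, X_test, y_test):
        real_prev = prevalence(y_test)
        return [
            [type(model).__name__, real_prev, model.predict(X_test)]
            for model in self.models
        ]


protocol = WholeSetProtocol([TrainPrevalenceQuantifier()])
rows = protocol.fit([[0], [1], [2], [3]], [0, 0, 0, 1]).predict([[4], [5]], [0, 1])
```

Because predict_protocol is abstract, instantiating SketchProtocol directly raises TypeError; only subclasses that define the method can be created.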

sout(msg): Prints a message if verbose is True.
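
A verbose-gated print helper of this kind might look like the following. This is a minimal sketch assuming the message is printed unchanged; VerboseSketch and captured are hypothetical names, and the library may add a prefix or other formatting.

```python
import io
from contextlib import redirect_stdout


class VerboseSketch:
    """Stand-in object carrying only the 'verbose' flag."""

    def __init__(self, verbose=False):
        self.verbose = verbose

    def sout(self, msg):
        # Print only when verbose output was requested.
        if self.verbose:
            print(msg)


def captured(verbose):
    """Return whatever sout writes to stdout for the given verbose flag."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        VerboseSketch(verbose).sout("fitting models")
    return buffer.getvalue()
```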
