6.4. Building a Protocol#

The construction of all protocols is similar, and you can create your own protocol by inheriting from the Protocol class. The only requirement is to implement:

  • _new_sample method: This method is responsible for generating the new sample to be used in the protocol. It should receive the X_test and y_test parameters, and return the new sample to be used in the protocol. The new sample should be a tuple of (X, y), where X is the new sample and y is the corresponding labels.

  • predict_protocol method: This method is responsible for predicting the quantifier. It should receive the X_test and y_test parameters, and return the predictions of the quantifier. Containing at least the names of the quantifiers, the real prevalences and the predicted prevalences for each samples. You can use the _new_sample function to generate the samples

  • _delayed_fit method: This method is responsible for fitting the quantifier. It should receive the model (quantifier), X_train and y_train parameters, and return the fitted quantifier.

An example of a custom protocol is:

import numpy as np
from mlquantify.evaluation.protocol import Protocol
from mlquantify.utils import get_real_prev
from sklearn.ensemble import RandomForestClassifier
import time as t

class MyProtocol(Protocol):
    def __init__(self,
                models,
                n_jobs,
                random_state,
                verbose,
                return_type,
                measures,
                sample_size,
                iterations=10,
                learner=RandomForestClassifier()):

        super().__init__(models, learner, n_jobs, random_state, verbose, return_type, measures,
                        columns=['QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'TIME'])

        # Specific Parameters
        self.sample_size = sample_size
        self.iterations = iterations

    def predict_protocol(self, X_test, y_test):
        predictions = []
        X_sample, y_sample = self._new_sample(X_test, y_test)

        for _ in range(self.iterations):
            for model in self.models:
                quantifier = model.__class__.__name__
                real_prev = get_real_prev(y_sample)

                start_time = t.time()
                pred_prev = model.predict(X_sample)
                end_time = t.time()
                time_elapsed = end_time - start_time

                predictions.append([quantifier, real_prev, pred_prev, time_elapsed])

        return predictions

    def _new_sample(self, X_test, y_test):
        indexes = np.random.choice(len(X_test), size=self.sample_size, replace=False)
        return X_test[indexes], y_test[indexes]

An important note is that this examples uses the get_real_prev function from the mlquantify.utils module to get the real prevalences of the sample. This function is used to calculate the real prevalences of the sample, and it is important to use it to ensure that the real prevalences are calculated correctly.

Other point is that you need to pass the parameters of Protocol class to the constructor of the Protocol class, and the parameters of your custom protocol to the constructor of your custom protocol. The parameters of the Protocol class can be found in the Protocol section.

Warning

The custom protocol must pass the columns parameter with at least the QUANTIFIER, REAL_PREVS and PRED_PREVS columns, otherwise the protocol will not work. The rest of the columns are optional.