6.4. Building a Protocol#
The construction of all protocols is similar, and you can create your own protocol by inheriting from the Protocol
class. The only requirement is to implement:
_new_sample
method: This method is responsible for generating the new sample to be used in the protocol. It should receive the X_test and y_test parameters, and return the new sample to be used in the protocol. The new sample should be a tuple of (X, y), where X is the new sample and y is the corresponding labels.predict_protocol
method: This method is responsible for predicting the quantifier. It should receive the X_test and y_test parameters, and return the predictions of the quantifier. Containing at least the names of the quantifiers, the real prevalences and the predicted prevalences for each samples. You can use the_new_sample
function to generate the samples_delayed_fit
method: This method is responsible for fitting the quantifier. It should receive the model (quantifier), X_train and y_train parameters, and return the fitted quantifier.
An example of a custom protocol is:
import numpy as np
from mlquantify.evaluation.protocol import Protocol
from mlquantify.utils import get_real_prev
from sklearn.ensemble import RandomForestClassifier
import time as t
class MyProtocol(Protocol):
def __init__(self,
models,
n_jobs,
random_state,
verbose,
return_type,
measures,
sample_size,
iterations=10,
learner=RandomForestClassifier()):
super().__init__(models, learner, n_jobs, random_state, verbose, return_type, measures,
columns=['QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'TIME'])
# Specific Parameters
self.sample_size = sample_size
self.iterations = iterations
def predict_protocol(self, X_test, y_test):
predictions = []
X_sample, y_sample = self._new_sample(X_test, y_test)
for _ in range(self.iterations):
for model in self.models:
quantifier = model.__class__.__name__
real_prev = get_real_prev(y_sample)
start_time = t.time()
pred_prev = model.predict(X_sample)
end_time = t.time()
time_elapsed = end_time - start_time
predictions.append([quantifier, real_prev, pred_prev, time_elapsed])
return predictions
def _new_sample(self, X_test, y_test):
indexes = np.random.choice(len(X_test), size=self.sample_size, replace=False)
return X_test[indexes], y_test[indexes]
An important note is that this examples uses the get_real_prev
function from the mlquantify.utils
module to get the real prevalences of the sample. This function is used to calculate the real prevalences of the sample, and it is important to use it to ensure that the real prevalences are calculated correctly.
Other point is that you need to pass the parameters of Protocol
class to the constructor of the Protocol class, and the parameters of your custom protocol to the constructor of your custom protocol. The parameters of the Protocol class can be found in the Protocol section.
Warning
The custom protocol must pass the columns parameter with at least the QUANTIFIER, REAL_PREVS and PRED_PREVS columns, otherwise the protocol will not work. The rest of the columns are optional.