Protocol

class mlquantify.evaluation.protocol.Protocol(models: List[str | Quantifier] | str | Quantifier, learner: BaseEstimator | None = None, n_jobs: int = 1, random_state: int = 32, verbose: bool = False, return_type: str = 'predictions', measures: List[str] | None = None, columns: List[str] = ['ITERATION', 'QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'BATCH_SIZE'])

Base class for evaluation protocols.

Parameters:
models : Union[List[Union[str, Quantifier]], str, Quantifier]

List of quantification models, a single model or model name, or ‘all’ for all models.

learner : BaseEstimator, optional

Machine learning model to be used with the quantifiers. Required when models are given by name or as ‘all’.

n_jobs : int, optional

Number of jobs to run in parallel. Default is 1.

random_state : int, optional

Seed for random number generation. Default is 32.

verbose : bool, optional

Whether to print progress messages. Default is False.

return_type : str, optional

Type of return value (‘predictions’ or ‘table’). Default is ‘predictions’.

measures : List[str], optional

List of error measures to calculate. Each entry must be in MEASURES. Default is None.

columns : List[str], optional

Columns to be included in the table. Default is [‘ITERATION’, ‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’, ‘BATCH_SIZE’].

Attributes:
models : List[Quantifier]

List of quantification models.

learner : BaseEstimator

Machine learning model to be used with the quantifiers.

n_jobs : int

Number of jobs to run in parallel.

random_state : int

Seed for random number generation.

verbose : bool

Whether to print progress messages.

return_type : str

Type of return value (‘predictions’ or ‘table’).

measures : List[str]

List of error measures to calculate.

columns : List[str]

Columns to be included in the table.

Raises:
AssertionError

If measures contains an invalid error measure, if return_type is invalid, or if columns does not contain [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
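The three checks above can be sketched as plain assertions. This is only an illustration, not the library's implementation: validate_protocol_args is a hypothetical helper, and the contents of MEASURES below are placeholder names standing in for the library's real registry.

```python
# Hypothetical stand-in for the library's MEASURES registry; the real contents may differ.
MEASURES = {"ae", "se", "kld"}
REQUIRED_COLUMNS = ["QUANTIFIER", "REAL_PREVS", "PRED_PREVS"]
RETURN_TYPES = ("predictions", "table")


def validate_protocol_args(return_type, measures, columns):
    """Raise AssertionError for the three invalid-argument cases listed above."""
    assert return_type in RETURN_TYPES, f"invalid return_type: {return_type!r}"
    assert measures is None or all(m in MEASURES for m in measures), (
        f"invalid measures: {measures!r}"
    )
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    assert not missing, f"columns is missing {missing}"


# Extra columns such as 'TIME' are allowed as long as the required ones are present.
validate_protocol_args("table", ["ae"], REQUIRED_COLUMNS + ["TIME"])
```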

See also

APP

Artificial Prevalence Protocol.

NPP

Natural Prevalence Protocol.

Quantifier

Base class for quantification methods.

Notes

  • The ‘models’ parameter can be a list of Quantifiers, a single Quantifier, a list of model names, a single model name, or ‘all’.

  • If ‘models’ is a list of model names or ‘all’, ‘learner’ must be provided.

  • The ‘all’ option for ‘models’ will use all quantification models available in the library.

  • If ‘models’ is a Quantifier or a list of Quantifiers, ‘learner’ is not required, but the models must already be initialized.

  • You can pass your own model by passing a Quantifier object.

  • Columns must contain [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].

  • If ‘return_type’ is ‘table’, the table will contain the columns specified in ‘columns’ and the error measures in ‘measures’.

  • To create your own protocol, the class must have the attributes ‘models’, ‘learner’, ‘n_jobs’, ‘random_state’, ‘verbose’, ‘return_type’, ‘measures’, and ‘columns’. The value of ‘columns’ can be changed, as long as it contains [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
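The rules for the ‘models’ argument described in these notes can be sketched as a small normalisation step. Everything here is an assumption made for illustration: ToyQuantifier, the two toy subclasses, REGISTRY, and resolve_models are hypothetical names, and the library may resolve names differently (for example by cloning the learner for each model).

```python
class ToyQuantifier:
    """Hypothetical stand-in for a Quantifier; it only stores its learner."""

    def __init__(self, learner=None):
        self.learner = learner


class CC(ToyQuantifier):
    pass


class EMQ(ToyQuantifier):
    pass


# Hypothetical name -> class registry standing in for the library's model lookup.
REGISTRY = {"CC": CC, "EMQ": EMQ}


def resolve_models(models, learner=None):
    """Turn any accepted form of `models` into a list of quantifier objects."""
    if isinstance(models, (str, ToyQuantifier)):
        models = [models]              # a single name or a single object
    if models == ["all"]:
        models = list(REGISTRY)        # 'all' expands to every registered name
    resolved = []
    for model in models:
        if isinstance(model, str):
            # Names can only be instantiated when a learner is supplied.
            assert learner is not None, "learner is required when models are given by name"
            resolved.append(REGISTRY[model](learner))
        else:
            resolved.append(model)     # already-built quantifiers pass through unchanged
    return resolved
```

With this sketch, resolve_models("all", learner=some_learner) builds one object per registered name, while resolve_models(CC()) returns the given object untouched.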

Examples

>>> import numpy as np
>>> from mlquantify.evaluation.protocol import Protocol
>>> from mlquantify.utils import get_real_prev
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>> import time as t
>>>
>>> class MyProtocol(Protocol):
...     def __init__(self,
...                  models,
...                  learner,
...                  n_jobs,
...                  random_state,
...                  verbose,
...                  return_type,
...                  measures,
...                  sample_size,
...                  iterations=10):
...         super().__init__(models,
...                          learner,
...                          n_jobs,
...                          random_state,
...                          verbose,
...                          return_type,
...                          measures,
...                          columns=['QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'TIME'])
...         self.sample_size = sample_size
...         self.iterations = iterations
...
...     def predict_protocol(self, X_test, y_test):
...         predictions = []
...
...         # Draw one random sample; it is reused in every iteration below.
...         X_sample, y_sample = self._new_sample(X_test, y_test)
...
...         for _ in range(self.iterations):
...             for model in self.models:
...                 quantifier = model.__class__.__name__
...
...                 real_prev = get_real_prev(y_sample)
...
...                 # Time only the prediction step.
...                 start_time = t.time()
...                 pred_prev = model.predict(X_sample)
...                 end_time = t.time()
...                 time = end_time - start_time
...
...                 # Row order matches the columns passed to super().__init__.
...                 predictions.append([quantifier, real_prev, pred_prev, time])
...
...         return predictions
...
...     def _new_sample(self, X_test, y_test):
...         indexes = np.random.choice(len(X_test), size=self.sample_size, replace=False)
...         return X_test[indexes], y_test[indexes]
>>>
>>>
>>> features, target = load_breast_cancer(return_X_y=True)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.5, random_state=42)
>>>
>>> protocol = MyProtocol(models=["CC", "EMQ", "DyS"],  # or [CC(learner), EMQ(learner), DyS(learner)]
...                       learner=RandomForestClassifier(),
...                       n_jobs=1,
...                       random_state=42,
...                       verbose=True,
...                       return_type="table",
...                       measures=None,
...                       sample_size=100)
>>>
>>> protocol.fit(X_train, y_train)
>>> table = protocol.predict(X_test, y_test)
>>> print(table)

fit(X_train, y_train)

Fits the models with the training data.

Parameters:
X_train : np.ndarray

Features of the training set.

y_train : np.ndarray

Labels of the training set.

Returns:
Protocol

Fitted protocol.
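Because fit returns the fitted protocol, construction, fitting, and later calls can be chained. The sketch below only illustrates that return-self pattern: ToyModel and ChainableProtocol are hypothetical stand-ins, the fitting is sequential, and the real method may train models in parallel according to n_jobs.

```python
class ToyModel:
    """Hypothetical model that only records whether fit was called."""

    def __init__(self):
        self.fitted = False

    def fit(self, X, y):
        self.fitted = True
        return self


class ChainableProtocol:
    """Hypothetical protocol showing only the return-self behaviour of fit."""

    def __init__(self, models):
        self.models = models

    def fit(self, X_train, y_train):
        # Fit every model on the same training data, one after another.
        self.models = [model.fit(X_train, y_train) for model in self.models]
        return self  # returning self is what makes chaining possible


# Construction and fitting in a single expression.
protocol = ChainableProtocol([ToyModel(), ToyModel()]).fit([[0.0], [1.0]], [0, 1])
```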

predict(X_test: ndarray, y_test: ndarray) → Any

Predicts the prevalence for the test set.

Parameters:
X_test : np.ndarray

Features of the test set.

y_test : np.ndarray

Labels of the test set.

Returns:
Any

Predictions for the test set. Can be a table or a tuple with the quantifier names, real prevalence, and predicted prevalence.
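The two return shapes can be sketched from a list of rows such as those produced by predict_protocol. This is an assumption-laden illustration: format_output and MEASURE_FUNCS are hypothetical names, the "ae" measure is a placeholder, and a list of dictionaries stands in for whatever table type the library actually returns.

```python
def mean_abs_error(real, pred):
    """Mean absolute difference between two prevalence vectors."""
    return sum(abs(r - p) for r, p in zip(real, pred)) / len(real)


# Hypothetical measure registry; the library's own MEASURES may differ.
MEASURE_FUNCS = {"ae": mean_abs_error}


def format_output(rows, columns, return_type="predictions", measures=None):
    # Positions of the three columns that must always be present.
    q, r, p = (columns.index(c) for c in ("QUANTIFIER", "REAL_PREVS", "PRED_PREVS"))
    if return_type == "table":
        table = []
        for row in rows:
            record = dict(zip(columns, row))
            # Append one extra field per requested error measure.
            for name in measures or []:
                record[name] = MEASURE_FUNCS[name](row[r], row[p])
            table.append(record)
        return table
    # 'predictions': quantifier names, real prevalences, predicted prevalences.
    return (
        tuple(row[q] for row in rows),
        tuple(row[r] for row in rows),
        tuple(row[p] for row in rows),
    )


columns = ["QUANTIFIER", "REAL_PREVS", "PRED_PREVS", "TIME"]
rows = [["CC", [0.5, 0.5], [0.25, 0.75], 0.01]]
names, real_prevs, pred_prevs = format_output(rows, columns)
table = format_output(rows, columns, return_type="table", measures=["ae"])
```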

abstract predict_protocol(X_test: ndarray, y_test: ndarray) → ndarray

Abstract method that every protocol must implement.

Parameters:
X_test : np.ndarray

Features of the test set.

y_test : np.ndarray

Labels of the test set.

Returns:
np.ndarray

Predictions for the test set, with each row in the same format as the columns attribute.
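The relationship between predict and the abstract predict_protocol can be sketched with Python's abc module. The classes below are hypothetical stand-ins rather than the library's code: SketchProtocol only checks that each returned row matches the columns, and WholeSetProtocol is a toy subclass that assumes binary 0/1 labels and echoes the true prevalence as its "prediction".

```python
from abc import ABC, abstractmethod


class SketchProtocol(ABC):
    """Hypothetical base class: predict delegates to predict_protocol."""

    def __init__(self, columns):
        self.columns = columns

    def predict(self, X_test, y_test):
        rows = self.predict_protocol(X_test, y_test)
        # Every row must have one value per declared column.
        assert all(len(row) == len(self.columns) for row in rows)
        return rows

    @abstractmethod
    def predict_protocol(self, X_test, y_test):
        """Return one row per evaluation, ordered like self.columns."""


class WholeSetProtocol(SketchProtocol):
    """Toy subclass: a single evaluation on the whole test set."""

    def predict_protocol(self, X_test, y_test):
        positives = sum(y_test) / len(y_test)   # assumes binary 0/1 labels
        real_prev = [1 - positives, positives]
        # A dummy 'prediction' that simply echoes the true prevalence.
        return [["dummy", real_prev, real_prev]]


columns = ["QUANTIFIER", "REAL_PREVS", "PRED_PREVS"]
rows = WholeSetProtocol(columns).predict([[0], [1], [2], [3]], [0, 1, 1, 1])
```

Instantiating SketchProtocol directly raises TypeError, which is how the abstract method forces every concrete protocol to supply its own predict_protocol.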

sout(msg)

Prints a message if verbose is True.