Protocol
- class mlquantify.evaluation.protocol.Protocol(models: List[str | Quantifier] | str | Quantifier, learner: BaseEstimator | None = None, n_jobs: int = 1, random_state: int = 32, verbose: bool = False, return_type: str = 'predictions', measures: List[str] | None = None, columns: List[str] = ['ITERATION', 'QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'BATCH_SIZE'])
Base class for evaluation protocols.
- Parameters:
- models : Union[List[Union[str, Quantifier]], str, Quantifier]
List of quantification models, a single model or model name, or ‘all’ for all models.
- learner : BaseEstimator, optional
Machine learning model to be used with the quantifiers. Required when ‘models’ is given as model names or ‘all’.
- n_jobs : int, optional
Number of jobs to run in parallel. Default is 1.
- random_state : int, optional
Seed for random number generation. Default is 32.
- verbose : bool, optional
Whether to print progress messages. Default is False.
- return_type : str, optional
Type of return value (‘predictions’ or ‘table’). Default is ‘predictions’.
- measures : List[str], optional
List of error measures to calculate. Each entry must be in MEASURES. Default is None.
- columns : List[str], optional
Columns to be included in the table. Default is [‘ITERATION’, ‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’, ‘BATCH_SIZE’].
- Attributes:
- models : List[Quantifier]
List of quantification models.
- learner : BaseEstimator
Machine learning model to be used with the quantifiers.
- n_jobs : int
Number of jobs to run in parallel.
- random_state : int
Seed for random number generation.
- verbose : bool
Whether to print progress messages.
- return_type : str
Type of return value (‘predictions’ or ‘table’).
- measures : List[str]
List of error measures to calculate.
- columns : List[str]
Columns to be included in the table.
- Raises:
- AssertionError
If measures contain invalid error measures. If return_type is invalid. If columns does not contain [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
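The checks listed under Raises can be sketched in plain Python. This is a minimal illustration, not the library's code: the function name `validate_protocol_args`, the constant `REQUIRED_COLUMNS`, and the contents of `MEASURES` below are placeholders assumed for the example.

```python
# Hypothetical stand-ins: the real MEASURES registry lives in mlquantify
# and its contents are not reproduced here.
MEASURES = {"ae", "mse"}
REQUIRED_COLUMNS = ("QUANTIFIER", "REAL_PREVS", "PRED_PREVS")


def validate_protocol_args(return_type, measures, columns):
    """Raise AssertionError for the invalid configurations listed under Raises."""
    assert return_type in ("predictions", "table"), (
        f"invalid return_type: {return_type!r}"
    )
    assert measures is None or all(m in MEASURES for m in measures), (
        "measures contains an unknown error measure"
    )
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    assert not missing, f"columns is missing {missing}"


# Extra columns such as 'TIME' are fine as long as the required three are present.
validate_protocol_args("table", None, ["QUANTIFIER", "REAL_PREVS", "PRED_PREVS", "TIME"])
```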
Notes
The ‘models’ parameter can be a list of Quantifiers, a single Quantifier, a list of model names, a single model name, or ‘all’.
If ‘models’ is a list of model names or ‘all’, ‘learner’ must be provided.
The ‘all’ option for ‘models’ will use all quantification models available in the library.
If ‘models’ is a Quantifier or a list of Quantifiers, ‘learner’ is not required, but the models must already be initialized.
You can pass your own model by passing a Quantifier object.
Columns must contain [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
If ‘return_type’ is ‘table’, the table will contain the columns specified in ‘columns’ and the error measures in ‘measures’.
To create your own protocol, the subclass must have the attributes ‘models’, ‘learner’, ‘n_jobs’, ‘random_state’, ‘verbose’, ‘return_type’, ‘measures’, and ‘columns’. The value of ‘columns’ can be changed, as long as it contains [‘QUANTIFIER’, ‘REAL_PREVS’, ‘PRED_PREVS’].
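The rules for the ‘models’ argument described in these notes can be sketched as a small resolution function. This is an assumption-laden illustration rather than the library's implementation: `resolve_models`, the `registry` mapping, and `FakeQuantifier` are hypothetical names introduced only for this example.

```python
def resolve_models(models, learner=None, registry=None):
    """Turn the `models` argument into a list of quantifier objects (illustrative)."""
    registry = registry or {}
    if isinstance(models, str):
        # 'all' expands to every registered name; any other string is one name.
        items = list(registry) if models == "all" else [models]
    elif isinstance(models, (list, tuple)):
        items = list(models)
    else:
        items = [models]  # a single, already initialized quantifier

    resolved = []
    for item in items:
        if isinstance(item, str):
            # Names are instantiated here, so a learner must be available.
            assert learner is not None, "learner is required when models are given by name"
            resolved.append(registry[item](learner))
        else:
            resolved.append(item)  # used as-is; no learner needed
    return resolved


class FakeQuantifier:
    """Hypothetical stand-in for a quantifier class that wraps a learner."""

    def __init__(self, learner):
        self.learner = learner


registry = {"CC": FakeQuantifier, "EMQ": FakeQuantifier}
by_name = resolve_models(["CC", "EMQ"], learner="any-classifier", registry=registry)
by_object = resolve_models(FakeQuantifier("any-classifier"))
```

Resolving names in one place keeps the rest of the protocol working with quantifier objects only, which is why a learner is needed for names but not for instances.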
Examples
>>> import numpy as np
>>> from mlquantify.evaluation.protocol import Protocol
>>> from mlquantify.utils import get_real_prev
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>> import time as t
>>>
>>> class MyProtocol(Protocol):
...     def __init__(self,
...                  models,
...                  learner,
...                  n_jobs,
...                  random_state,
...                  verbose,
...                  return_type,
...                  measures,
...                  sample_size,
...                  iterations=10):
...         super().__init__(models,
...                          learner,
...                          n_jobs,
...                          random_state,
...                          verbose,
...                          return_type,
...                          measures,
...                          columns=['QUANTIFIER', 'REAL_PREVS', 'PRED_PREVS', 'TIME'])
...         self.sample_size = sample_size
...         self.iterations = iterations
...
...     def predict_protocol(self, X_test, y_test):
...         predictions = []
...
...         X_sample, y_sample = self._new_sample(X_test, y_test)
...
...         for _ in range(self.iterations):
...             for model in self.models:
...                 quantifier = model.__class__.__name__
...
...                 real_prev = get_real_prev(y_sample)
...
...                 start_time = t.time()
...                 pred_prev = model.predict(X_sample)
...                 end_time = t.time()
...                 time = end_time - start_time
...
...                 predictions.append([quantifier, real_prev, pred_prev, time])
...
...         return predictions
...
...     def _new_sample(self, X_test, y_test):
...         indexes = np.random.choice(len(X_test), size=self.sample_size, replace=False)
...         return X_test[indexes], y_test[indexes]
>>>
>>>
>>> features, target = load_breast_cancer(return_X_y=True)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.5, random_state=42)
>>>
>>> protocol = MyProtocol(models=["CC", "EMQ", "DyS"],  # or [CC(learner), EMQ(learner), DyS(learner)]
...                       learner=RandomForestClassifier(),
...                       n_jobs=1,
...                       random_state=42,
...                       verbose=True,
...                       return_type="table",
...                       measures=None,
...                       sample_size=100)
>>>
>>> protocol.fit(X_train, y_train)
>>> table = protocol.predict(X_test, y_test)
>>> print(table)
- fit(X_train, y_train)
Fits the models with the training data.
- Parameters:
- X_train : np.ndarray
Features of the training set.
- y_train : np.ndarray
Labels of the training set.
- Returns:
- Protocol
Fitted protocol.
- predict(X_test: ndarray, y_test: ndarray) -> Any
Predicts the prevalence for the test set.
- Parameters:
- X_test : np.ndarray
Features of the test set.
- y_test : np.ndarray
Labels of the test set.
Labels of the test set.
- Returns:
- Any
Predictions for the test set. Can be a table or a tuple with the quantifier names, real prevalence, and predicted prevalence.
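The two return shapes described above can be illustrated with a short sketch. It assumes that the rows produced by a protocol follow the order of ‘columns’; the helper name `format_results` is hypothetical, and plain lists and dicts stand in for whatever table type the library actually returns.

```python
def format_results(rows, columns, return_type="predictions"):
    """Shape protocol rows as a table or as a (names, real, predicted) tuple."""
    if return_type == "table":
        # One record per row, keyed by column name (a stand-in for a real table type).
        return [dict(zip(columns, row)) for row in rows]

    # Otherwise pick out the three required columns, whatever else is present.
    by_column = {name: [row[i] for row in rows] for i, name in enumerate(columns)}
    return by_column["QUANTIFIER"], by_column["REAL_PREVS"], by_column["PRED_PREVS"]


columns = ["QUANTIFIER", "REAL_PREVS", "PRED_PREVS", "BATCH_SIZE"]
rows = [["CC", [0.4, 0.6], [0.5, 0.5], 100]]

names, real_prevs, pred_prevs = format_results(rows, columns)
table = format_results(rows, columns, return_type="table")
```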
- abstract predict_protocol(X_test: ndarray, y_test: ndarray) -> ndarray
Abstract method that every protocol must implement.
- Parameters:
- X_test : np.ndarray
Features of the test set.
- y_test : np.ndarray
Labels of the test set.
- Returns:
- np.ndarray
Predictions for the test set, with one entry per name in the ‘columns’ attribute and in the same order.
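The contract of this abstract method can be sketched with Python's `abc` module. The classes below are stand-ins written for illustration, not mlquantify classes, and the way `predict` delegates to `predict_protocol` and checks row length is an assumption about the design rather than the library's actual code.

```python
from abc import ABC, abstractmethod


class SketchProtocol(ABC):
    """Hypothetical stand-in for a protocol base class."""

    def __init__(self, columns):
        self.columns = columns

    def predict(self, X_test, y_test):
        rows = self.predict_protocol(X_test, y_test)
        for row in rows:
            # Each row must line up with the declared columns.
            assert len(row) == len(self.columns), "row does not match columns"
        return rows

    @abstractmethod
    def predict_protocol(self, X_test, y_test):
        """Return one row per evaluation, ordered like self.columns."""


class WholeSetProtocol(SketchProtocol):
    """Evaluates once on the full test set with a dummy 'quantifier'."""

    def predict_protocol(self, X_test, y_test):
        positives = sum(y_test) / len(y_test)
        real_prev = {0: 1 - positives, 1: positives}
        # The dummy simply echoes the true prevalence as its prediction.
        return [["dummy", real_prev, real_prev]]


protocol = WholeSetProtocol(["QUANTIFIER", "REAL_PREVS", "PRED_PREVS"])
rows = protocol.predict([[1.0], [2.0], [3.0], [4.0]], [0, 1, 1, 1])
```

Because `predict_protocol` is abstract, the base class cannot be instantiated on its own; only subclasses that implement it can.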