APP
- class mlquantify.evaluation.protocol.APP(models: List[str | Quantifier] | str | Quantifier, batch_size: List[int] | int, learner: BaseEstimator | None = None, n_prevs: int = 100, n_iterations: int = 1, n_jobs: int = 1, random_state: int = 32, verbose: bool = False, return_type: str = 'predictions', measures: List[str] | None = None)
Artificial Prevalence Protocol.
This protocol splits a test set into several samples of varying prevalence and sample size, repeating the process for n_iterations. For each quantifier in the list, it fits the model on the training data and predicts on every test sample, returning either a table of results with error measures or just the predictions.
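The sampling idea described above can be sketched in plain Python. This is a minimal illustration for a binary problem, not mlquantify's implementation: the helper names `draw_sample` and `artificial_prevalence_samples`, the evenly spaced prevalence grid, and sampling with replacement are all assumptions made for the example.

```python
import random

def draw_sample(labels, prevalence, batch_size, rng):
    """Return indices of a sample whose positive-class share is close to `prevalence`."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(prevalence * batch_size)
    n_neg = batch_size - n_pos
    # Assumption: sample with replacement so extreme prevalences stay reachable.
    return rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_neg)

def artificial_prevalence_samples(labels, batch_sizes, n_prevs, n_iterations, seed=32):
    """Yield (iteration, batch_size, prevalence, indices) for every combination."""
    rng = random.Random(seed)
    # Assumption: evenly spaced positive-class prevalences on [0, 1]; needs n_prevs >= 2.
    grid = [k / (n_prevs - 1) for k in range(n_prevs)]
    for iteration in range(n_iterations):
        for batch_size in batch_sizes:
            for prevalence in grid:
                indices = draw_sample(labels, prevalence, batch_size, rng)
                yield iteration, batch_size, prevalence, indices

# Toy test labels: 60 negatives followed by 40 positives.
labels = [0] * 60 + [1] * 40
samples = list(artificial_prevalence_samples(labels, [10, 50], n_prevs=5, n_iterations=2))
```

Each yielded sample would then be passed to every fitted quantifier, so the number of predictions grows with the product of iterations, batch sizes, prevalence points, and models.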
- Parameters:
- models : Union[List[Union[str, Quantifier]], str, Quantifier]
List of quantification models, a single model name, or ‘all’ for all models.
- batch_size : Union[List[int], int]
Size of the batches to be processed, or a list of sizes.
- learner : BaseEstimator, optional
Machine learning model to be used with the quantifiers. Required for model methods.
- n_prevs : int, optional
Number of prevalence points to generate. Default is 100.
- n_iterations : int, optional
Number of iterations for the protocol. Default is 1.
- n_jobs : int, optional
Number of jobs to run in parallel. Default is 1.
- random_state : int, optional
Seed for random number generation. Default is 32.
- verbose : bool, optional
Whether to print progress messages. Default is False.
- return_type : str, optional
Type of return value (‘predictions’ or ‘table’). Default is ‘predictions’.
- measures : List[str], optional
List of error measures to calculate. Must be in MEASURES or None. Default is None.
- Attributes:
- models : List[Quantifier]
List of quantification models.
- batch_size : Union[List[int], int]
Size of the batches to be processed.
- learner : BaseEstimator
Machine learning model to be used with the quantifiers.
- n_prevs : int
Number of prevalence points to generate.
- n_iterations : int
Number of iterations for the protocol.
- n_jobs : int
Number of jobs to run in parallel.
- random_state : int
Seed for random number generation.
- verbose : bool
Whether to print progress messages.
- return_type : str
Type of return value (‘predictions’ or ‘table’).
- measures : List[str]
List of error measures to calculate.
- Raises:
- AssertionError
If return_type is invalid.
Examples
>>> from mlquantify.evaluation.protocol import APP
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>>
>>> # Loading dataset from sklearn
>>> features, target = load_breast_cancer(return_X_y=True)
>>>
>>> # Splitting into train and test
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)
>>>
>>> app = APP(models=["CC", "EMQ", "DyS"],
...           batch_size=[10, 50, 100],
...           learner=RandomForestClassifier(),
...           n_prevs=100, # Default
...           n_jobs=-1,
...           return_type="table",
...           measures=["ae", "se"],
...           verbose=True)
>>>
>>> app.fit(X_train, y_train)
>>>
>>> table = app.predict(X_test, y_test)
>>>
>>> print(table)
- fit(X_train, y_train)
Fits the models with the training data.
- Parameters:
- X_train : np.ndarray
Features of the training set.
- y_train : np.ndarray
Labels of the training set.
- Returns:
- Protocol
Fitted protocol.
- predict(X_test: ndarray, y_test: ndarray) -> Any
Predicts the prevalence for the test set.
- Parameters:
- X_test : np.ndarray
Features of the test set.
- y_test : np.ndarray
Labels of the test set.
- Returns:
- Any
Predictions for the test set. Depending on return_type, this is either a table of results or a tuple with the quantifier names, real prevalence, and predicted prevalence.
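When the raw predictions are returned, error measures can be computed by hand from the real and predicted prevalences. The sketch below uses definitions of absolute error and squared error that are common in the quantification literature (the mean over classes); whether the library's ‘ae’ and ‘se’ measures match these formulas exactly is an assumption, and the function names are illustrative.

```python
def absolute_error(prev_real, prev_pred):
    """Mean absolute difference between two prevalence vectors."""
    return sum(abs(r - p) for r, p in zip(prev_real, prev_pred)) / len(prev_real)

def squared_error(prev_real, prev_pred):
    """Mean squared difference between two prevalence vectors."""
    return sum((r - p) ** 2 for r, p in zip(prev_real, prev_pred)) / len(prev_real)

# Illustrative values: true prevalence (0.3, 0.7), predicted (0.4, 0.6).
ae = absolute_error([0.3, 0.7], [0.4, 0.6])
se = squared_error([0.3, 0.7], [0.4, 0.6])
```

Both functions expect the two vectors to list the classes in the same order.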
- predict_protocol(X_test: ndarray, y_test: ndarray) -> Tuple
Generates several samples from the test set with artificial prevalences and sizes. Each model predicts on every sample; the results are aggregated into a pandas DataFrame if a table is requested, otherwise only the predictions are returned.
- Parameters:
- X_test : np.ndarray
Features of the test set.
- y_test : np.ndarray
Labels of the test set.
- Returns:
- Tuple
Tuple containing the iteration, model name, prev, prev_pred, and batch size.
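The aggregation step can be pictured as grouping per-sample results by model and batch size. The sketch below assumes one record per sample in the field order listed above; the record values are made up for illustration, and the `summarize` helper is hypothetical rather than part of the library.

```python
from collections import defaultdict
from statistics import mean

# Assumed layout: (iteration, model name, prev, prev_pred, batch size).
# All values are invented for illustration.
records = [
    (0, "CC", (0.2, 0.8), (0.3, 0.7), 10),
    (0, "CC", (0.5, 0.5), (0.5, 0.5), 10),
    (0, "EMQ", (0.2, 0.8), (0.25, 0.75), 10),
    (0, "EMQ", (0.5, 0.5), (0.4, 0.6), 50),
]

def summarize(records):
    """Average the per-sample absolute error for each (model, batch size) pair."""
    grouped = defaultdict(list)
    for _iteration, model, prev, prev_pred, batch_size in records:
        ae = mean(abs(r - p) for r, p in zip(prev, prev_pred))
        grouped[(model, batch_size)].append(ae)
    return {key: mean(values) for key, values in sorted(grouped.items())}

summary = summarize(records)
```

The same records could be loaded into a pandas DataFrame and grouped there; plain dictionaries are used here only to keep the sketch dependency-free.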