APP

class mlquantify.evaluation.protocol.APP(models: List[str | Quantifier] | str | Quantifier, batch_size: List[int] | int, learner: BaseEstimator | None = None, n_prevs: int = 100, n_iterations: int = 1, n_jobs: int = 1, random_state: int = 32, verbose: bool = False, return_type: str = 'predictions', measures: List[str] | None = None)

Artificial Prevalence Protocol.

This protocol splits a test set into several samples of varying prevalence and sample size, repeated for n_iterations iterations. For each quantifier in the list, it fits the model on the training data, evaluates it on these samples, and returns either a table of results with error measures or just the predictions.
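
As a rough illustration of the idea described above, the following standalone sketch builds a grid of prevalences and draws index samples at each prevalence and batch size from a binary label list. It does not use mlquantify: the helper names prevalence_grid and draw_sample, the evenly spaced grid, and sampling with replacement are all assumptions made for illustration, and the library's own sampling may differ.

```python
import random


def prevalence_grid(n_prevs):
    """Evenly spaced positive-class prevalences in [0, 1] (assumed grid)."""
    if n_prevs < 2:
        raise ValueError("n_prevs must be at least 2")
    return [i / (n_prevs - 1) for i in range(n_prevs)]


def draw_sample(labels, prevalence, batch_size, rng):
    """Return indices of a sample whose positive share is close to `prevalence`."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(prevalence * batch_size)
    n_neg = batch_size - n_pos
    # Sample with replacement so a small class can still fill a batch.
    idx = rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_neg)
    rng.shuffle(idx)
    return idx


rng = random.Random(32)
labels = [0] * 60 + [1] * 40  # toy binary test labels

# One sample per (batch size, prevalence) pair.
samples = {}
for batch_size in (10, 50):
    for prev in prevalence_grid(5):
        samples[(batch_size, prev)] = draw_sample(labels, prev, batch_size, rng)
```

Each entry of samples holds the test-set indices for one artificial sample; a quantifier would then be asked to estimate the prevalence of each one.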

Parameters:

models (Union[List[Union[str, Quantifier]], str, Quantifier]): List of quantification models, a single model name, or ‘all’ for all models.
batch_size (Union[List[int], int]): Size of the batches to be processed, or a list of sizes.
learner (BaseEstimator, optional): Machine learning model to be used with the quantifiers. Required for model methods.
n_prevs (int, optional): Number of prevalence points to generate. Default is 100.
n_iterations (int, optional): Number of iterations for the protocol. Default is 1.
n_jobs (int, optional): Number of jobs to run in parallel. Default is 1.
random_state (int, optional): Seed for random number generation. Default is 32.
verbose (bool, optional): Whether to print progress messages. Default is False.
return_type (str, optional): Type of return value (‘predictions’ or ‘table’). Default is ‘predictions’.
measures (List[str], optional): List of error measures to calculate. Must be in MEASURES or None. Default is None.
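
To make the measures parameter more concrete, here is a minimal sketch of two common quantification error measures. It assumes that the names "ae" and "se" used in the example further down stand for absolute error and squared error, averaged over classes; the function names are hypothetical and the library's own definitions may differ.

```python
def absolute_error(prev_real, prev_pred):
    """Mean absolute difference between real and predicted class prevalences."""
    return sum(abs(r - p) for r, p in zip(prev_real, prev_pred)) / len(prev_real)


def squared_error(prev_real, prev_pred):
    """Mean squared difference between real and predicted class prevalences."""
    return sum((r - p) ** 2 for r, p in zip(prev_real, prev_pred)) / len(prev_real)


# Toy binary example: real prevalence 50/50, predicted 25/75.
ae = absolute_error([0.5, 0.5], [0.25, 0.75])
se = squared_error([0.5, 0.5], [0.25, 0.75])
```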

Attributes:

models (List[Quantifier]): List of quantification models.
batch_size (Union[List[int], int]): Size of the batches to be processed.
learner (BaseEstimator): Machine learning model to be used with the quantifiers.
n_prevs (int): Number of prevalence points to generate.
n_iterations (int): Number of iterations for the protocol.
n_jobs (int): Number of jobs to run in parallel.
random_state (int): Seed for random number generation.
verbose (bool): Whether to print progress messages.
return_type (str): Type of return value (‘predictions’ or ‘table’).
measures (List[str]): List of error measures to calculate.

Raises:

AssertionError: If return_type is not ‘predictions’ or ‘table’.

See also

Protocol: Base class for evaluation protocols.
NPP: Natural Prevalence Protocol.
Quantifier: Base class for quantification methods.

Examples

>>> from mlquantify.evaluation.protocol import APP
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>>
>>> # Load the dataset from sklearn
>>> features, target = load_breast_cancer(return_X_y=True)
>>>
>>> # Split into train and test sets
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)
>>>
>>> app = APP(models=["CC", "EMQ", "DyS"],
...           batch_size=[10, 50, 100],
...           learner=RandomForestClassifier(),
...           n_prevs=100, # Default
...           n_jobs=-1,
...           return_type="table",
...           measures=["ae", "se"],
...           verbose=True)
>>>
>>> app.fit(X_train, y_train)
>>>
>>> table = app.predict(X_test, y_test)
>>>
>>> print(table)

fit(X_train, y_train)

Fits the models with the training data.

Parameters:

X_train (np.ndarray): Features of the training set.
y_train (np.ndarray): Labels of the training set.

Returns:

Protocol: Fitted protocol.

predict(X_test: ndarray, y_test: ndarray) → Any

Predicts the prevalence for the test set.

Parameters:

X_test (np.ndarray): Features of the test set.
y_test (np.ndarray): Labels of the test set.

Returns:

Any: Predictions for the test set. Can be a table or a tuple with the quantifier names, real prevalence, and predicted prevalence.

predict_protocol(X_test: ndarray, y_test: ndarray) → Tuple

Generates several samples with artificial prevalences and sizes. Each model predicts on every sample, and the results are aggregated into a pandas DataFrame if requested; otherwise only the predictions are returned.

Parameters:

X_test (np.ndarray): Features of the test set.
y_test (np.ndarray): Labels of the test set.

Returns:

Tuple: Tuple containing the iteration, model name, prev, prev_pred, and batch size.
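
The following sketch shows one way records of this shape could be summarized per model and batch size. It is independent of mlquantify: the record layout (one tuple per sample, with prevalences as per-class lists), the helper name mean_abs_error_by_group, and the values themselves are placeholders assumed for illustration, not actual output of predict_protocol.

```python
from collections import defaultdict

# Placeholder records, not real results:
# (iteration, model name, real prevalence, predicted prevalence, batch size)
records = [
    (0, "CC", [0.5, 0.5], [0.25, 0.75], 10),
    (0, "CC", [0.0, 1.0], [0.25, 0.75], 10),
    (0, "EMQ", [0.5, 0.5], [0.5, 0.5], 10),
    (0, "EMQ", [0.0, 1.0], [0.25, 0.75], 50),
]


def mean_abs_error_by_group(records):
    """Average the per-sample absolute error for each (model, batch size) pair."""
    errors = defaultdict(list)
    for _iteration, model, prev, prev_pred, batch_size in records:
        ae = sum(abs(r - p) for r, p in zip(prev, prev_pred)) / len(prev)
        errors[(model, batch_size)].append(ae)
    return {key: sum(vals) / len(vals) for key, vals in errors.items()}


summary = mean_abs_error_by_group(records)
for (model, batch_size), err in sorted(summary.items()):
    print(f"{model:>4} batch={batch_size:<3} mean AE={err:.3f}")
```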

sout(msg): Prints a message if verbose is True.
