GridSearchQ#

class mlquantify.model_selection.GridSearchQ(model: Quantifier, param_grid: dict, protocol: str = 'app', n_prevs: int = 100, n_repetitions: int = 1, scoring: List[str] | str = 'ae', refit: bool = True, val_split: float = 0.4, n_jobs: int = 1, random_seed: int = 42, timeout: int = -1, verbose: bool = False)[source]#

Hyperparameter optimization for quantification models using grid search.

GridSearchQ tunes the hyperparameters of a quantification model by minimizing a quantification-oriented loss over a parameter grid. Each hyperparameter configuration is evaluated with quantification metrics rather than standard classification metrics, so the selected configuration is the one that best approximates the class prevalences.
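
To make this concrete, below is a minimal hand-rolled sketch of the idea: a grid search that scores each configuration by the absolute error of an estimated class prevalence (here via simple classify-and-count) instead of by classification accuracy. This is an illustration only, not the GridSearchQ implementation; the classifier, grid, and single validation split are arbitrary choices.

import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

param_grid = {'C': [0.1, 1.0, 10.0], 'class_weight': [None, 'balanced']}
best_score, best_params = np.inf, None
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    clf = LogisticRegression(max_iter=1000, **params).fit(X_train, y_train)
    # Classify and count: estimate the positive prevalence as the
    # fraction of positive predictions on the validation split.
    prev_estimated = clf.predict(X_val).mean()
    score = abs(prev_estimated - y_val.mean())  # absolute error on the prevalence
    if score < best_score:
        best_score, best_params = score, params

GridSearchQ automates this loop, additionally scoring each configuration over many validation samples generated by the chosen protocol (see below).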

Parameters:
model : Quantifier

The base quantification model to optimize.

param_grid : dict

Dictionary where keys are parameter names (str) and values are lists of parameter settings to try.

protocol : str, default='app'

The quantification protocol to use (see the sketch after this parameter list). Supported options are:

- 'app': Artificial Prevalence Protocol.
- 'npp': Natural Prevalence Protocol.

n_prevs : int, default=100

Number of prevalence points to generate for APP.

n_repetitions : int, default=1

Number of repetitions to perform for NPP.

scoring : Union[List[str], str], default='ae'

Metric or metrics used to evaluate the model's performance. Can be a single metric name (e.g., 'mae') or a list of metric names.

refit : bool, default=True

If True, refit the model using the best found hyperparameters on the entire dataset.

val_split : float, default=0.4

Proportion of the training data to use for validation. Only applicable if cross-validation is not used.

n_jobs : int, default=1

The number of jobs to run in parallel. -1 means using all processors.

random_seed : int, default=42

Random seed for reproducibility.

timeout : int, default=-1

Maximum time (in seconds) allowed for a single parameter combination. A value of -1 disables the timeout.

verbose : bool, default=False

If True, print progress messages during grid search.
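
The two protocols differ in how the validation samples used to score each configuration are drawn. As a rough sketch for the binary case (assuming APP spaces its prevalence points evenly; the exact sampling scheme is internal to mlquantify):

import numpy as np

# APP: build artificial validation samples whose class prevalences sweep a
# fixed grid, so every configuration is scored across the whole prevalence range.
n_prevs = 5
app_prevalences = np.linspace(0.0, 1.0, n_prevs)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# NPP: draw validation samples at random and keep whatever prevalence occurs
# naturally in each draw, repeated n_repetitions times.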

Attributes:
best_params : dict

The parameter setting that gave the best results on the validation set.

best_score : float

The best score achieved during the grid search.

results : pandas.DataFrame

A DataFrame containing details of all evaluations, including parameters, scores, and execution times.

References

The idea of using grid search for hyperparameter optimization in quantification models was discussed in: Moreo, Alejandro; Sebastiani, Fabrizio. "Re-assessing the 'Classify and Count' Quantification Method". In: Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28–April 1, 2021, Proceedings, Part II. Springer International Publishing, 2021, pp. 75–91. https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6

Examples

>>> from mlquantify.methods.aggregative import DyS
>>> from mlquantify.model_selection import GridSearchQ
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>> 
>>> # Loading dataset from sklearn
>>> features, target = load_breast_cancer(return_X_y=True)
>>> 
>>> # Splitting into train and test
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)
>>> 
>>> model = DyS(RandomForestClassifier())
>>> 
>>> # Creating the hyperparameter grid
>>> param_grid = {
...     'learner__n_estimators': [100, 500, 1000],
...     'learner__criterion': ["gini", "entropy"],
...     'measure': ["topsoe", "hellinger"]
... }
>>> 
>>> gs = GridSearchQ(
...                 model=model,
...                 param_grid=param_grid,
...                 protocol='app', # Default
...                 n_prevs=100,    # Default
...                 scoring='nae',
...                 refit=True,     # Default
...                 val_split=0.3,
...                 n_jobs=-1,
...                 verbose=True)
>>> 
>>> gs.fit(X_train, y_train)
[GridSearchQ]: Optimization complete. Best score: 0.0060630241297973545, with parameters: {'learner__n_estimators': 500, 'learner__criterion': 'entropy', 'measure': 'topsoe'}.
>>> predictions = gs.predict(X_test)
>>> predictions
{0: 0.4182508973311534, 1: 0.5817491026688466}
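
Once fitted, the attributes documented above can be inspected directly on the search object. The values echoed here follow the run above; the columns of results are only paraphrased, so treat this as a sketch:

>>> gs.best_params
{'learner__n_estimators': 500, 'learner__criterion': 'entropy', 'measure': 'topsoe'}
>>> gs.best_score
0.0060630241297973545
>>> gs.results.head()  # one row per configuration: parameters, scores, execution times
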
best_model()[source]#

Return the best model after fitting.

Returns:
Quantifier

The best fitted model.

Raises:
ValueError

If called before fitting.
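
A usage sketch, assuming the fit from the Examples section has already run; the returned object is an ordinary Quantifier, so it can be used independently of the search object:

>>> best = gs.best_model()
>>> best.predict(X_test)  # same interface as any other fitted Quantifier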

property classes_[source]#

Get the classes of the best model.

Returns:
array-like

The classes learned by the best model.
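
For the breast-cancer example above this would be the two class labels; the exact return type depends on the fitted model, so this is a sketch:

>>> gs.classes_
array([0, 1])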

fit(X, y)[source]#

Fit the quantifier model and perform grid search.

Parameters:
X : array-like of shape (n_samples, n_features)

Training features, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Training labels.

Returns:
self : GridSearchQ

Returns the fitted instance of GridSearchQ.
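
Because fit returns the instance itself, calls can be chained, e.g. (continuing the Examples above):

>>> prevalences = gs.fit(X_train, y_train).predict(X_test)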

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get the parameters of the best model.

Parameters:
deep : bool, optional, default=True

If True, will return the parameters for this estimator and contained subobjects.

Returns:
dict

Parameters of the best model.

Raises:
ValueError

If called before the model has been fitted.
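
Note that, unlike the usual scikit-learn convention, this returns the parameters of the best model found by the search rather than those of the GridSearchQ object itself, which is why it must be called after fitting. A sketch:

>>> gs.fit(X_train, y_train)
>>> gs.get_params()  # parameters of the best model; ValueError if called before fit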

predict(X)[source]#

Make predictions using the best found model.

Parameters:
X : array-like of shape (n_samples, n_features)

Data to predict on.

Returns:
array-like

Predictions for the input data.

Raises:
RuntimeError

If the model has not been fitted yet.
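
As shown in the Examples section, predictions come back as a mapping from class label to estimated prevalence. A sketch of turning that into a list ordered by classes_:

>>> prevs = gs.predict(X_test)
>>> [prevs[c] for c in gs.classes_]
[0.4182508973311534, 0.5817491026688466]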

set_params(**parameters)[source]#

Set the hyperparameters for grid search.

Parameters:
parameters : dict

Dictionary of hyperparameters to set.
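
A sketch, assuming the constructor arguments are accepted as keywords in the spirit of scikit-learn's set_params:

>>> gs.set_params(scoring='ae', n_jobs=2)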

sout(msg)[source]#

Print msg if verbose is True.