GridSearchQ#

class mlquantify.model_selection.GridSearchQ(model: Quantifier, param_grid: dict, protocol: str = 'app', n_prevs: int = 100, n_repetitions: int = 1, scoring: List[str] | str = 'ae', refit: bool = True, val_split: float = 0.4, n_jobs: int = 1, random_seed: int = 42, timeout: int = -1, verbose: bool = False)[source]#

Hyperparameter optimization for quantification models using grid search.

GridSearchQ tunes the hyperparameters of a quantification model by minimizing a quantification-oriented loss over a parameter grid. Each hyperparameter configuration is evaluated with quantification metrics rather than standard classification metrics, so the selected configuration is the one that best approximates the class prevalences.
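
To make this concrete, below is a minimal hand-rolled sketch of the idea: a grid search that scores each configuration by the absolute error of an estimated class prevalence (here via simple classify-and-count) instead of by classification accuracy. This is an illustration only, not the GridSearchQ implementation; the classifier, grid, and single validation split are arbitrary choices.

import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

param_grid = {'C': [0.1, 1.0, 10.0], 'class_weight': [None, 'balanced']}
best_score, best_params = np.inf, None
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    clf = LogisticRegression(max_iter=1000, **params).fit(X_train, y_train)
    # Classify and count: estimate the positive prevalence as the
    # fraction of positive predictions on the validation split.
    prev_estimated = clf.predict(X_val).mean()
    score = abs(prev_estimated - y_val.mean())  # absolute error on the prevalence
    if score < best_score:
        best_score, best_params = score, params

GridSearchQ automates this loop, additionally scoring each configuration over many validation samples generated by the chosen protocol (see below).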

Parameters:
model : Quantifier

The base quantification model to optimize.

param_grid : dict

Dictionary where keys are parameter names (str) and values are lists of parameter settings to try.

protocol : str, default='app'

The quantification protocol to use (see the sketch after this parameter list). Supported options are:

- 'app': Artificial Prevalence Protocol.
- 'npp': Natural Prevalence Protocol.

n_prevs : int, default=100

Number of prevalence points to generate for APP.

n_repetitions : int, default=1

Number of repetitions to perform for NPP.

scoring : Union[List[str], str], default='ae'

Metric or metrics used to evaluate the model's performance. Can be a single metric name (e.g., 'mae') or a list of metric names.

refit : bool, default=True

If True, refit the model using the best found hyperparameters on the entire dataset.

val_split : float, default=0.4

Proportion of the training data to use for validation. Only applicable if cross-validation is not used.

n_jobs : int, default=1

The number of jobs to run in parallel. -1 means using all processors.

random_seed : int, default=42

Random seed for reproducibility.

timeout : int, default=-1

Maximum time (in seconds) allowed for a single parameter combination. A value of -1 disables the timeout.

verbose : bool, default=False

If True, print progress messages during grid search.
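
The two protocols differ in how the validation samples used to score each configuration are drawn. As a rough sketch for the binary case (assuming APP spaces its prevalence points evenly; the exact sampling scheme is internal to mlquantify):

import numpy as np

# APP: build artificial validation samples whose class prevalences sweep a
# fixed grid, so every configuration is scored across the whole prevalence range.
n_prevs = 5
app_prevalences = np.linspace(0.0, 1.0, n_prevs)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# NPP: draw validation samples at random and keep whatever prevalence occurs
# naturally in each draw, repeated n_repetitions times.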

Attributes:
best_params : dict

The parameter setting that gave the best results on the validation set.

best_score : float

The best score achieved during the grid search.

results : pandas.DataFrame

A DataFrame containing details of all evaluations, including parameters, scores, and execution times.

References

The idea of using grid search for hyperparameter optimization in quantification models was discussed in: Moreo, Alejandro; Sebastiani, Fabrizio. "Re-assessing the 'Classify and Count' Quantification Method". In: Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28–April 1, 2021, Proceedings, Part II. Springer International Publishing, 2021, pp. 75–91. https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6

Examples

>>> from mlquantify.methods.aggregative import DyS
>>> from mlquantify.model_selection import GridSearchQ
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>> 
>>> # Loading dataset from sklearn
>>> features, target = load_breast_cancer(return_X_y=True)
>>> 
>>> # Splitting into train and test
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)
>>> 
>>> model = DyS(RandomForestClassifier())
>>> 
>>> # Creating the hyperparameter grid
>>> param_grid = {
...     'learner__n_estimators': [100, 500, 1000],
...     'learner__criterion': ["gini", "entropy"],
...     'measure': ["topsoe", "hellinger"]
... }
>>> 
>>> gs = GridSearchQ(
...                 model=model,
...                 param_grid=param_grid,
...                 protocol='app', # Default
...                 n_prevs=100,    # Default
...                 scoring='nae',
...                 refit=True,     # Default
...                 val_split=0.3,
...                 n_jobs=-1,
...                 verbose=True)
>>> 
>>> gs.fit(X_train, y_train)
[GridSearchQ]: Optimization complete. Best score: 0.0060630241297973545, with parameters: {'learner__n_estimators': 500, 'learner__criterion': 'entropy', 'measure': 'topsoe'}.
>>> predictions = gs.predict(X_test)
>>> predictions
{0: 0.4182508973311534, 1: 0.5817491026688466}
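
Once fitted, the attributes documented above can be inspected directly on the search object. The values echoed here follow the run above; the columns of results are only paraphrased, so treat this as a sketch:

>>> gs.best_params
{'learner__n_estimators': 500, 'learner__criterion': 'entropy', 'measure': 'topsoe'}
>>> gs.best_score
0.0060630241297973545
>>> gs.results.head()  # one row per configuration: parameters, scores, execution times
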
best_model()[source]#

Return the best model after fitting.

Returns:
Quantifier

The best fitted model.

Raises:
ValueError

If called before fitting.
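
A usage sketch, assuming the fit from the Examples section has already run; the returned object is an ordinary Quantifier, so it can be used independently of the search object:

>>> best = gs.best_model()
>>> best.predict(X_test)  # same interface as any other fitted Quantifier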

property classes_[source]#

Get the classes of the best model.

Returns:
array-like

The classes learned by the best model.
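
For the breast-cancer example above this would be the two class labels; the exact return type depends on the fitted model, so this is a sketch:

>>> gs.classes_
array([0, 1])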

fit(X, y)[source]#

Fit the quantifier model and perform grid search.

Parameters:
X : array-like of shape (n_samples, n_features)

Training features, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Training labels.

Returns:
self : GridSearchQ

Returns the fitted instance of GridSearchQ.
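
Because fit returns the instance itself, calls can be chained, e.g. (continuing the Examples above):

>>> prevalences = gs.fit(X_train, y_train).predict(X_test)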

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get the parameters of the best model.

Parameters:
deep : bool, optional, default=True

If True, will return the parameters for this estimator and contained subobjects.

Returns:
dict

Parameters of the best model.

Raises:
ValueError

If called before the model has been fitted.
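
Note that, unlike the usual scikit-learn convention, this returns the parameters of the best model found by the search rather than those of the GridSearchQ object itself, which is why it must be called after fitting. A sketch:

>>> gs.fit(X_train, y_train)
>>> gs.get_params()  # parameters of the best model; ValueError if called before fit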

predict(X)[source]#

Make predictions using the best found model.

Parameters:
X : array-like of shape (n_samples, n_features)

Data to predict on.

Returns:
array-like

Predictions for the input data.

Raises:
RuntimeError

If the model has not been fitted yet.
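
As shown in the Examples section, predictions come back as a mapping from class label to estimated prevalence. A sketch of turning that into a list ordered by classes_:

>>> prevs = gs.predict(X_test)
>>> [prevs[c] for c in gs.classes_]
[0.4182508973311534, 0.5817491026688466]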

set_params(**parameters)[source]#

Set the hyperparameters for grid search.

Parameters:
parameters : dict

Dictionary of hyperparameters to set.
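
A sketch, assuming the constructor arguments are accepted as keywords in the spirit of scikit-learn's set_params:

>>> gs.set_params(scoring='ae', n_jobs=2)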

sout(msg)[source]#

Print msg if verbose is True.