GridSearchQ#
- class mlquantify.model_selection.GridSearchQ(model: Quantifier, param_grid: dict, protocol: str = 'app', n_prevs: int = 100, n_repetitions: int = 1, scoring: List[str] | str = 'ae', refit: bool = True, val_split: float = 0.4, n_jobs: int = 1, random_seed: int = 42, timeout: int = -1, verbose: bool = False)[source]#
Hyperparameter optimization for quantification models using grid search.
GridSearchQ performs hyperparameter tuning for quantification models by minimizing a quantification-oriented loss over a parameter grid. Candidate configurations are evaluated with quantification metrics rather than standard classification metrics, so the selected configuration is the one that best approximates the class prevalences rather than the one that best classifies individual instances.
- Parameters:
- model : Quantifier
The base quantification model to optimize.
- param_grid : dict
Dictionary where keys are parameter names (str) and values are lists of parameter settings to try.
- protocol : str, default='app'
The quantification protocol to use. Supported options are:
- 'app': Artificial Prevalence Protocol.
- 'npp': Natural Prevalence Protocol.
- n_prevs : int, default=100
Number of prevalence points to generate for the APP protocol.
- n_repetitions : int, default=1
Number of repetitions to perform for the NPP protocol.
- scoring : Union[List[str], str], default='mae'
Metric or metrics used to evaluate the model's performance. Can be a single metric name (e.g., 'mae') or a list of metric names.
- refit : bool, default=True
If True, refit the model on the entire training set using the best found hyperparameters.
- val_split : float, default=0.4
Proportion of the training data to use for validation. Only applicable if cross-validation is not used.
- n_jobs : int, default=1
Number of jobs to run in parallel. -1 means using all processors.
- random_seed : int, default=42
Random seed for reproducibility.
- timeout : int, default=-1
Maximum time (in seconds) allowed for a single parameter combination. A value of -1 disables the timeout.
- verbose : bool, default=False
If True, print progress messages during the grid search.
- Attributes:
- best_params : dict
The parameter setting that gave the best results on the validation set.
- best_score : float
The best score achieved during the grid search.
- results : pandas.DataFrame
A DataFrame containing details of all evaluations, including parameters, scores, and execution times.
References
The idea of using grid search for hyperparameter optimization of quantification models was discussed in: Moreo, Alejandro; Sebastiani, Fabrizio. "Re-assessing the 'Classify and Count' Quantification Method". In: Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28–April 1, 2021, Proceedings, Part II. Springer International Publishing, 2021, pp. 75–91. https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6
Examples
>>> from mlquantify.methods.aggregative import DyS
>>> from mlquantify.model_selection import GridSearchQ
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>>
>>> # Loading dataset from sklearn
>>> features, target = load_breast_cancer(return_X_y=True)
>>>
>>> # Splitting into train and test
>>> X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)
>>>
>>> model = DyS(RandomForestClassifier())
>>>
>>> # Creating the hyperparameter grid
>>> param_grid = {
...     'learner__n_estimators': [100, 500, 1000],
...     'learner__criterion': ["gini", "entropy"],
...     'measure': ["topsoe", "hellinger"]
... }
>>>
>>> gs = GridSearchQ(
...     model=model,
...     param_grid=param_grid,
...     protocol='app',  # Default
...     n_prevs=100,     # Default
...     scoring='nae',
...     refit=True,      # Default
...     val_split=0.3,
...     n_jobs=-1,
...     verbose=True)
>>>
>>> gs.fit(X_train, y_train)
[GridSearchQ]: Optimization complete. Best score: 0.0060630241297973545, with parameters: {'learner__n_estimators': 500, 'learner__criterion': 'entropy', 'measure': 'topsoe'}.
>>> predictions = gs.predict(X_test)
>>> predictions
{0: 0.4182508973311534, 1: 0.5817491026688466}
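After fitting, the outcome of the search can be inspected through the attributes listed above. A minimal follow-up sketch, assuming the fitted gs from the run above and that best_params, best_score, and results are exposed under exactly those names; the values shown are the ones reported in the fit log:
>>> gs.best_params
{'learner__n_estimators': 500, 'learner__criterion': 'entropy', 'measure': 'topsoe'}
>>> gs.best_score
0.0060630241297973545
>>> gs.results.head()  # one row per evaluated configuration, with parameters, scores, and execution times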
- best_model()[source]#
Return the best model after fitting.
- Returns:
- Quantifier
The best fitted model.
- Raises:
- ValueError
If called before fitting.
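Examples
A minimal usage sketch, assuming gs has already been fitted as in the class-level example; the returned quantifier can then be used independently of the search object:
>>> best = gs.best_model()   # best Quantifier found during the search
>>> best.predict(X_test)     # usable directly, without going through GridSearchQ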
- property classes_[source]#
Get the classes of the best model.
- Returns:
- array-like
The classes learned by the best model.
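Examples
A short sketch, assuming the fitted gs from the class-level example (the breast cancer dataset has binary labels 0 and 1):
>>> gs.classes_   # labels seen by the best model during fitting, here 0 and 1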
- fit(X, y)[source]#
Fit the quantifier model and perform grid search.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Training features, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Training labels.
- Returns:
- self : GridSearchQ
Returns the fitted instance of GridSearchQ.
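Examples
Because fit returns the fitted GridSearchQ instance, construction, search, and prediction can be chained. A minimal sketch reusing the objects (param_grid, training and test splits) from the class-level example:
>>> prevalences = GridSearchQ(
...     model=DyS(RandomForestClassifier()),
...     param_grid=param_grid,
...     scoring='nae',
...     n_jobs=-1,
... ).fit(X_train, y_train).predict(X_test)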
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get the parameters of the best model.
- Parameters:
- deep : bool, optional, default=True
If True, will return the parameters for this estimator and contained subobjects.
- Returns:
- dict
Parameters of the best model.
- Raises:
- ValueError
If called before the model has been fitted.
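Examples
A brief sketch, assuming the fitted gs from the class-level example. Since the grid keys use the learner__ prefix, the corresponding entries are expected to appear in the deep parameter dictionary of the best model (an assumption about how the best model exposes its nested learner):
>>> params = gs.get_params()             # parameters of the best model, nested ones included
>>> params.get('learner__n_estimators')  # hypothetically 500, the winning value from the search
>>> gs.get_params(deep=False)            # only the top-level parameters of the best model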
- predict(X)[source]#
Make predictions using the best found model.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Data to predict on.
- Returns:
- array-like
Predictions for the input data.
- Raises:
- RuntimeError
If the model has not been fitted yet.
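Examples
A minimal sketch, assuming the fitted gs from the class-level example, where predictions are returned as a class-to-prevalence mapping:
>>> prevalences = gs.predict(X_test)        # estimated class prevalences for the test set
>>> max(prevalences, key=prevalences.get)   # most prevalent class according to the estimate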