
get_params method in cross-validation splitters

See original GitHub issue

Describe the workflow you want to enable

I would like to be able to recreate the cross-validation splitters from parameters.

Presently, the cross-validation splitters do not have get_params or set_params methods. I want to make different steps in my process stateless; to do that, I need to be able to retrieve the parameters used to create an object (and pass them out), then recreate the object (obviously in an unfitted state, but that doesn’t matter for CV splitters) when the parameters are passed back in.

Describe your proposed solution

get_params and set_params are already implemented in BaseEstimator. There are two possible options (a rough sketch of the resulting behaviour follows this list):

  1. Copy the implementation into BaseCrossValidator. This is faster to implement, but it isn’t very DRY and it isn’t very maintainable. It could be done as a quick patch, but doing so might create problems for someone else later.
  2. Pull the common API machinery back into some kind of BaseSKObject class. This is more maintainable, cleaner, and overall a better solution, but it will take thought, architectural consideration, and intentionality. It is not a quick patch.
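
For illustration, here is a rough sketch of the round trip either option would enable. The KFoldWithParams class below is hypothetical, not scikit-learn code; it simply mixes BaseEstimator into an existing splitter, since BaseEstimator.get_params()/set_params() only introspect the __init__ signature:

from sklearn.base import BaseEstimator
from sklearn.model_selection import KFold

class KFoldWithParams(KFold, BaseEstimator):
    """KFold plus the get_params/set_params API borrowed from BaseEstimator."""
    pass

cv = KFoldWithParams(n_splits=5, shuffle=True, random_state=0)
params = cv.get_params()      # {'n_splits': 5, 'random_state': 0, 'shuffle': True}

# recreate an equivalent, unfitted splitter elsewhere from the exported parameters
cv2 = KFoldWithParams(**params)
cv2.set_params(n_splits=10)   # set_params works as well

Whether this behaviour arrives by copying code into BaseCrossValidator or via a shared base class is exactly the architectural question raised above.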

Describe alternatives you’ve considered, if relevant

I can (and probably will) store the values I used to create the object in the first place and pass them out when I build my parameter string. I don’t expect this feature to land right away (if it’s even desired by the community at large), and the show must go on.
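
A rough sketch of that workaround follows; the SplitterSpec helper and its methods are purely illustrative, not scikit-learn API. The idea is just to record the constructor arguments up front so an equivalent, unfitted splitter can be rebuilt later:

from sklearn.model_selection import StratifiedKFold

class SplitterSpec:
    """Record how a splitter was built so an equivalent one can be recreated."""
    def __init__(self, splitter_cls, **params):
        self.splitter_cls = splitter_cls
        self.params = params

    def get_params(self):
        # export the stored constructor arguments (e.g. for a parameter string)
        return dict(self.params)

    def build(self):
        # recreate a fresh, unfitted splitter from the stored arguments
        return self.splitter_cls(**self.params)

spec = SplitterSpec(StratifiedKFold, n_splits=5, shuffle=True, random_state=0)
cv = spec.build()
print(spec.get_params())      # {'n_splits': 5, 'shuffle': True, 'random_state': 0}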

But having get_params and set_params common across all sklearn objects feels like a necessary step toward a more unified codebase. And it would make my life a lot easier!

Additional context

No response

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

3 reactions
NicolasHug commented, Sep 2, 2021

For your use case I would just implement a custom class that re-implements the get_params() logic https://github.com/scikit-learn/scikit-learn/blob/fac31e727947ad53f2ed107f58a10b56b165cee7/sklearn/base.py#L187, possibly in a simplified way; you probably don’t need everything in BaseEstimator.get_params() just for splitters. After all, all it does is return the parameters that were passed to __init__().
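
A minimal version of that simplified logic might look like the sketch below (splitter_params is just an illustrative helper, not scikit-learn API): it reads the splitter’s __init__ signature and looks the corresponding attributes up on the instance.

import inspect
from sklearn.model_selection import StratifiedGroupKFold

def splitter_params(cv):
    """Return the constructor parameters of a CV splitter as a dict."""
    sig = inspect.signature(type(cv).__init__)
    return {
        name: getattr(cv, name)
        for name, p in sig.parameters.items()
        if name != "self" and p.kind is not inspect.Parameter.VAR_KEYWORD
    }

print(splitter_params(StratifiedGroupKFold()))
# {'n_splits': 5, 'shuffle': False, 'random_state': None}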

Alternatively, I am deeply ashamed but this might work in most cases:

In [1]: from sklearn.model_selection import StratifiedGroupKFold

In [2]: cv = StratifiedGroupKFold()

In [3]: eval(str(cv).replace(cv.__class__.__name__, 'dict'))
Out[3]: {'n_splits': 5, 'random_state': None, 'shuffle': False}

1 reaction
glemaitre commented, Sep 2, 2021

OK, got it. The scorers are in the same situation (though it is true that you can pass strings for some of them).

It would be nice to hear the thoughts of others. @scikit-learn/core-devs

Read more comments on GitHub >

Top Results From Across the Web

sklearn.model_selection.GridSearchCV
GridSearchCV implements a “fit” and a “score” method. ... See Custom refit strategy of a grid search with cross-validation to see how to...
Read more >
31_cross_validation
The basic approach (k-fold CV) splits the training data into k subsets. ... by the generator output by the split method of the...
Read more >
cv - Python package - CatBoost
Perform cross-validation on the dataset. The dataset is split into N folds. N–1 folds are used for training, and one fold is used...
Read more >
Cross Validation and Grid Search - Towards Data Science
Cross validation works by splitting our dataset into random groups, holding one group out as the test, and training the model on the...
Read more >
ForecastingGridSearchCV — sktime documentation
Perform grid-search cross-validation to find optimal model parameters. ... Either the estimator must contain a “score” function, or a scoring function must ...
Read more >
