RuntimeError: "Cannot clone object ..." when cloning an estimator that copies parameters in either __init__ or get_params
Describe the bug
Context: https://github.com/scikit-learn/scikit-learn/issues/15722#issuecomment-893942972
Calling clone() on a BaseEstimator that copies its parameters raises a RuntimeError, even when the parameters are otherwise equal: estimator.param == clone(estimator).param returns True, but estimator.param is clone(estimator).param returns False.
Either the documentation has a gap, in that this is an unstated requirement for clone() to work (and the BaseEstimator __init__() and get_params() documentation should say that parameters must always be the same object), or the equality check in clone() is too strict and should be loosened.
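To make the equality-versus-identity distinction concrete, here is a simplified, hypothetical sketch of an identity-based sanity check like the one that produces this error. The names CopyingEstimator and sanity_check are made up for illustration, and this is not scikit-learn's actual clone() source; it only shows why a constructor that copies its argument passes an equality test but fails an identity test.

```python
class CopyingEstimator:
    """Hypothetical stand-in for the TestEstimator in the report."""

    def __init__(self, my_dict):
        # Copying here means the stored value is a new object.
        self.my_dict = my_dict.copy()

    def get_params(self):
        return {"my_dict": self.my_dict}


def sanity_check(klass, params):
    """Rebuild an object from params and compare each parameter by identity.

    Simplified illustration only; scikit-learn's real clone() does more work.
    """
    new_object = klass(**params)
    params_set = new_object.get_params()
    for name, param in params.items():
        if param is not params_set[name]:  # identity, not equality
            raise RuntimeError(
                "Cannot clone object, as the constructor either does not "
                f"set or modifies parameter {name}"
            )
    return new_object


params = {"my_dict": {"foo": "bar"}}
rebuilt = CopyingEstimator(**params)
print(rebuilt.my_dict == params["my_dict"])  # True: the values are equal
print(rebuilt.my_dict is params["my_dict"])  # False: it is a different dict

try:
    sanity_check(CopyingEstimator, params)
except RuntimeError as exc:
    print(exc)
```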
Steps/Code to Reproduce
from sklearn.base import BaseEstimator, clone

class TestEstimator(BaseEstimator):
    def __init__(self, my_dict):
        self.my_dict = my_dict.copy()

some_dict = {'foo': 'bar'}
estimator = TestEstimator(some_dict)
clone(estimator)  # raises RuntimeError: Cannot clone object TestEstimator(my_dict={'foo': 'bar'}), as the constructor either does not set or modifies parameter my_dict
Expected Results
Calling clone(estimator) results in a new TestEstimator for which the following assertions hold:
from sklearn.base import BaseEstimator, clone

class TestEstimator(BaseEstimator):
    def __init__(self, my_dict):
        self.my_dict = my_dict.copy()

some_dict = {'foo': 'bar'}
estimator = TestEstimator(some_dict)
new_estimator = clone(estimator)
assert estimator is not new_estimator
assert estimator.my_dict == new_estimator.my_dict  # not strictly necessary, but if clone() is going to assert equality, this seems like the right kind of check
assert estimator.my_dict is not new_estimator.my_dict
# no RuntimeError or AssertionError should be raised after running this snippet
Actual Results
>>> clone(estimator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/wchill/miniconda/envs/test38/lib/python3.8/site-packages/sklearn/base.py", line 95, in clone
    raise RuntimeError(
RuntimeError: Cannot clone object TestEstimator(my_dict={'foo': 'bar'}), as the constructor either does not set or modifies parameter my_dict
Versions
>>> import sklearn; sklearn.show_versions()
System:
python: 3.8.12 (default, Oct 12 2021, 06:23:56) [Clang 10.0.0 ]
executable: /Users/wchill/miniconda/envs/test38/bin/python
machine: macOS-10.16-x86_64-i386-64bit
Python dependencies:
pip: 21.2.4
setuptools: 58.0.4
sklearn: 1.0.2
numpy: 1.22.3
scipy: 1.8.0
Cython: None
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
Issue Analytics
- Created 2 years ago
- Comments:7 (5 by maintainers)
Top GitHub Comments
Oh I see. For that, all meta-estimators should be cloning the given estimator in fit, not __init__, and store it under another attribute such as self.estimator_.

My main motivation is to be able to clone custom estimators and have them be independent of each other; i.e., modifying one estimator's params should not affect the other estimators. It seems much better for the param copying to be handled by the estimator rather than by the consumer of the estimator, because if many estimators are used, one would have to write code to handle each one separately.
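A minimal sketch of the fit-time copying pattern suggested above, applied to the TestEstimator from the report and assuming scikit-learn is installed: __init__ stores my_dict unchanged, and the private copy is made in fit under a trailing-underscore attribute (my_dict_ is a hypothetical name chosen for illustration). It also touches on the independence concern: in scikit-learn 1.0.x, clone() passes parameters that are not themselves estimators through copy.deepcopy, so the clone already ends up with its own dict.

```python
from sklearn.base import BaseEstimator, clone


class TestEstimator(BaseEstimator):
    def __init__(self, my_dict):
        # Store the argument as-is so clone()'s identity check passes.
        self.my_dict = my_dict

    def fit(self, X=None, y=None):
        # Make the private copy at fit time instead of in __init__.
        self.my_dict_ = dict(self.my_dict)
        return self


some_dict = {'foo': 'bar'}
estimator = TestEstimator(some_dict)
new_estimator = clone(estimator)  # no RuntimeError

# The clone's dict is equal to the original's but is a separate object.
print(new_estimator.my_dict == estimator.my_dict)  # True
print(new_estimator.my_dict is estimator.my_dict)  # False

# Mutating the clone's parameter leaves the original untouched.
new_estimator.my_dict['foo'] = 'baz'
print(estimator.my_dict)  # {'foo': 'bar'}

estimator.fit()
print(estimator.my_dict_ is some_dict)  # False
```

One trade-off: the unfitted estimator.my_dict is still the caller's some_dict, so mutating some_dict after construction changes that parameter; the copy only protects the state that fit works with.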
Reading through the old issue, it seems the main reason why this was done was to fix ambiguous equality checks when comparing numpy arrays. For the case of numpy arrays, np.array_equal might be an option?
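The ambiguity mentioned here can be reproduced with a small NumPy sketch (assuming NumPy is installed; this is an illustration, not a proposed patch for clone()): == on two arrays compares element-wise and returns an array, which cannot be used directly as a truth value, whereas np.array_equal collapses the comparison to a single bool.

```python
import numpy as np

a = np.array([1, 2, 3])
b = a.copy()

# Element-wise comparison returns an array of bools, not a single bool.
print(a == b)  # [ True  True  True]

# Using that array in an `if` raises for arrays with more than one element.
try:
    if a == b:
        pass
except ValueError as exc:
    print("ValueError:", exc)

# np.array_equal returns one bool for the whole comparison.
print(np.array_equal(a, b))  # True
print(a is b)  # False: equal contents, different objects
```

Note that np.array_equal treats NaN as unequal to itself unless equal_nan=True is passed, which any loosened check would need to account for.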
Alternatively, @thomasjpfan 's proposal in https://github.com/scikit-learn/scikit-learn/issues/21838 could also work.