RuntimeError: "Cannot clone object ..." when cloning an estimator that copies parameters in either __init__ or get_params

Describe the bug

Context: https://github.com/scikit-learn/scikit-learn/issues/15722#issuecomment-893942972

Calling clone() on a BaseEstimator whose __init__ or get_params copies its parameters results in a RuntimeError, even if the parameters are otherwise equal (estimator.param == clone(estimator).param returns True, but estimator.param is clone(estimator).param returns False).

Either the documentation is incomplete, because this is an undocumented requirement for clone() to work (in which case the BaseEstimator __init__() and get_params() documentation should say that parameters must always be the same object), or the check in clone() is too strict and should be loosened.
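To make the identity-versus-equality distinction concrete, here is a minimal sketch in plain Python of a sanity check of the kind described above. It is an illustration under stated assumptions, not scikit-learn's source: `naive_clone`, `CopyingEstimator`, and `StoringEstimator` are hypothetical names, and the real clone() handles more cases (nested estimators, for example).

```python
import copy


class CopyingEstimator:
    """Hypothetical estimator that copies its parameter in __init__."""

    def __init__(self, my_dict):
        self.my_dict = my_dict.copy()

    def get_params(self):
        return {"my_dict": self.my_dict}


class StoringEstimator(CopyingEstimator):
    """Hypothetical estimator that stores its parameter unchanged."""

    def __init__(self, my_dict):
        self.my_dict = my_dict


def naive_clone(estimator):
    """Rebuild an estimator from deep-copied params, then check identity."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    new_estimator = type(estimator)(**params)
    for name, passed in params.items():
        stored = new_estimator.get_params()[name]
        # Identity check: an equal but distinct object is rejected.
        if stored is not passed:
            raise RuntimeError(
                f"constructor either does not set or modifies parameter {name}"
            )
    return new_estimator


ok = naive_clone(StoringEstimator({"foo": "bar"}))  # passes the check

try:
    naive_clone(CopyingEstimator({"foo": "bar"}))
    copy_rejected = False
except RuntimeError:
    copy_rejected = True  # the copy in __init__ breaks the identity check
```

In this sketch, replacing `is not` with `!=` would accept the copying estimator, which is roughly the loosening the report asks about.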

Steps/Code to Reproduce

from sklearn.base import BaseEstimator, clone

class TestEstimator(BaseEstimator):
    def __init__(self, my_dict):
        self.my_dict = my_dict.copy()

some_dict = {'foo': 'bar'}
estimator = TestEstimator(some_dict)

clone(estimator)      # raises RuntimeError: Cannot clone object TestEstimator(my_dict={'foo': 'bar'}), as the constructor either does not set or modifies parameter my_dict

Expected Results

Calling clone(estimator) results in a new TestEstimator where the following assertions are true:

from sklearn.base import BaseEstimator, clone

class TestEstimator(BaseEstimator):
    def __init__(self, my_dict):
        self.my_dict = my_dict.copy()

some_dict = {'foo': 'bar'}
estimator = TestEstimator(some_dict)

new_estimator = clone(estimator)

assert estimator is not new_estimator
assert estimator.my_dict == new_estimator.my_dict        #  this isn't strictly necessary, but if clone() is going to assert equality then this seems like the right kind of check
assert estimator.my_dict is not new_estimator.my_dict

# no RuntimeError or AssertionError should be raised after running this snippet

Actual Results

>>> clone(estimator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/wchill/miniconda/envs/test38/lib/python3.8/site-packages/sklearn/base.py", line 95, in clone
    raise RuntimeError(
RuntimeError: Cannot clone object TestEstimator(my_dict={'foo': 'bar'}), as the constructor either does not set or modifies parameter my_dict

Versions

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.8.12 (default, Oct 12 2021, 06:23:56)  [Clang 10.0.0 ]
executable: /Users/wchill/miniconda/envs/test38/bin/python
   machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
          pip: 21.2.4
   setuptools: 58.0.4
      sklearn: 1.0.2
        numpy: 1.22.3
        scipy: 1.8.0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
adrinjalali commented, Apr 8, 2022

Oh I see. For that, all meta-estimators should clone the given estimator in fit, not __init__, and store it under another attribute such as self.estimator_.
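The pattern described in this comment might look like the following minimal sketch. `Wrapper` is a hypothetical meta-estimator written for illustration, and `DummyClassifier` with toy data stands in for any inner estimator; the key assumption is only that __init__ stores its argument untouched and the cloning happens in fit.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.dummy import DummyClassifier


class Wrapper(BaseEstimator):
    """Hypothetical meta-estimator that defers cloning to fit."""

    def __init__(self, estimator):
        # Store the parameter exactly as given: no copy, no clone.
        self.estimator = estimator

    def fit(self, X, y):
        # Fit an independent clone and keep it under a trailing-underscore
        # attribute, leaving self.estimator unfitted and unmodified.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self


X = [[0], [1], [2]]
y = [0, 1, 1]

wrapper = Wrapper(DummyClassifier(strategy="most_frequent"))
cloned = clone(wrapper)  # __init__ does not copy, so the sanity check passes
cloned.fit(X, y)
```

Because clone() builds the new object from copies of the parameters, `cloned.estimator` is already a separate object from `wrapper.estimator`, so the two wrappers do not share an inner estimator.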

0 reactions
wchill commented, Apr 5, 2022

My main motivation is to be able to clone custom estimators and have them be independent from each other; i.e. modifying one estimator’s params should not affect the other estimators. It seems much better for the param copying to be handled by the estimator rather than by the consumer of the estimator, because if many estimators are used then one would have to write code to handle each different one.

Reading through the old issue, it seems the main reason why this was done was to fix ambiguous equality checks when comparing numpy arrays. For the case of numpy arrays, np.array_equal might be an option?
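As a small illustration of the ambiguity mentioned here, the sketch below shows that `==` on NumPy arrays returns an element-wise array whose truth value cannot be taken directly, while np.array_equal returns a single boolean. Whether clone() should use it is an open question in the thread; the variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = a.copy()  # equal values, different object

elementwise = a == b  # an array of booleans, not a single bool

try:
    bool(elementwise)
    ambiguous = False
except ValueError:
    # NumPy refuses to collapse a multi-element array to one truth value.
    ambiguous = True

same_values = np.array_equal(a, b)  # a single True/False answer
same_object = a is b  # identity, which is what the strict check compares
```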

Alternatively, @thomasjpfan's proposal in https://github.com/scikit-learn/scikit-learn/issues/21838 could also work.
