Change default copy value from True to None
A fair number of estimators currently have copy=True (or copy_X=True) by default. In practice, this means that the code looks something like:
X = check_array(X, copy=copy)
followed by other calculations that may or may not modify X in place. When the subsequent operations are not done in place, we have made a wasteful copy for no good reason.
As discussed in https://github.com/scikit-learn/scikit-learn/issues/13923, one example is Ridge(fit_intercept=False), which copies X although it is not needed. (At first I could not find any in-place operations on X in Ridge even with fit_intercept=True, but I have since found it.)
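To make the cost concrete, here is a small sketch of what the pattern above does to memory. It assumes a C-contiguous float64 array that needs no dtype or layout conversion; the variable names are illustrative only.

```python
import numpy as np
from sklearn.utils import check_array

# A float64, C-contiguous array: check_array has nothing to convert.
X = np.arange(6, dtype=np.float64).reshape(3, 2)

X_copied = check_array(X, copy=True)    # forces a fresh buffer
X_checked = check_array(X, copy=False)  # validation only

# Does each result still point at X's memory?
copied_shares = np.shares_memory(X, X_copied)
checked_shares = np.shares_memory(X, X_checked)
print(copied_shares, checked_shares)
```

If the estimator never writes to X afterwards, the buffer allocated for the copy=True case is pure overhead.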
I think in general it would be better to avoid the
X = check_array(X, copy=copy)
pattern, and instead make a copy explicitly where it is needed. Maybe it could be OK not to make a copy with copy=True if no copy is needed. Alternatively, we could introduce copy=None by default.
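One possible reading of the copy=None proposal is sketched below. The function center_if_needed and its exact semantics are hypothetical and written with NumPy only; this is not scikit-learn's implementation, just an illustration of copying at the point where an in-place operation is about to happen.

```python
import numpy as np


def center_if_needed(X, fit_intercept=True, copy=None):
    """Hypothetical helper illustrating a tri-state ``copy`` argument.

    copy=None  -> copy only if X is about to be modified in place
    copy=True  -> always copy
    copy=False -> never copy; X may be overwritten
    """
    # No copy here when X is already a float64 ndarray.
    X = np.asarray(X, dtype=float)

    # In this toy example, centering is the only in-place operation.
    needs_inplace = fit_intercept

    # Make the copy explicitly, and only where it is needed.
    if copy is True or (copy is None and needs_inplace):
        X = X.copy()

    if needs_inplace:
        X -= X.mean(axis=0)
    return X
```

With the default copy=None, calling this with fit_intercept=False returns an array backed by the caller's memory, while fit_intercept=True works on a private copy and leaves the input untouched.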
It would also help to add a common test that checks that Estimator(copy=True).fit(X, y) doesn't change X.
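A minimal sketch of such a check follows. ToyCenterer and fit_leaves_input_unchanged are hypothetical names invented for this illustration; a real common test would iterate over scikit-learn's estimators rather than a toy class.

```python
import numpy as np


class ToyCenterer:
    """Hypothetical estimator, used only to exercise the check below."""

    def __init__(self, copy=True):
        self.copy = copy

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if self.copy:
            X = X.copy()
        self.mean_ = X.mean(axis=0)
        X -= self.mean_  # in-place centering
        self.scale_ = X.std(axis=0)
        return self


def fit_leaves_input_unchanged(estimator, X, y=None):
    """Return True if estimator.fit(X, y) did not modify X."""
    X_before = X.copy()
    estimator.fit(X, y)
    return np.array_equal(X, X_before)
```

For the toy class, the check passes with copy=True and fails with copy=False, because the centering step then writes into the caller's array.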
Issue Analytics
- Created 4 years ago
- Reactions: 1
- Comments: 14 (14 by maintainers)

I think I had just assumed that copy=True meant copy=None. I’d prefer copy='on-write' or something…

Very good point. I made #13987 to address this issue in preprocessing (e.g. StandardScaler).

For future reference, to find estimators that potentially have this issue, one can use a common test that checks that an exception is raised when one tries to use an estimator with copy=False on a read-only array. If it is not raised, it is likely that with copy=True a copy is not actually necessary (though there are false positives).

It’s not reliable enough to add it to common tests, but as a detection method, it works reasonably well. The same could be done for classifiers etc.
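The read-only detection idea can be sketched as follows. The two toy estimators and the helper raises_on_read_only are hypothetical and exist only for this illustration. The sketch also assumes the in-place write surfaces as a ValueError, which is what NumPy raises when writing to a read-only array. An estimator could raise ValueError for unrelated reasons, so the result is a hint rather than proof.

```python
import numpy as np


class InPlaceCenterer:
    """Hypothetical estimator that really does write to X."""

    def __init__(self, copy=True):
        self.copy = copy

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if self.copy:
            X = X.copy()
        X -= X.mean(axis=0)  # in-place write
        self.scale_ = X.std(axis=0)
        return self


class MeanOnly:
    """Hypothetical estimator that only reads X; its copy is wasted."""

    def __init__(self, copy=True):
        self.copy = copy

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if self.copy:
            X = X.copy()  # never needed: X is only read below
        self.mean_ = X.mean(axis=0)
        return self


def raises_on_read_only(estimator_cls, X):
    """Fit with copy=False on a read-only array; report whether it raised."""
    X_ro = np.array(X, dtype=float)
    X_ro.setflags(write=False)
    try:
        estimator_cls(copy=False).fit(X_ro)
    except ValueError:
        return True
    return False
```

Here InPlaceCenterer raises, so its copy=True default is justified, while MeanOnly does not raise, which flags its copy as a candidate for removal.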
Also, it should be noted that for more complex estimators with numerous options it is sometimes hard to decide whether a copy is needed (e.g. Birch.fit). In that case, it’s probably better to keep the copy to be safe, particularly when the performance gained by avoiding the copy is negligible with respect to the fit or transform time.