LinearGAM with sklearn GridSearchCV
Hi,
I tried using LinearGAM with sklearn's GridSearchCV and got an error when GridSearchCV tried to clone the estimator. The code is below:
import numpy as np
import pandas as pd
from pygam import LinearGAM
from sklearn.model_selection import GridSearchCV

def gam(x, y):
    # Ten random candidate lam vectors, one penalty value per feature
    lams = np.random.rand(10, x.shape[1])
    lams = np.exp(lams)
    linear_gam = LinearGAM(n_splines=10, max_iter=1000)
    parameters = {
        'lam': [lam for lam in lams]
    }
    gam_cv = GridSearchCV(linear_gam, parameters, cv=5, iid=False, return_train_score=True,
                          refit=True, scoring='neg_mean_squared_error')
    gam_cv.fit(x, y)
    cv_results_df = pd.DataFrame(gam_cv.cv_results_).sort_values(by='mean_test_score', ascending=False)
    return gam_cv, cv_results_df

gam_rank, gam_cv_results = gam(x_all, y_all)
I get the error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-63-e5a0cd9be09f> in <module>
----> 1 gam_rank, gam_cv_results = gam(x_all, y_all)

<ipython-input-62-8afb9b4dc830> in gam(x, y)
      7     }
      8     gam_cv = GridSearchCV(linear_gam, parameters, cv=5, iid=False, return_train_score=True,
                                  refit=True, scoring='neg_mean_squared_error')
----> 9     gam_cv.fit(x, y)
     10     cv_results_df = pd.DataFrame(gam_cv.cv_results_).sort_values(by='mean_test_score', ascending=False)
     11     return gam_cv, cv_results_df

C:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    630         n_splits = cv.get_n_splits(X, y, groups)
    631
--> 632         base_estimator = clone(self.estimator)
    633
    634         parallel = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,

C:\Anaconda3\lib\site-packages\sklearn\base.py in clone(estimator, safe)
     73                 raise RuntimeError('Cannot clone object %s, as the constructor '
     74                                    'either does not set or modifies parameter %s' %
---> 75                                    (estimator, name))
     76     return new_object
     77

RuntimeError: Cannot clone object LinearGAM(callbacks=['deviance', 'diffs'], fit_intercept=True, max_iter=1000, n_splines=10, scale=None, terms='auto', tol=0.0001, verbose=False), as the constructor either does not set or modifies parameter callbacks
The dataset I used was sklearn's California housing dataset.
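For readers unfamiliar with the message in the traceback, here is a rough sketch of the kind of identity check that produces it. This is a simplified stand-in written for illustration, not sklearn's actual clone implementation and not pygam's code. PlainEstimator, CopyingEstimator and toy_clone are hypothetical names, and the sketch assumes the failure comes from a constructor that stores something other than the exact object it was handed.

```python
import copy


class PlainEstimator:
    """Hypothetical estimator that stores its argument untouched."""

    def __init__(self, callbacks=None):
        self.callbacks = callbacks

    def get_params(self):
        return {"callbacks": self.callbacks}


class CopyingEstimator(PlainEstimator):
    """Hypothetical estimator whose constructor copies its argument."""

    def __init__(self, callbacks=None):
        # Storing a copy means the stored object is no longer
        # the object that was passed in.
        self.callbacks = list(callbacks) if callbacks is not None else None


def toy_clone(estimator):
    """Simplified stand-in for the sanity check done when cloning."""
    # Copy each parameter, then rebuild the estimator from the copies.
    params = {name: copy.deepcopy(value)
              for name, value in estimator.get_params().items()}
    new_object = type(estimator)(**params)
    stored = new_object.get_params()
    for name, passed in params.items():
        # The constructor must keep the very same object it was given.
        if passed is not stored[name]:
            raise RuntimeError(
                "Cannot clone object %r, as the constructor either does not "
                "set or modifies parameter %s" % (estimator, name)
            )
    return new_object


toy_clone(PlainEstimator(callbacks=["deviance", "diffs"]))  # succeeds

try:
    toy_clone(CopyingEstimator(callbacks=["deviance", "diffs"]))
except RuntimeError as err:
    print(err)
```

In this sketch only the copying constructor trips the check, which is why the error message names a single parameter.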
Top GitHub Comments
Actually, having looked at this further, this does not seem to have anything at all to do with terms. It instead occurs because the default value of the callbacks keyword argument for every GAM estimator is a list. This tends to be a big no-no in Python, as lists are mutable, which leads to confusing situations where it is possible to change the default value of a keyword argument.

I believe this was the root of this bug. When I instead apply the changes in #267, GridSearchCV succeeds with no issues.

Could you please look at this PR and see if it is an acceptable fix?
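To illustrate the mutable-default pitfall described in this comment, here is a minimal, library-free sketch. The function names add_callback_shared and add_callback_safe are made up for the example, and whether #267 uses this exact pattern is not stated in this thread. The None-sentinel version is simply the common Python idiom for avoiding a shared default.

```python
def add_callback_shared(name, callbacks=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `callbacks` mutates the same object.
    callbacks.append(name)
    return callbacks


def add_callback_safe(name, callbacks=None):
    # Using None as a sentinel gives each call its own fresh list.
    if callbacks is None:
        callbacks = []
    callbacks.append(name)
    return callbacks


first = add_callback_shared("deviance")
second = add_callback_shared("diffs")
print(first is second)  # True: both calls returned the same list object
print(second)           # ['deviance', 'diffs']

print(add_callback_safe("deviance"))  # ['deviance']
print(add_callback_safe("diffs"))     # ['diffs']
```

The same reasoning applies to keyword arguments in a class constructor: a list used as a default is shared by every instance that does not override it.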
Getting what looks like the same problem when using LinearGAM with sklearn's TransformedTargetRegressor.

Admittedly knowing absolutely nothing about the motivations for the changes you made, it seems like diverging from such an important package's requirements would be a bad idea? I know that if I can't get it to play nice with my company's existing sklearn infrastructure, I'm probably going to have to abandon it.
Is there an older version that keeps to sklearn’s requirements?