GridSearchCV fails with XGBoost models
See original GitHub issueThe following code fails:
import numpy as np
from dask_ml.model_selection import GridSearchCV, KFold
from dask_xgboost import XGBClassifier
x = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)  # two classes; randint(0, 1, ...) would yield only zeros
params = {'max_depth': [2, 3]}
clf = GridSearchCV(XGBClassifier(), params, cv=KFold(n_splits=2), scoring='neg_mean_squared_error')
clf.fit(x, y)
Stack trace:
File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/dask_ml/model_selection/_normalize.py", line 38, in normalize_estimator
val = getattr(est, attr)
File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/xgboost/sklearn.py", line 536, in feature_importances_
b = self.get_booster()
File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/xgboost/sklearn.py", line 200, in get_booster
raise XGBoostError('need to call fit or load_model beforehand')
xgboost.core.XGBoostError: need to call fit or load_model beforehand
The error arises because the normalize_estimator function tries to read the 'feature_importances_' attribute of the XGBoost model, but that property raises unless the model has already been trained. I believe normalize_estimator should gloss over this error, but it doesn't, because XGBoostError is not among the exception types it catches.
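The fix the report implies can be sketched as a defensive attribute lookup. This is a hypothetical illustration, not dask-ml's actual code: safe_getattr and the Unfitted stub are made-up names, and the stub raises RuntimeError in place of XGBoostError so the sketch stays dependency-free.

```python
# Hypothetical sketch of the behavior the report asks for: treat any
# exception raised while probing an estimator attribute as "attribute
# absent", instead of only catching AttributeError.
def safe_getattr(est, attr, default=None):
    try:
        return getattr(est, attr, default)
    except Exception:
        # XGBoostError does not subclass AttributeError, so a plain
        # getattr() call propagates it; catching broadly glosses over it.
        return default

class Unfitted:
    # Stands in for an unfitted XGBClassifier, whose property raises
    # until fit/load_model has been called.
    @property
    def feature_importances_(self):
        raise RuntimeError("need to call fit or load_model beforehand")

print(safe_getattr(Unfitted(), "feature_importances_"))  # None
```

With a guard like this, normalization would simply skip the attribute on an unfitted model rather than abort the whole grid search.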
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 22 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I’m running into the same issue today. Do I understand correctly that there’s no way to use dask_xgboost.XGBoostClassifier with dask_ml.model_selection.GridSearchCV?

dask_xgboost.XGBoostClassifier doesn’t implement out-of-core training. It loads the data into distributed memory and hands it off to xgboost’s distributed runtime. My understanding is that the native xgboost.dask.XGBoostClassifier does the same thing.

On Mon, Jun 8, 2020 at 9:55 PM Gideon Blinick notifications@github.com wrote: