
GridSearchCV fails with XGBoost models

See original GitHub issue

The following code fails:

import numpy as np
from dask_ml.model_selection import GridSearchCV, KFold
from dask_xgboost import XGBClassifier

x = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)  # binary labels in {0, 1}; randint's upper bound is exclusive

params = {'max_depth': [2, 3]}

clf = GridSearchCV(XGBClassifier(), params, cv=KFold(n_splits=2), scoring='neg_mean_squared_error')
clf.fit(x, y)

Stack trace:

  File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/dask_ml/model_selection/_normalize.py", line 38, in normalize_estimator
    val = getattr(est, attr)
  File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/xgboost/sklearn.py", line 536, in feature_importances_
    b = self.get_booster()
  File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/xgboost/sklearn.py", line 200, in get_booster
    raise XGBoostError('need to call fit or load_model beforehand')
xgboost.core.XGBoostError: need to call fit or load_model beforehand

The error arises because the normalize_estimator function tries to read the feature_importances_ attribute of the XGBoost model, but that attribute raises an error until the model has been fitted. I believe normalize_estimator should gloss over this failure, but it doesn't, because it catches AttributeError and not the XGBoostError type that xgboost actually raises.
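The mismatch described above can be reproduced without xgboost or dask-ml installed. The sketch below uses stand-in names (this XGBoostError, UnfittedModel, and normalize_attr are illustrative, not the libraries' actual classes or functions) to show why a getattr wrapped in a narrow except clause lets a library-specific error escape, while a broader clause glosses over it:

```python
# Minimal sketch of the failure mode. dask-ml's normalize_estimator reads
# attributes such as `feature_importances_` via getattr and tolerates
# missing attributes -- but an unfitted XGBoost model raises XGBoostError
# (not AttributeError) from inside that property, so it isn't caught.

class XGBoostError(Exception):
    """Stand-in for xgboost.core.XGBoostError."""

class UnfittedModel:
    @property
    def feature_importances_(self):
        # Mirrors xgboost's behavior before fit()/load_model() is called.
        raise XGBoostError("need to call fit or load_model beforehand")

def normalize_attr(est, attr):
    # Catching only AttributeError (the narrow behavior) would let
    # XGBoostError propagate; catching Exception glosses over it and
    # treats the attribute as absent.
    try:
        return getattr(est, attr)
    except Exception:
        return None

print(normalize_attr(UnfittedModel(), "feature_importances_"))  # None
```

With the narrow `except AttributeError` instead, the same call re-raises the XGBoostError seen in the stack trace above.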

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 22 (7 by maintainers)

Top GitHub Comments

2 reactions
rrpelgrim commented, May 17, 2021

I’m running into the same issue today. Do I understand correctly that there’s no way to implement dask_xgboost.XGBoostClassifier with dask_ml.model_selection.GridSearchCV?

1 reaction
TomAugspurger commented, Jun 9, 2020

dask_xgboost.XGBoostClassifier doesn’t implement out-of-core training. It loads the data into distributed memory, and hands that off to xgboost’s distributed runtime.

My understanding is that the native xgboost.dask.XGBoostClassifier does the same thing.

On Mon, Jun 8, 2020 at 9:55 PM Gideon Blinick notifications@github.com wrote:

Hi @trivialfis,

The reason we’d want to use dask’s XGBClassifier is to avoid loading all data into memory at once.

Dask XGBoost seems to have a memory leakage issue though.

— View it on GitHub: https://github.com/dask/dask-ml/issues/667#issuecomment-640997713

