GridSearchCV fails with XGBoost models
See original GitHub issueThe following code fails:
import numpy as np
from dask_ml.model_selection import GridSearchCV, KFold
from dask_xgboost import XGBClassifier
x = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)  # two classes; randint(0, 1, ...) would yield only zeros
params = {'max_depth': [2, 3]}
clf = GridSearchCV(XGBClassifier(), params, cv=KFold(n_splits=2), scoring='neg_mean_squared_error')
clf.fit(x, y)
Stack trace:
File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/dask_ml/model_selection/_normalize.py", line 38, in normalize_estimator
val = getattr(est, attr)
File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/xgboost/sklearn.py", line 536, in feature_importances_
b = self.get_booster()
File "/home/jin/anaconda3/envs/ml-gpu/lib/python3.6/site-packages/xgboost/sklearn.py", line 200, in get_booster
raise XGBoostError('need to call fit or load_model beforehand')
xgboost.core.XGBoostError: need to call fit or load_model beforehand
The error arises because the normalize_estimator function tries to read the 'feature_importances_' attribute of the XGBoost model, but that property raises unless the model has already been trained. I believe normalize_estimator should gloss over this error, but it doesn't, because XGBoostError is not among the exception types it catches.
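The fix the report implies can be sketched as a defensive attribute lookup. This is a hypothetical illustration, not dask-ml's actual code: safe_getattr and the Unfitted stub are made-up names, and the stub raises RuntimeError in place of XGBoostError so the sketch stays dependency-free.

```python
# Hypothetical sketch of the behavior the report asks for: treat any
# exception raised while probing an estimator attribute as "attribute
# absent", instead of only catching AttributeError.
def safe_getattr(est, attr, default=None):
    try:
        return getattr(est, attr, default)
    except Exception:
        # XGBoostError does not subclass AttributeError, so a plain
        # getattr() call propagates it; catching broadly glosses over it.
        return default

class Unfitted:
    # Stands in for an unfitted XGBClassifier, whose property raises
    # until fit/load_model has been called.
    @property
    def feature_importances_(self):
        raise RuntimeError("need to call fit or load_model beforehand")

print(safe_getattr(Unfitted(), "feature_importances_"))  # None
```

With a guard like this, normalization would simply skip the attribute on an unfitted model rather than abort the whole grid search.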
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 22 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I’m running into the same issue today. Do I understand correctly that there’s no way to use dask_xgboost.XGBoostClassifier with dask_ml.model_selection.GridSearchCV?

dask_xgboost.XGBoostClassifier doesn’t implement out-of-core training. It loads the data into distributed memory and hands it off to xgboost’s distributed runtime. My understanding is that the native xgboost.dask.XGBoostClassifier does the same thing.

On Mon, Jun 8, 2020 at 9:55 PM Gideon Blinick notifications@github.com wrote: