LightGBMTunerCV seems not to handle user-specified CV folds
Expected behavior
When users pass training and validation folds in the manner that the basic lightgbm.cv function accepts, this should, as far as I understand, work.
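As a sketch (not from the original report; the toy data and variable names below are my own), this is the kind of folds object lightgbm.cv accepts: an iterable of (train_indices, test_indices) pairs, such as those produced by scikit-learn's GroupKFold.split:

```python
# Illustration of the folds format lightgbm.cv accepts: an iterable of
# (train_indices, test_indices) index pairs, e.g. from GroupKFold.split().
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
groups = np.repeat(np.arange(5), 4)  # 5 groups of 4 rows each

folds = list(GroupKFold(n_splits=5).split(X, y, groups))
for train_idx, test_idx in folds:
    # each test fold holds exactly one group (4 rows here),
    # and train/test indices never overlap
    assert len(test_idx) == 4
    assert set(train_idx).isdisjoint(test_idx)
print(len(folds))  # 5
```

Each pair fully specifies one CV split, so the caller (not LightGBM) controls how rows are grouped.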
Environment
- Optuna version: 2.0.0
- Python version: 3.6.9
- OS: Google Colab/Linux
- (Optional) Other libraries and their versions: LightGBM 2.3.1
Error messages, stack traces, or logs
0%| | 0/7 [00:00<?, ?it/s]
feature_fraction, val_score: inf: 0%| | 0/7 [00:00<?, ?it/s][W 2020-08-24 15:41:09,973] Trial 0 failed because of the following error: ValueError('For early stopping, at least one dataset and eval metric is required for evaluation',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/optuna/study.py", line 709, in _run_trial
result = func(trial)
File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 302, in __call__
cv_results = lgb.cv(self.lgbm_params, self.train_set, **self.lgbm_kwargs)
File "/usr/local/lib/python3.6/dist-packages/lightgbm/engine.py", line 576, in cv
evaluation_result_list=res))
File "/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py", line 221, in _callback
_init(env)
File "/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py", line 191, in _init
raise ValueError('For early stopping, '
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-0ec8edbe946a> in <module>()
2 label = np.array( data['target'] ).flatten())
3 tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, early_stopping_rounds=100, folds=folds)
----> 4 tuner.run()
10 frames
/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py in _init(env)
189 return
190 if not env.evaluation_result_list:
--> 191 raise ValueError('For early stopping, '
192 'at least one dataset and eval metric is required for evaluation')
193
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
The second variant (without early stopping) fails as well, which I believe may already be reported in another issue:
0%| | 0/7 [00:00<?, ?it/s]
feature_fraction, val_score: inf: 0%| | 0/7 [00:00<?, ?it/s][W 2020-08-24 15:42:03,262] Trial 0 failed because of the following error: KeyError('l1-mean',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/optuna/study.py", line 709, in _run_trial
result = func(trial)
File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 304, in __call__
val_scores = self._get_cv_scores(cv_results)
File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 294, in _get_cv_scores
val_scores = cv_results["{}-mean".format(metric)]
KeyError: 'l1-mean'
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-21-942a30076787> in <module>()
1 tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, folds=folds)
----> 2 tuner.run()
8 frames
/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py in _get_cv_scores(self, cv_results)
292
293 metric = self._get_metric_for_objective()
--> 294 val_scores = cv_results["{}-mean".format(metric)]
295 return val_scores
296
KeyError: 'l1-mean'
Steps to reproduce
- Open a Google Colab notebook, then run the code below (packages beyond the Colab defaults are installed explicitly via !pip)
Reproducible examples (optional)
!pip install lightgbm==2.3.1
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold
import lightgbm as lgb
lgb.__version__
np.random.seed(123)
data = pd.DataFrame({'var1': np.random.normal(loc=0, scale=1, size=100),
                     'var2': np.random.normal(loc=0, scale=1, size=100),
                     'var3': np.random.normal(loc=0, scale=1, size=100),
                     'testfold': np.random.choice(a=np.repeat([x for x in range(5)], 20), size=100, replace=False)})
data['target'] = 7 + 0.1*data['var1'] + 1.0*data['var2'] + 5.0*data['var3'] - 2.0*data['var1']*data['var2'] + np.random.normal(loc=0, scale=0.5, size=100)
data.head()
params = {
'objective': 'l1',
'metric': 'l1',
"verbosity": -1,
"boosting_type": "gbdt",
'seed': 1979
}
dtrain = lgb.Dataset(data=np.array(data[['var1', 'var2', 'var3']]),
                     label=np.array(data['target']).flatten())
folds = GroupKFold().split(np.array(data[['var1', 'var2', 'var3']]),
                           np.array(data['target']).flatten(),
                           np.array(data['testfold']).flatten())
lgb.cv(params, dtrain, folds=folds, verbose_eval=100)  # This is how base lightgbm does this, and it works fine
!pip install optuna
import optuna.integration.lightgbm as lgb
dtrain = lgb.Dataset(data=np.array(data[['var1', 'var2', 'var3']]),
                     label=np.array(data['target']).flatten())
tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, early_stopping_rounds=100, folds=folds)
tuner.run()
tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, folds=folds)
tuner.run()
Additional context (optional)
The same issue occurs in Kaggle kernels, but I thought it would be easier to share a simplified Colab example.
Top GitHub Comments
Thank you for your bug report. I'm not aware of the first issue; I'll investigate it.
I think it is the same issue as #1602. The cause is the lack of metric mapping in
LightGBMTunerCV
and @thigm85 is working on it.

I think the root cause of this issue is #2960. The return value of GroupKFold.split is a generator, so when it is passed as lgb.LightGBMTunerCV's "folds" argument it is exhausted after the first training round; on the second round of training/search, cv_results ends up being an empty dictionary. To fix this, as mentioned in #2960,
list(GroupKFold.split(*some_args))
should be passed as the "folds" argument.
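The generator-exhaustion failure mode described above can be reproduced without LightGBM at all. This is a minimal sketch (my own toy data, not from the issue) showing that the GroupKFold.split generator yields nothing on a second pass, while wrapping it in list() makes the folds reusable across rounds:

```python
# A generator such as GroupKFold().split(...) can be iterated only once,
# so a tuner's second lgb.cv() call would see no folds at all;
# materializing it with list() is the workaround.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(30).reshape(10, 3)
y = np.arange(10)
groups = np.repeat([0, 1, 2, 3, 4], 2)  # 5 groups of 2 rows

gen = GroupKFold(n_splits=5).split(X, y, groups)
first_pass = [split for split in gen]
second_pass = [split for split in gen]  # generator is already exhausted
print(len(first_pass), len(second_pass))  # 5 0

# A list survives repeated iteration, which is what repeated lgb.cv() calls need.
folds = list(GroupKFold(n_splits=5).split(X, y, groups))
print(len([s for s in folds]), len([s for s in folds]))  # 5 5
```

This also explains why the plain lgb.cv call above succeeds: it iterates the folds only once, so the generator is sufficient there but not for the tuner's repeated calls.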