LightGBMTunerCV seems not to handle user-specified CV folds
Expected behavior
When users pass training and validation folds in the manner that the basic lightgbm.cv function accepts, this should, as far as I understand, work.
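As a sketch (not from the original report; the toy data and variable names below are my own), this is the kind of folds object lightgbm.cv accepts: an iterable of (train_indices, test_indices) pairs, such as those produced by scikit-learn's GroupKFold.split:

```python
# Illustration of the folds format lightgbm.cv accepts: an iterable of
# (train_indices, test_indices) index pairs, e.g. from GroupKFold.split().
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
groups = np.repeat(np.arange(5), 4)  # 5 groups of 4 rows each

folds = list(GroupKFold(n_splits=5).split(X, y, groups))
for train_idx, test_idx in folds:
    # each test fold holds exactly one group (4 rows here),
    # and train/test indices never overlap
    assert len(test_idx) == 4
    assert set(train_idx).isdisjoint(test_idx)
print(len(folds))  # 5
```

Each pair fully specifies one CV split, so the caller (not LightGBM) controls how rows are grouped.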
Environment
- Optuna version: 2.0.0
- Python version: 3.6.9
- OS: Google Colab/Linux
- (Optional) Other libraries and their versions: LightGBM 2.3.1
Error messages, stack traces, or logs
0%| | 0/7 [00:00<?, ?it/s]
feature_fraction, val_score: inf: 0%| | 0/7 [00:00<?, ?it/s][W 2020-08-24 15:41:09,973] Trial 0 failed because of the following error: ValueError('For early stopping, at least one dataset and eval metric is required for evaluation',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/optuna/study.py", line 709, in _run_trial
result = func(trial)
File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 302, in __call__
cv_results = lgb.cv(self.lgbm_params, self.train_set, **self.lgbm_kwargs)
File "/usr/local/lib/python3.6/dist-packages/lightgbm/engine.py", line 576, in cv
evaluation_result_list=res))
File "/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py", line 221, in _callback
_init(env)
File "/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py", line 191, in _init
raise ValueError('For early stopping, '
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-0ec8edbe946a> in <module>()
2 label = np.array( data['target'] ).flatten())
3 tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, early_stopping_rounds=100, folds=folds)
----> 4 tuner.run()
10 frames
/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py in _init(env)
189 return
190 if not env.evaluation_result_list:
--> 191 raise ValueError('For early stopping, '
192 'at least one dataset and eval metric is required for evaluation')
193
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
The second variant (without early stopping) fails as well, which I believe may already be reported in another issue:
0%| | 0/7 [00:00<?, ?it/s]
feature_fraction, val_score: inf: 0%| | 0/7 [00:00<?, ?it/s][W 2020-08-24 15:42:03,262] Trial 0 failed because of the following error: KeyError('l1-mean',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/optuna/study.py", line 709, in _run_trial
result = func(trial)
File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 304, in __call__
val_scores = self._get_cv_scores(cv_results)
File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 294, in _get_cv_scores
val_scores = cv_results["{}-mean".format(metric)]
KeyError: 'l1-mean'
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-21-942a30076787> in <module>()
1 tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, folds=folds)
----> 2 tuner.run()
8 frames
/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py in _get_cv_scores(self, cv_results)
292
293 metric = self._get_metric_for_objective()
--> 294 val_scores = cv_results["{}-mean".format(metric)]
295 return val_scores
296
KeyError: 'l1-mean'
Steps to reproduce
- Open a Google Colab notebook, then run the code below (packages beyond the Colab defaults are installed explicitly via !pip)
Reproducible examples (optional)
!pip install lightgbm==2.3.1
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold
import lightgbm as lgb
lgb.__version__
np.random.seed(123)
data = pd.DataFrame({'var1': np.random.normal(loc=0, scale=1, size=100),
                     'var2': np.random.normal(loc=0, scale=1, size=100),
                     'var3': np.random.normal(loc=0, scale=1, size=100),
                     'testfold': np.random.choice(a=np.repeat([x for x in range(5)], 20), size=100, replace=False)})
data['target'] = 7 + 0.1*data['var1'] + 1.0*data['var2'] + 5.0*data['var3'] - 2.0*data['var1']*data['var2'] + np.random.normal(loc=0, scale=0.5, size=100)
data.head()
params = {
'objective': 'l1',
'metric': 'l1',
"verbosity": -1,
"boosting_type": "gbdt",
'seed': 1979
}
dtrain = lgb.Dataset(data=np.array(data[['var1', 'var2', 'var3']]),
                     label=np.array(data['target']).flatten())
folds = GroupKFold().split(np.array(data[['var1', 'var2', 'var3']]),
                           np.array(data['target']).flatten(),
                           np.array(data['testfold']).flatten())
lgb.cv(params, dtrain, folds=folds, verbose_eval=100)  # This is how base lightgbm does this, and it works fine
!pip install optuna
import optuna.integration.lightgbm as lgb
dtrain = lgb.Dataset(data=np.array(data[['var1', 'var2', 'var3']]),
                     label=np.array(data['target']).flatten())
tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, early_stopping_rounds=100, folds=folds)
tuner.run()
tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, folds=folds)
tuner.run()
Additional context (optional)
The same issue occurs in Kaggle kernels, but I thought it would be easier to share a simplified Colab example.
Top GitHub Comments
Thank you for your bug report. I'm not aware of the first issue; I'll investigate it.
I think it is the same issue as #1602. The cause is the lack of metric mapping in
LightGBMTunerCV
and @thigm85 is working on it.

I think the root cause of this issue is #2960. The return value of GroupKFold.split is a generator, so when it is passed as lgb.LightGBMTunerCV's "folds" argument it is exhausted after the first training round; on the second round of training/search, cv_results ends up being an empty dictionary. To fix this, as mentioned in #2960,
list(GroupKFold.split(*some_args))
should be passed as the "folds" argument.
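The generator-exhaustion failure mode described above can be reproduced without LightGBM at all. This is a minimal sketch (my own toy data, not from the issue) showing that the GroupKFold.split generator yields nothing on a second pass, while wrapping it in list() makes the folds reusable across rounds:

```python
# A generator such as GroupKFold().split(...) can be iterated only once,
# so a tuner's second lgb.cv() call would see no folds at all;
# materializing it with list() is the workaround.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(30).reshape(10, 3)
y = np.arange(10)
groups = np.repeat([0, 1, 2, 3, 4], 2)  # 5 groups of 2 rows

gen = GroupKFold(n_splits=5).split(X, y, groups)
first_pass = [split for split in gen]
second_pass = [split for split in gen]  # generator is already exhausted
print(len(first_pass), len(second_pass))  # 5 0

# A list survives repeated iteration, which is what repeated lgb.cv() calls need.
folds = list(GroupKFold(n_splits=5).split(X, y, groups))
print(len([s for s in folds]), len([s for s in folds]))  # 5 5
```

This also explains why the plain lgb.cv call above succeeds: it iterates the folds only once, so the generator is sufficient there but not for the tuner's repeated calls.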