LightGbm - recommendations on hyperparameters tuning
Hi guys, I followed all of your examples regarding tuning LightGbm; however, I was hoping that some of you could share or reference best practices and answer my questions below:
- How many trials should the experiment run? Is 100 typically sufficient to find a set of parameters that are ‘good enough’?
- Is the default learning rate with a hundred boosting rounds ‘good’ for finding the best hyperparameters (assuming time is not really a huge constraint)? I’m wondering if it should run with a slightly larger learning rate to speed it up, and perhaps with more boosting rounds.
- Should I limit the search space for some of these parameters to help Optuna focus on what matters most?
- Is the MedianPruner the most appropriate in this case? How many n_warmup_steps should I choose?
- I also came across this example of tuning a LightGbm model: https://github.com/optuna/optuna/blob/master/examples/lightgbm_tuner_simple.py. Instead of running a study, is there some sort of ongoing hyperparameter optimization going on “on the fly”? I’m not quite sure how best_params gets updated.
I would really appreciate advice from some more seasoned Optuna users!
My current implementation looks like this. Ignore the task-specific parameters, such as ‘objective’:
```python
import lightgbm as lgb
import optuna
import sklearn.metrics
from math import sqrt


def objective(trial):
    dtrain = lgb.Dataset(train_x, label=train_y, categorical_feature=feat_cat, free_raw_data=False)
    dtest = lgb.Dataset(test_x, label=test_y, categorical_feature=feat_cat, free_raw_data=False)

    param = {
        'objective': 'poisson',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'force_row_wise': True,
        'max_depth': -1,
        'max_bin': trial.suggest_int('max_bin', 1, 512),
        'num_leaves': trial.suggest_int('num_leaves', 2, 512),
        'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
        'lambda_l2': trial.suggest_loguniform('lambda_l2', 1e-8, 10.0),
        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.4, 1.0),
        'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.4, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 7),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 1, 50),
        # Note: the three parameters below are LightGBM aliases of
        # min_data_in_leaf, feature_fraction and bagging_fraction above,
        # so tuning both members of each pair is redundant.
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'sub_feature': trial.suggest_uniform('sub_feature', 0.0, 1.0),
        'sub_row': trial.suggest_uniform('sub_row', 0.0, 1.0),
    }

    # Prune unpromising trials based on the intermediate validation RMSE.
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, 'rmse')

    gbm = lgb.train(
        param,
        dtrain,
        verbose_eval=20,
        valid_sets=[dtest],
        callbacks=[pruning_callback],
        categorical_feature=feat_cat,
    )

    preds = gbm.predict(test_x)
    rmse = sqrt(sklearn.metrics.mean_squared_error(test_y, preds))
    return rmse


if __name__ == "__main__":
    study = optuna.create_study(direction='minimize', pruner=optuna.pruners.MedianPruner(n_warmup_steps=10))
    study.optimize(objective, n_trials=100)

    print("Number of finished trials: {}".format(len(study.trials)))
    print("Best trial:")
    trial = study.best_trial
    print("  Value: {}".format(trial.value))
    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
```
Issue Analytics
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
Thanks @hvy, that’s definitely an awesome reference I wasn’t aware of!
Have you read this blog post? It may help address some of your points: https://medium.com/optuna/lightgbm-tuner-new-optuna-integration-for-hyperparameter-optimization-8b7095e99258.