Question about xgboost CV and n_estimators / num_boost_round
Hello,
I’m enjoying exploring Optuna.
I have a question about getting cross-validated results with xgb.cv and Optuna. I'm asking here because I couldn't find any examples of this in the repo.
I’ve studied the example you’ve posted here: https://github.com/optuna/optuna/blob/master/examples/xgboost_simple.py.
My attempt to use xgb.cv with Optuna:
```python
import optuna
import xgboost as xgb
from optuna.samplers import TPESampler

# `df` (a pandas DataFrame) and `features` (a list of column names)
# are assumed to be defined.


def objective(trial):
    dtrain = xgb.DMatrix(df[features], label=df.target,
                         feature_names=features)
    param = {
        'silent': 1,
        'objective': 'binary:logistic',
        'eval_metric': 'auc',
        'booster': trial.suggest_categorical('booster', ['gbtree']),
        'alpha': trial.suggest_loguniform('alpha', 1e-3, 1.0),
    }

    if param['booster'] == 'gbtree':
        param['max_depth'] = trial.suggest_int('max_depth', 1, 9)
        param['scale_pos_weight'] = trial.suggest_int('scale_pos_weight', 3, 75)
        param['min_child_weight'] = trial.suggest_int('min_child_weight', 1, 9)
        param['eta'] = trial.suggest_loguniform('eta', 1e-3, 1.0)
        param['gamma'] = trial.suggest_loguniform('gamma', 1e-3, 1.0)
        param['subsample'] = trial.suggest_loguniform('subsample', 0.6, 1.0)
        param['colsample_bytree'] = trial.suggest_loguniform('colsample_bytree', 0.6, 1.0)
        param['grow_policy'] = trial.suggest_categorical('grow_policy', ['depthwise', 'lossguide'])

    xgb_cv_results = xgb.cv(params=param, dtrain=dtrain, num_boost_round=10000,
                            nfold=3, stratified=True, early_stopping_rounds=100,
                            seed=108, verbose_eval=False)

    # With early stopping, xgb.cv truncates its output at the best
    # iteration, so the last row holds the best mean test AUC.
    best_score = xgb_cv_results['test-auc-mean'].iloc[-1]
    return best_score


sampler = TPESampler(seed=108)
optuna_hpt = optuna.create_study(sampler=sampler,
                                 direction='maximize',
                                 study_name='optuna_hpt')
optuna_hpt.optimize(objective, n_trials=150)
```
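(After optimization, those results can be read off the study object; a minimal snippet, assuming the `optuna_hpt` study above:)

```python
# Best mean test AUC and the parameters that produced it.
print(optuna_hpt.best_value)
print(optuna_hpt.best_params)
```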
While this gives me the CV metric (AUC here) and the best params, e.g.:
```
{'booster': 'gbtree',
 'alpha': 0.054159958811690126,
 'max_depth': 7,
 'scale_pos_weight': 16,
 'min_child_weight': 9,
 'eta': 0.0026002759893806117,
 'gamma': 0.0011140626171961645,
 'subsample': 0.667891200106278,
 'colsample_bytree': 0.6224726913934507,
 'grow_policy': 'lossguide'}
```
it still doesn't tell me how many boosting rounds (num_boost_round / n_estimators) to train with, since I pass a large num_boost_round and rely on early stopping.
Am I right in assuming that I need to save the cross-validation (xgb.cv) results and get n_estimators from there, and that my final model will be a retrained model with the best parameters and that num_boost_round, roughly along the lines of the sketch below?
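(A hypothetical sketch of that retraining idea, using assumed names: `saved_cv_results` would be the xgb.cv output saved from the best trial, and `dtrain` the full-data DMatrix rebuilt outside the objective.)

```python
import xgboost as xgb

# With early stopping, xgb.cv truncates its output at the best
# iteration, so the number of rows is the round count to reuse.
n_estimators = len(saved_cv_results)

# Retrain a final model on the full data with the tuned parameters.
final_model = xgb.train(params=optuna_hpt.best_params, dtrain=dtrain,
                        num_boost_round=n_estimators)
```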
What am I missing?
I'd appreciate any help. Thanks!
Note to the questioner
If you are more comfortable with Stack Overflow, you may consider posting your question there instead. Alternatively, for issues that would benefit from more of an interactive session with the developers, you may refer to the optuna/optuna chat on Gitter.
Top GitHub Comments
Hello
Really appreciate the quick response, thank you. 😃 You are right about n_estimators: num_boost_round and n_estimators are aliases. Though in Optuna I could only use n_estimators as the key in trial.set_user_attr() and not num_boost_round (I got an error message).
Based on your suggestion, I've modified the code as sketched below.
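(A minimal sketch of that modification, with the search space abbreviated; the full space is the same as above. The round count found by early stopping is recorded on the trial with trial.set_user_attr.)

```python
def objective(trial):
    # `df` and `features` are assumed to be defined as in the question.
    dtrain = xgb.DMatrix(df[features], label=df.target,
                         feature_names=features)
    # Abbreviated search space; the full space is the same as above.
    param = {
        'silent': 1,
        'objective': 'binary:logistic',
        'eval_metric': 'auc',
        'eta': trial.suggest_loguniform('eta', 1e-3, 1.0),
        'max_depth': trial.suggest_int('max_depth', 1, 9),
    }

    xgb_cv_results = xgb.cv(params=param, dtrain=dtrain, num_boost_round=10000,
                            nfold=3, stratified=True, early_stopping_rounds=100,
                            seed=108, verbose_eval=False)

    # Record the round count of the best iteration on the trial so it
    # can be read back after optimization.
    trial.set_user_attr('n_estimators', len(xgb_cv_results))

    return xgb_cv_results['test-auc-mean'].iloc[-1]
```

After optimization, the stored value can be read back from the best trial:

```python
best_trial = optuna_hpt.best_trial
n_estimators = best_trial.user_attrs['n_estimators']
```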
I'd be happy to add it as an example under https://github.com/optuna/optuna/tree/master/examples, or feel free to include it yourself if you see fit.
I think this would be a great example to add. Calculating n_estimators for a final model after early stopping is a very common task.