Problems with the parameter `learning_rate` in HistGradientBoostingClassifier
Describe the bug
Setting the argument learning_rate to a value larger than 0.1 in HistGradientBoostingClassifier causes a large drop in test accuracy.
Steps/Code to Reproduce
import time
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingClassifier
if __name__ == '__main__':
    n_estimators = 100
    learning_rate = 0.1
    seed = 0
    n_jobs = 6

    # Load the LIBSVM letter dataset and shift labels to start at 0
    train = load_svmlight_file('../../../Dataset/libsvm/letter_training')
    test = load_svmlight_file('../../../Dataset/libsvm/letter_testing')
    X_train, y_train = np.asanyarray(train[0].toarray(), order='F'), train[1]-1
    X_test, y_test = np.asanyarray(test[0].toarray(), order='C'), test[1]-1

    # XGBoost (Ver==1.1.1)
    model = XGBClassifier(n_estimators=n_estimators,
                          learning_rate=learning_rate,
                          objective='multi:softmax',
                          random_state=seed,
                          n_jobs=n_jobs)
    tic = time.time()
    model.fit(X_train, y_train)
    toc = time.time()
    training_time = toc - tic
    tic = time.time()
    y_pred = model.predict(X_test)
    toc = time.time()
    evaluating_time = toc - tic
    acc = accuracy_score(y_test, y_pred)
    print('XGBoost Testing Acc: {:.4f}%'.format(100.*acc))
    print('XGBoost Training Time: {:.4f} s'.format(training_time))
    print('XGBoost Evaluating Time: {:.4f} s\n'.format(evaluating_time))

    # LightGBM (Ver==2.3.1)
    model = LGBMClassifier(n_estimators=n_estimators,
                           learning_rate=learning_rate,
                           objective='multiclass',
                           random_state=seed,
                           n_jobs=n_jobs)
    tic = time.time()
    model.fit(X_train, y_train)
    toc = time.time()
    training_time = toc - tic
    tic = time.time()
    y_pred = model.predict(X_test)
    toc = time.time()
    evaluating_time = toc - tic
    acc = accuracy_score(y_test, y_pred)
    print('LightGBM Testing Acc: {:.4f}%'.format(100.*acc))
    print('LightGBM Training Time: {:.4f} s'.format(training_time))
    print('LightGBM Evaluating Time: {:.4f} s\n'.format(evaluating_time))

    # Sklearn-GBDT (Ver==0.22.1)
    model = HistGradientBoostingClassifier(max_iter=n_estimators,
                                           learning_rate=learning_rate,
                                           validation_fraction=None,
                                           random_state=seed)
    tic = time.time()
    model.fit(X_train, y_train)
    toc = time.time()
    training_time = toc - tic
    tic = time.time()
    y_pred = model.predict(X_test)
    toc = time.time()
    evaluating_time = toc - tic
    acc = accuracy_score(y_test, y_pred)
    print('Sklearn Testing Acc: {:.4f}%'.format(100.*acc))
    print('Sklearn Training Time: {:.4f} s'.format(training_time))
    print('Sklearn Evaluating Time: {:.4f} s'.format(evaluating_time))
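The script above repeats the same tic/toc pattern for every fit and predict call. As a possible simplification, the timing could be factored into a small helper. This is only a sketch: the name `timed` is hypothetical and not part of the original report, and `time.perf_counter` is swapped in for `time.time` because it is the standard-library clock intended for measuring short intervals.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    `timed` is a hypothetical helper, not part of the original script.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the sketch runs without any dataset or model.
total, elapsed = timed(sum, range(1_000_000))
print('sum = {}, elapsed = {:.4f} s'.format(total, elapsed))
```

In the reproduction script, the same helper would be called as `_, training_time = timed(model.fit, X_train, y_train)` and `y_pred, evaluating_time = timed(model.predict, X_test)`.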
Expected Results
I expected the accuracy of HistGradientBoostingClassifier with learning_rate=0.3 to differ only slightly from the case with learning_rate=0.1, either better or worse, rather than to degrade this much.
Actual Results
On the letter dataset, publicly available in the LIBSVM dataset collection, HistGradientBoostingClassifier achieves a testing accuracy of 95.74% with learning_rate=0.1, but only 6.16% and 6.06% with learning_rate=0.3 and 0.5, respectively. Similar behavior occurs on other datasets such as USPS.
Versions
sklearn: 0.22.1 numpy: 1.18.1 scipy: 1.4.1 Cython: 0.29.15 pandas: 1.0.1 matplotlib: 3.1.3 joblib: 0.14.1
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
I also observe a huge performance degradation on LightGBM and XGBoost after using get_equivalent_model to pass parameters. If this is the expected behavior, this issue can be closed 😃. Thanks.
I cannot reproduce your results @AaronX121: much like sklearn, lightgbm gets very degraded performance when using a learning rate higher than 0.1 (I haven't tried XGBoost) and when setting comparable hyperparameters with get_equivalent_model. I suspect that the discrepancy you have comes from different ways of handling early stopping, though I haven't looked in detail.
Also, note that 0.1 seems like the upper limit for the LR: setting it to 0.001 gets you decent results.