GradientBoosting fails when using `init` estimator parameter.
See original GitHub issueDescription
Passing any estimator to the init
parameter in GradientBoostingClassifier
(or GradientBoostingRegressor
) causes errors when fitting. I’m including one example below as a MWE but I’ve fiddled with this issue for a while and get a number of different errors even if I can get past one. My guess is that this part of the code (the init
estimator part) hasn’t been shown any love in quite a while (since I’ve found similar issues that date back to 2014: https://github.com/scikit-learn/scikit-learn/issues/2691). A more complete exploration can be found here: https://github.com/stoddardg/sklearn_bug_example/blob/master/Gradient%2BBoosting%2Binit%2BMWE.ipynb
Steps/Code to Reproduce
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,random_state=0)
init_clf = LogisticRegressionCV()
clf = GradientBoostingClassifier(init=init_clf)
clf.fit(X, y)
Actual Results
IndexError Traceback (most recent call last)
<ipython-input-5-f94c05274ee0> in <module>()
6 init_clf = LogisticRegressionCV()
7 clf = GradientBoostingClassifier(init=init_clf)
----> 8 clf.fit(train_X, train_y)
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in fit(self, X, y, sample_weight, monitor)
1463 n_stages = self._fit_stages(X, y, y_pred, sample_weight, self._rng,
1464 X_val, y_val, sample_weight_val,
-> 1465 begin_at_stage, monitor, X_idx_sorted)
1466
1467 # change shape of arrays after fit (early-stopping or additional ests)
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stages(self, X, y, y_pred, sample_weight, random_state, X_val, y_val, sample_weight_val, begin_at_stage, monitor, X_idx_sorted)
1527 y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
1528 sample_mask, random_state, X_idx_sorted,
-> 1529 X_csc, X_csr)
1530
1531 # track deviance (= loss)
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask, random_state, X_idx_sorted, X_csc, X_csr)
1197 loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
1198 sample_weight, sample_mask,
-> 1199 self.learning_rate, k=k)
1200
1201 # add tree to ensemble
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in update_terminal_regions(self, tree, X, y, residual, y_pred, sample_weight, sample_mask, learning_rate, k)
396 self._update_terminal_region(tree, masked_terminal_regions,
397 leaf, X, y, residual,
--> 398 y_pred[:, k], sample_weight)
399
400 # update predictions (both in-bag and out-of-bag)
IndexError: too many indices for array
Versions
System
------
python: 3.7.0 (default, Jun 28 2018, 13:15:42) [GCC 7.2.0]
executable: /home/gstoddard/anaconda3/envs/eis_env/bin/python
machine: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-redhat-7.5-Maipo
BLAS
----
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /home/gstoddard/anaconda3/envs/eis_env/lib
cblas_libs: mkl_rt, pthread
Python deps
-----------
pip: 18.0
setuptools: 39.2.0
sklearn: 0.20.0
numpy: 1.15.0
scipy: 1.1.0
Cython: None
pandas: 0.23.3
Issue Analytics
- State:
- Created 5 years ago
- Comments:10 (7 by maintainers)
Top Results From Across the Web
sklearn.ensemble.GradientBoostingRegressor
Gradient Boosting for regression. This estimator builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary ...
Read more >In Depth: Parameter tuning for Gradient Boosting
Let's first fit a gradient boosting classifier with default parameters to get a baseline idea of the performance
Read more >How to Configure the Gradient Boosting Algorithm
He suggests to first set a large value for the number of trees, then tune the shrinkage parameter to achieve the best results....
Read more >GradientBoostingClassifier with a BaseEstimator in scikit- ...
How can i make GradientBoostingRegressor work with a BaseEstimator in scikit-learn? 2 · How to use GradientBoostingClassifier init Parameters in sklearn ...
Read more >Gradient Boosting | Hyperparameter Tuning Python
Learn parameter tuning in gradient boosting algorithm using Python ... GBM works by starting with an initial estimate which is updated using ......
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Sorry for the late reply but I haven’t noticed any other bugs. The workaround we identified has been working for me and I’m excited to see the finished product.
Also, I just wanted to say that I really appreciate the work that you (and other folks) have done on this. I’ve been keeping an eye on the PR and resulting discussion and its been educating/fun to watch. I’ll try to pay it forward in the future by contributing myself (or at least encouraging my team to do so) but in the mean time, I donated a small amount. Thanks!
Fixed by #12983. Closing