question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

GradientBoosting fails when using `init` estimator parameter.

See original GitHub issue

Description

Passing any estimator to the init parameter in GradientBoostingClassifier (or GradientBoostingRegressor) causes errors when fitting. I’m including one example below as a MWE but I’ve fiddled with this issue for a while and get a number of different errors even if I can get past one. My guess is that this part of the code (the init estimator part) hasn’t been shown any love in quite a while (since I’ve found similar issues that date back to 2014: https://github.com/scikit-learn/scikit-learn/issues/2691). A more complete exploration can be found here: https://github.com/stoddardg/sklearn_bug_example/blob/master/Gradient%2BBoosting%2Binit%2BMWE.ipynb

Steps/Code to Reproduce


from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,random_state=0)
init_clf = LogisticRegressionCV()
clf = GradientBoostingClassifier(init=init_clf)
clf.fit(X, y)

Actual Results

IndexError                                Traceback (most recent call last)
<ipython-input-5-f94c05274ee0> in <module>()
      6 init_clf = LogisticRegressionCV()
      7 clf = GradientBoostingClassifier(init=init_clf)
----> 8 clf.fit(train_X, train_y)

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in fit(self, X, y, sample_weight, monitor)
   1463         n_stages = self._fit_stages(X, y, y_pred, sample_weight, self._rng,
   1464                                     X_val, y_val, sample_weight_val,
-> 1465                                     begin_at_stage, monitor, X_idx_sorted)
   1466 
   1467         # change shape of arrays after fit (early-stopping or additional ests)

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stages(self, X, y, y_pred, sample_weight, random_state, X_val, y_val, sample_weight_val, begin_at_stage, monitor, X_idx_sorted)
   1527             y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
   1528                                      sample_mask, random_state, X_idx_sorted,
-> 1529                                      X_csc, X_csr)
   1530 
   1531             # track deviance (= loss)

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask, random_state, X_idx_sorted, X_csc, X_csr)
   1197             loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
   1198                                          sample_weight, sample_mask,
-> 1199                                          self.learning_rate, k=k)
   1200 
   1201             # add tree to ensemble

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in update_terminal_regions(self, tree, X, y, residual, y_pred, sample_weight, sample_mask, learning_rate, k)
    396             self._update_terminal_region(tree, masked_terminal_regions,
    397                                          leaf, X, y, residual,
--> 398                                          y_pred[:, k], sample_weight)
    399 
    400         # update predictions (both in-bag and out-of-bag)

IndexError: too many indices for array

Versions

System
------
    python: 3.7.0 (default, Jun 28 2018, 13:15:42)  [GCC 7.2.0]
executable: /home/gstoddard/anaconda3/envs/eis_env/bin/python
   machine: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-redhat-7.5-Maipo

BLAS
----
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/gstoddard/anaconda3/envs/eis_env/lib
cblas_libs: mkl_rt, pthread

Python deps
-----------
       pip: 18.0
setuptools: 39.2.0
   sklearn: 0.20.0
     numpy: 1.15.0
     scipy: 1.1.0
    Cython: None
    pandas: 0.23.3

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:10 (7 by maintainers)

github_iconTop GitHub Comments

2reactions
stoddardgcommented, Oct 30, 2018

Sorry for the late reply but I haven’t noticed any other bugs. The workaround we identified has been working for me and I’m excited to see the finished product.

Also, I just wanted to say that I really appreciate the work that you (and other folks) have done on this. I’ve been keeping an eye on the PR and resulting discussion and its been educating/fun to watch. I’ll try to pay it forward in the future by contributing myself (or at least encouraging my team to do so) but in the mean time, I donated a small amount. Thanks!

0reactions
jeremiedbbcommented, Mar 2, 2020

Fixed by #12983. Closing

Read more comments on GitHub >

github_iconTop Results From Across the Web

sklearn.ensemble.GradientBoostingRegressor
Gradient Boosting for regression. This estimator builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary ...
Read more >
In Depth: Parameter tuning for Gradient Boosting
Let's first fit a gradient boosting classifier with default parameters to get a baseline idea of the performance
Read more >
How to Configure the Gradient Boosting Algorithm
He suggests to first set a large value for the number of trees, then tune the shrinkage parameter to achieve the best results....
Read more >
GradientBoostingClassifier with a BaseEstimator in scikit- ...
How can i make GradientBoostingRegressor work with a BaseEstimator in scikit-learn? 2 · How to use GradientBoostingClassifier init Parameters in sklearn ...
Read more >
Gradient Boosting | Hyperparameter Tuning Python
Learn parameter tuning in gradient boosting algorithm using Python ... GBM works by starting with an initial estimate which is updated using ......
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found