
Using warm_start in GradientBoostingClassifier produces an inferior solution

See original GitHub issue

Description

Using the warm_start flag for GradientBoostingClassifier results in an inferior solution compared to fitting all base models at once. From the way the docs describe warm starting, I expected the results to be equal.

Steps/Code to Reproduce

import numpy as np
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection


# Subsample 150 random digits to keep the example small and fast.
X, y = sklearn.datasets.load_digits(return_X_y=True)
rs = np.random.RandomState(42)
indices = np.arange(X.shape[0])
rs.shuffle(indices)
indices = indices[:150]
X = X[indices]
y = y[indices]
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=16,
)

# Warm start: fit a single estimator, then add one more estimator per call to fit.
classifier = sklearn.ensemble.GradientBoostingClassifier(
    warm_start=True, n_estimators=1, random_state=1,
)
classifier.fit(X_train, y_train)
for i in range(99):
    classifier.n_estimators += 1
    classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))

# Reference: fit all 100 estimators in a single call.
classifier = sklearn.ensemble.GradientBoostingClassifier(
    n_estimators=100, random_state=1,
)
classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))

Expected Results

Two equal scores.

Actual Results

0.605263157895  (warm-started, one estimator added per fit)
0.657894736842  (all 100 estimators fitted at once)

Versions

Linux-4.4.0-96-generic-x86_64-with-debian-stretch-sid
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.1

Edit

The following models have the same issue:

  • ExtraTreesClassifier
  • SGDClassifier
  • PassiveAggressiveClassifier

RandomForestClassifier does not seem to have this problem. I did not test the regression counterparts, but I expect them to behave the same way.
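
The comparison from the reproduction above can be adapted to any of the estimators listed in this edit by swapping the estimator class. The snippet below is a minimal sketch for ExtraTreesClassifier, following the same pattern on the same dataset; it makes no claim about the exact scores.

import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=16,
)

# Warm-started variant: grow the forest one tree per call to fit.
warm = sklearn.ensemble.ExtraTreesClassifier(
    warm_start=True, n_estimators=1, random_state=1,
)
warm.fit(X_train, y_train)
for _ in range(99):
    warm.n_estimators += 1
    warm.fit(X_train, y_train)

# Reference variant: fit all 100 trees in a single call.
cold = sklearn.ensemble.ExtraTreesClassifier(n_estimators=100, random_state=1)
cold.fit(X_train, y_train)

# The edit above reports that these two scores also differ.
print(warm.score(X_test, y_test), cold.score(X_test, y_test))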

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
lesteve commented, Oct 26, 2017

🏆 Congrats on #10000! 🏆

Let’s just hope that all issues will be like #10000 from now on: a fully stand-alone snippet to reproduce the problem, closed in under a day 😉!

1 reaction
lesteve commented, Oct 26, 2017

> sorry if this was another fix I should have included in 0.19.1, but did not realise. This is the trouble of slow release cycles… too much to keep track of.

#7071 (early stopping for Gradient boosting) was probably too big a change for a minor release. Given how you pushed 0.19.1 forward and made it happen, I really don’t think there is anything you should be sorry about.
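
For context, #7071 referenced here is the early-stopping work for gradient boosting; built-in early stopping eventually shipped in scikit-learn 0.20 via the n_iter_no_change, validation_fraction and tol parameters. The snippet below is a minimal sketch of that later API, assuming scikit-learn >= 0.20, and is not part of the 0.19.1 release discussed in this issue.

import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=16,
)

clf = sklearn.ensemble.GradientBoostingClassifier(
    n_estimators=500,          # upper bound on the number of boosting stages
    validation_fraction=0.1,   # fraction of training data held out internally
    n_iter_no_change=5,        # stop when the validation score stalls for 5 stages
    tol=1e-4,
    random_state=1,
)
clf.fit(X_train, y_train)
# n_estimators_ holds the number of stages actually fitted before stopping.
print(clf.n_estimators_, clf.score(X_test, y_test))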
