Using warmstart in GradientBoostingClassifier produces inferior solution
Description
Using the warm_start flag of GradientBoostingClassifier results in an inferior solution compared to fitting all base models at once. From how the docs describe warm starting, I expect the two results to be equal.
Steps/Code to Reproduce
```python
import numpy as np
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

X, y = sklearn.datasets.load_digits(return_X_y=True)

# Subsample 150 digits so the effect is easy to reproduce.
rs = np.random.RandomState(42)
indices = np.arange(X.shape[0])
rs.shuffle(indices)
indices = indices[:150]
X = X[indices]
y = y[indices]

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=16,
)

# Warm-started classifier: grow the ensemble one estimator at a time.
classifier = sklearn.ensemble.GradientBoostingClassifier(
    warm_start=True, n_estimators=1, random_state=1,
)
classifier.fit(X_train, y_train)
for i in range(99):
    classifier.n_estimators += 1
    classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))

# Reference classifier: fit all 100 estimators in a single call.
classifier = sklearn.ensemble.GradientBoostingClassifier(
    n_estimators=100, random_state=1,
)
classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))
```
Expected Results
Two equal scores.
Actual Results
0.605263157895 (warm-started fit)
0.657894736842 (single fit)
Versions
Linux-4.4.0-96-generic-x86_64-with-debian-stretch-sid
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.1
Edit
The following models have the same issue:
- ExtraTreesClassifier
- SGDClassifier
- PassiveAggressiveClassifier
RandomForestClassifier does not seem to have this problem. I did not test the regression counterparts, but I expect them to behave the same way.
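The RandomForestClassifier claim can be checked with a minimal sketch like the one below (not from the issue; it assumes scikit-learn's documented warm_start semantics, under which a warm-started forest should reproduce the ensemble you get from a single fit with the same random_state):

```python
import numpy as np
import sklearn.datasets
import sklearn.ensemble

X, y = sklearn.datasets.load_digits(return_X_y=True)

# Grow a forest one tree at a time with warm_start.
warm = sklearn.ensemble.RandomForestClassifier(
    warm_start=True, n_estimators=1, random_state=1,
)
warm.fit(X, y)
for _ in range(9):
    warm.n_estimators += 1
    warm.fit(X, y)

# Fit all 10 trees in a single call.
cold = sklearn.ensemble.RandomForestClassifier(
    n_estimators=10, random_state=1,
)
cold.fit(X, y)

# If warm starting is consistent, both forests predict identically.
same = np.array_equal(warm.predict(X), cold.predict(X))
print(same)
```

The same pattern, swapped to GradientBoostingClassifier, is what fails in the report above.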
Issue Analytics
- Created: 6 years ago
- Comments: 12 (12 by maintainers)
Top GitHub Comments
Let’s just hope that all issues will be like #10000 from now on: a fully stand-alone snippet to reproduce the problem, closed in under a day 😉 !
#7071 (early stopping for Gradient boosting) was probably too big a change for a minor release. Given how you pushed 0.19.1 forward and made it happen, I really don’t think there is anything you should be sorry about.