Is there a bug when using RandomizedSearchCV with a RandomForestClassifier estimator?

The code I ran is roughly:

print(__doc__)

import numpy as np

from time import time
from scipy.stats import randint as sp_randint

# from sklearn.model_selection import GridSearchCV
# from sklearn.model_selection import RandomizedSearchCV
from dask_searchcv import GridSearchCV
from dask_searchcv import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# get some data
digits = load_digits()
X, y = digits.data, digits.target

# build a classifier
clf = RandomForestClassifier(n_estimators=20)


# Utility function to report best scores
def report(results, n_top=3):
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                results['mean_test_score'][candidate],
                results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")


# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),
              "min_samples_leaf": sp_randint(1, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search)

start = time()
random_search.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates"
      " parameter settings." % ((time() - start), n_iter_search))
report(random_search.cv_results_)

# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [2, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid)
start = time()
grid_search.fit(X, y)

print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(grid_search.cv_results_['params'])))
report(grid_search.cv_results_)

The error is:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-139-ecec65d20380>", line 50, in <module>
    random_search.fit(X, y)
  File "/root/anaconda3/lib/python3.6/site-packages/dask_searchcv/model_selection.py", line 867, in fit
    out = scheduler(dsk, keys, num_workers=n_jobs)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/threaded.py", line 75, in get
    pack_exception=pack_exception, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/local.py", line 521, in get_async
    raise_exception(exc, tb)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/compatibility.py", line 60, in reraise
    raise exc
  File "/root/anaconda3/lib/python3.6/site-packages/dask/local.py", line 290, in execute_task
    result = _execute_task(task, data)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/local.py", line 271, in _execute_task
    return func(*args2)
  File "/root/anaconda3/lib/python3.6/site-packages/dask_searchcv/methods.py", line 280, in fit_and_score
    fields, params, fit_params)
  File "/root/anaconda3/lib/python3.6/site-packages/dask_searchcv/methods.py", line 216, in fit
    est.fit(X, y, **fit_params)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py", line 316, in fit
    random_state=random_state)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py", line 125, in _make_estimator
    estimator = clone(self.base_estimator_)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 60, in clone
    new_object_params = estimator.get_params(deep=False)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 241, in get_params
    warnings.filters.pop(0)
IndexError: pop from empty list
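
The last frame of the traceback is `warnings.filters.pop(0)` inside `get_params`, and the search is running on dask's threaded scheduler. `warnings.filters` is a single process-wide list, so one plausible reading (an assumption, not something confirmed in this thread) is that another thread emptied or replaced that list before the `pop` ran. The sketch below reproduces only the final error, deterministically and with the standard library alone; `pop_front_filter` is a hypothetical stand-in, not scikit-learn code.

```python
import warnings


def pop_front_filter():
    """Hypothetical stand-in for the cleanup step seen in the traceback."""
    warnings.filters.pop(0)


# Snapshot the global filter list so it can be restored afterwards.
saved = list(warnings.filters)
try:
    # Simulate the state another thread could leave behind: an empty list.
    warnings.filters[:] = []
    try:
        pop_front_filter()
        outcome = "no error"
    except IndexError as exc:
        outcome = str(exc)
finally:
    # Put the original filters back, in place.
    warnings.filters[:] = saved

print(outcome)  # pop from empty list
```

This does not demonstrate the race itself, only that the cleanup step fails in exactly this way once the shared list is empty.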

Issue Analytics

State:
Created 5 years ago
Comments: 10 (10 by maintainers)

Top GitHub Comments

TomAugspurger commented, Jul 3, 2018

Probably not. I think that 0.20 is reasonably close, and scikit-learn doesn’t really have enough development resources to maintain backport branches.

wtbarnes commented, Jul 3, 2018

Ok I can confirm that the above example with GridSearchCV works if I use scikit-learn 0.20.dev, i.e. the master branch. Is it worth creating an issue in scikit-learn to ask for a backport of this fix to 0.19?
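
Going by the comment above, the example fails on scikit-learn 0.19 and works on 0.20.dev. A small guard along these lines could flag an environment that predates the reported fix. This is only a sketch: `major_minor` and `predates_reported_fix` are hypothetical helper names, and the 0.20 threshold is taken solely from that comment.

```python
import re


def major_minor(version):
    """Return the leading (major, minor) pair of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def predates_reported_fix(version):
    """True if the version is older than 0.20 (threshold assumed from the thread)."""
    return major_minor(version) < (0, 20)


for candidate in ("0.19.1", "0.20.dev0"):
    print(candidate, predates_reported_fix(candidate))
```

In a real session the argument would be `sklearn.__version__`.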
