Bug when using RandomizedSearchCV with RandomForestClassifier as the estimator?
The code I ran:
```python
print(__doc__)

import numpy as np
from time import time
from scipy.stats import randint as sp_randint

# from sklearn.model_selection import GridSearchCV
# from sklearn.model_selection import RandomizedSearchCV
from dask_searchcv import GridSearchCV
from dask_searchcv import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# get some data
digits = load_digits()
X, y = digits.data, digits.target

# build a classifier
clf = RandomForestClassifier(n_estimators=20)


# Utility function to report best scores
def report(results, n_top=3):
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                  results['mean_test_score'][candidate],
                  results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")


# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),
              "min_samples_leaf": sp_randint(1, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search)

start = time()
random_search.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates"
      " parameter settings." % ((time() - start), n_iter_search))
report(random_search.cv_results_)

# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [2, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid)
start = time()
grid_search.fit(X, y)

print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(grid_search.cv_results_['params'])))
report(grid_search.cv_results_)
```
The error is:
```
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-139-ecec65d20380>", line 50, in <module>
    random_search.fit(X, y)
  File "/root/anaconda3/lib/python3.6/site-packages/dask_searchcv/model_selection.py", line 867, in fit
    out = scheduler(dsk, keys, num_workers=n_jobs)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/threaded.py", line 75, in get
    pack_exception=pack_exception, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/local.py", line 521, in get_async
    raise_exception(exc, tb)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/compatibility.py", line 60, in reraise
    raise exc
  File "/root/anaconda3/lib/python3.6/site-packages/dask/local.py", line 290, in execute_task
    result = _execute_task(task, data)
  File "/root/anaconda3/lib/python3.6/site-packages/dask/local.py", line 271, in _execute_task
    return func(*args2)
  File "/root/anaconda3/lib/python3.6/site-packages/dask_searchcv/methods.py", line 280, in fit_and_score
    fields, params, fit_params)
  File "/root/anaconda3/lib/python3.6/site-packages/dask_searchcv/methods.py", line 216, in fit
    est.fit(X, y, **fit_params)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/forest.py", line 316, in fit
    random_state=random_state)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/base.py", line 125, in _make_estimator
    estimator = clone(self.base_estimator_)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 60, in clone
    new_object_params = estimator.get_params(deep=False)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 241, in get_params
    warnings.filters.pop(0)
IndexError: pop from empty list
```
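The last frame fails on `warnings.filters.pop(0)` inside scikit-learn's `get_params`, and the `dask/threaded.py` frame shows the fits running on dask's threaded scheduler. The Python documentation notes that `warnings.catch_warnings` works by replacing and later restoring the module's list of filters, so it modifies global state and is not thread-safe. The sketch below replays, in a single thread, one interleaving that could empty that list. It assumes that the 0.19-era `get_params` pushes a filter, enters `catch_warnings`, and pops the filter in a cleanup step; only the pop is visible in the traceback. `FakeWarningsState` and the step order are hypothetical illustrations, not a recorded trace.

```python
"""Replay, in one thread, an interleaving that empties a shared filter list.

FakeWarningsState is a hypothetical stand-in for the process-wide `warnings`
module state; the step order is an assumed interleaving of two threads.
"""


class FakeWarningsState:
    """Holds a `filters` list that every thread sees, like `warnings.filters`."""

    def __init__(self):
        self.filters = []


def replay_interleaving():
    state = FakeWarningsState()
    item = ("always", None, DeprecationWarning, None, 0)

    # Thread A: push a filter at index 0 (assumed simplefilter-style insert).
    state.filters.insert(0, item)
    # Thread A: enter catch_warnings, which saves the list and installs a copy.
    saved_by_a = state.filters
    state.filters = saved_by_a[:]
    # Thread B: push its own filter, which lands in A's temporary copy.
    state.filters.insert(0, item)
    # Thread A: exit catch_warnings, restoring the saved list; B's push is lost.
    state.filters = saved_by_a
    # Thread A: cleanup pops the only remaining entry.
    state.filters.pop(0)
    # Thread B: cleanup pops again, but the list is already empty.
    try:
        state.filters.pop(0)
    except IndexError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(replay_interleaving())  # pop from empty list
```

Each thread's push and pop are balanced on its own; the failure comes only from two threads sharing one mutable, process-wide list.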
Issue created 5 years ago; 10 comments (10 by maintainers).
OK, I can confirm that the above example with GridSearchCV works if I use scikit-learn 0.20.dev, i.e. the master branch. Is it worth creating an issue in scikit-learn to ask for a backport of this fix to 0.19?

Probably not. I think that 0.20 is reasonably close, and scikit-learn doesn’t really have enough development resources to maintain backport branches.
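The example is reported to work on scikit-learn 0.20.dev, and a 0.19 backport looks unlikely. Until upgrading, one possible workaround (an assumption, not tested in this thread) is to keep the fits out of shared-memory threads through dask-searchcv's `scheduler` argument, which its documentation describes as accepting names such as "threading", "multiprocessing", or "synchronous"; check the docstring of your installed version. The helpers `parse_major_minor` and `search_kwargs_for` below are hypothetical, and the (0, 20) cutoff comes only from the comment above.

```python
"""Choose dask-searchcv keyword arguments from a scikit-learn version string.

`parse_major_minor` and `search_kwargs_for` are hypothetical helpers; the
(0, 20) cutoff comes from the report that the example works on 0.20.dev.
"""
import re


def parse_major_minor(version):
    """Return (major, minor) from strings like '0.19.1', '0.20.dev0', '1.2.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def search_kwargs_for(sklearn_version):
    """Extra kwargs for dask_searchcv's GridSearchCV / RandomizedSearchCV.

    Releases before 0.20 run without threads (assumed workaround); newer
    releases keep dask-searchcv's default scheduler.
    """
    if parse_major_minor(sklearn_version) < (0, 20):
        return {"scheduler": "synchronous"}
    return {}
```

Usage would look like `RandomizedSearchCV(clf, param_distributions=param_dist, n_iter=n_iter_search, **search_kwargs_for(sklearn.__version__))` after `import sklearn`. The synchronous scheduler gives up parallelism entirely; "multiprocessing" would also avoid sharing `warnings.filters` between workers, at the cost of copying data to each process.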