GridSearchCV cannot be paralleled when custom scoring is used

Hi,

I ran into a problem with the following code:

    import numpy as np
    from sklearn import ensemble
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import GridSearchCV

    # In the original script this block lives inside a function learn_cv(X, y)
    # (see the traceback below), so my_custom_loss_func is a local function.
    model = ensemble.RandomForestRegressor()
    param = {'n_estimators': [500, 700, 1200],
             'max_depth': [3, 5, 7],
             'max_features': ['auto'],
             'n_jobs': [-1],
             'criterion': ['mae', 'mse'],
             'random_state': [300],
             }

    def my_custom_loss_func(ground_truth, predictions):
        # Mean relative absolute error
        diff = np.abs(ground_truth - predictions) / ground_truth
        return np.mean(diff)

    # greater_is_better=False: GridSearchCV maximizes the negated loss
    loss = make_scorer(my_custom_loss_func, greater_is_better=False)
    model_cv = GridSearchCV(model, param, cv=5, n_jobs=2, scoring=loss, verbose=1)
    model_cv.fit(X, y.ravel())

Here I pass a custom scoring object to GridSearchCV(…) and set n_jobs=2.

I got the following error message:

    C:\Anaconda3\python.exe C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py
    Fitting 5 folds for each of 18 candidates, totalling 90 fits
    Traceback (most recent call last):
      File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 172, in <module>
        models, scas = learn_all(X_train, y_train)
      File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 108, in learn_all
        models[machine], scas[machine] = learn_cv(X, y)
      File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 87, in learn_cv
        model_cv.fit(X, y.ravel())
      File "C:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 638, in fit
        cv.split(X, y, groups)))
      File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__
        self.retrieve()
      File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 699, in retrieve
        self._output.extend(job.get(timeout=self.timeout))
      File "C:\Anaconda3\lib\multiprocessing\pool.py", line 608, in get
        raise self._value
      File "C:\Anaconda3\lib\multiprocessing\pool.py", line 385, in _handle_tasks
        put(task)
      File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\pool.py", line 371, in send
        CustomizablePickler(buffer, self._reducers).dump(obj)
    AttributeError: Can't pickle local object 'learn_cv.<locals>.my_custom_loss_func'

    Process finished with exit code 1

The program runs only when n_jobs is set to 1.
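Why n_jobs=1 works: with one job the fits run in the current process and nothing is pickled. With n_jobs=2, the multiprocessing-based joblib backend shown in the traceback pickles the scorer to send it to worker processes. The standard pickle module stores a function by its module and qualified name, so a function defined inside another function (learn_cv.<locals>.my_custom_loss_func) cannot be looked up and pickling fails. Below is a minimal, stdlib-only sketch of that behavior. The names relative_error, make_local_loss and can_pickle are illustrative assumptions, not part of the original script.

```python
import pickle


def relative_error(ground_truth, predictions):
    # Module-level function: pickle records it by module and qualified name.
    return sum(abs(g - p) / g for g, p in zip(ground_truth, predictions)) / len(ground_truth)


def make_local_loss():
    # Hypothetical stand-in for learn_cv: the loss is defined inside a function,
    # so its qualified name is "make_local_loss.<locals>.local_loss".
    def local_loss(ground_truth, predictions):
        return relative_error(ground_truth, predictions)
    return local_loss


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as exc:
        # AttributeError is what the traceback above shows; PicklingError is
        # also caught to be safe across Python versions.
        print(type(exc).__name__, exc)
        return False
    return True


print(can_pickle(relative_error))     # module-level function
print(can_pickle(make_local_loss()))  # local function, as in the traceback
```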

Any ideas?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 25 (19 by maintainers)

Top GitHub Comments

jnothman commented, Dec 18, 2018 (1 reaction)

No need to open an issue first, @fx86

amueller commented, Oct 10, 2018 (1 reaction)

@fx86 whenever you can 😉 No rush

Top Results From Across the Web

Python sklearn GridSearchCV: parallelization is not working
I encounter some problems using GridSerachCV from sklearn package and parallelization. To see if it was coming from my data, ...
Why does gridsearchCV fit fail? - Data Science Stack Exchange
Therefore, I used gridsearchCV to identify the best parameters of balancedbagging classifier model to train/fit the model and then predict.
3.2. Tuning the hyper-parameters of an estimator - Scikit-learn
The GridSearchCV instance implements the usual estimator API: when “fitting” it on a dataset all the possible combinations of parameter values are evaluated...
Getting the Most out of scikit-learn Pipelines | by Jessica Miles
You can then pass this composite estimator to a GridSearchCV object and search ... We'll use CountVectorizer , and give it a custom...
Find optimal parameters using GridSearchCV - ProjectPro
Scoring : It is used as a evaluating metric for the model performance to decide the best hyperparameters, if not especified then it...
