GridSearchCV cannot be parallelized when custom scoring is used

Hi,

I ran into a problem with the following code:

    import numpy as np
    from sklearn import ensemble
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import GridSearchCV

    model = ensemble.RandomForestRegressor()
    param = {'n_estimators': [500, 700, 1200],
             'max_depth': [3, 5, 7],
             'max_features': ['auto'],
             'n_jobs': [-1],
             'criterion': ['mae', 'mse'],
             'random_state': [300],
             }

    def my_custom_loss_func(ground_truth, predictions):
        # Mean absolute percentage error
        diff = np.abs(ground_truth - predictions) / ground_truth
        return np.mean(diff)

    # greater_is_better=False marks this as a loss: the scorer negates it
    loss = make_scorer(my_custom_loss_func, greater_is_better=False)
    model_cv = GridSearchCV(model, param, cv=5, n_jobs=2, scoring=loss, verbose=1)
    # X and y are the training data (not shown)
    model_cv.fit(X, y.ravel())

Here I pass a custom scoring object to GridSearchCV(…) and set n_jobs=2.

I got the following error message:

C:\Anaconda3\python.exe C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py
Fitting 5 folds for each of 18 candidates, totalling 90 fits
Traceback (most recent call last):
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 172, in <module>
    models, scas = learn_all(X_train, y_train)
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 108, in learn_all
    models[machine], scas[machine] = learn_cv(X, y)
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 87, in learn_cv
    model_cv.fit(X, y.ravel())
  File "C:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 638, in fit
    cv.split(X, y, groups)))
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__
    self.retrieve()
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Anaconda3\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
  File "C:\Anaconda3\lib\multiprocessing\pool.py", line 385, in _handle_tasks
    put(task)
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\pool.py", line 371, in send
    CustomizablePickler(buffer, self._reducers).dump(obj)
AttributeError: Can't pickle local object 'learn_cv.<locals>.my_custom_loss_func'

Process finished with exit code 1

The program seems to run only when n_jobs is set to 1.

Any ideas?
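
The last line of the traceback names `learn_cv.<locals>.my_custom_loss_func`, which suggests the loss function is defined inside `learn_cv`. The standard `pickle` module serializes functions by their qualified name, so a function nested inside another function cannot be sent to worker processes. Below is a minimal sketch of that behavior using only the standard library. The helper names `can_pickle` and `learn_cv_with_local_loss` are hypothetical, and the loss is written without NumPy only to keep the snippet self-contained.

```python
import pickle


def my_custom_loss_func(ground_truth, predictions):
    """Mean absolute percentage error, defined at module level."""
    ratios = [abs(t - p) / t for t, p in zip(ground_truth, predictions)]
    return sum(ratios) / len(ratios)


def learn_cv_with_local_loss():
    """Hypothetical stand-in for the failing layout: the loss is nested."""
    def local_loss(ground_truth, predictions):
        return my_custom_loss_func(ground_truth, predictions)
    return local_loss


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Nested functions fail here with "Can't pickle local object ..."
        return False


if __name__ == "__main__":
    print("module-level function picklable:", can_pickle(my_custom_loss_func))
    print("nested function picklable:", can_pickle(learn_cv_with_local_loss()))
```

If this is indeed the cause, moving `my_custom_loss_func` out of `learn_cv` to the top level of the script, and then passing it to `make_scorer` as before, should allow `n_jobs=2` to work.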

Issue Analytics

State:
Created 6 years ago
Comments: 25 (19 by maintainers)

Top GitHub Comments

1 reaction

jnothman commented, Dec 18, 2018

No need to open an issue first, @fx86

1 reaction

amueller commented, Oct 10, 2018

@fx86 whenever you can 😉 No rush
