Got a warning when I use `Scikit-learn` and `Ray` with the `joblib` backend

What is the problem?

  • Ray: 1.1.0
  • Python: 3.8.5
  • OS: macOS Catalina 10.15.7
  • scikit-learn: 0.22.2.post1
  • optuna: 2.5.0

Thanks for the great project. I ran the scikit-learn and joblib example (linked in the original issue) and got the following warning.

2021-02-02 16:50:39,061 WARNING pool.py:340 -- The 'context' argument is not supported using ray. Please refer to the documentation for how to control ray initialization.

I did not provide a context argument. The following screenshot shows the top part of the log when I run the example.

# quickstart_sklearn_and_ray_with_joblib.py
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
digits = load_digits()
param_space = {
    'C': np.logspace(-6, 6, 30),
    'gamma': np.logspace(-8, 8, 30),
    'tol': np.logspace(-4, -1, 30),
    'class_weight': [None, 'balanced'],
}
model = SVC(kernel='rbf')
search = RandomizedSearchCV(model, param_space, cv=5, n_iter=300, verbose=10)

import joblib
from ray.util.joblib import register_ray
register_ray()
with joblib.parallel_backend('ray'):
    search.fit(digits.data, digits.target)

[Screenshot 2021-02-02 17 01 58]

The warning becomes a nuisance when I use Ray as the joblib backend together with Optuna. Running the following snippet prints a large number of warnings, as shown in the screenshot after it.

import joblib
import optuna
import ray
from ray.util.joblib import register_ray
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm


ray.init()
register_ray()


def objective(trial):
    iris = sklearn.datasets.load_iris()
    x, y = iris.data, iris.target

    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="auto")
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=10
        )

    score = sklearn.model_selection.cross_val_score(classifier_obj, x, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy


if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    with joblib.parallel_backend("ray", n_jobs=-1):
        study.optimize(objective, n_trials=100)
[Screenshot 2021-02-02 17 18 57]

The warning can be suppressed with `ray.init(logging_level=logging.ERROR)`, but can it be addressed by a change in Ray itself?
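Raising the log level to ERROR hides every Ray warning, not just this one. A narrower alternative, sketched here with the standard library only, is a `logging.Filter` that drops the single message. The logger name `ray.util.multiprocessing.pool` is an assumption inferred from the `pool.py` shown in the log line, and `DropContextWarning` and `silence_context_warning` are hypothetical helper names, not part of Ray.

```python
import logging

# Assumption: the warning goes through a standard `logging` logger named after
# the module that emits it. This name is a guess based on "pool.py" in the log.
RAY_POOL_LOGGER = "ray.util.multiprocessing.pool"


class DropContextWarning(logging.Filter):
    """Drop only the "'context' argument is not supported" record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; all other records pass through.
        return "'context' argument is not supported" not in record.getMessage()


def silence_context_warning(logger_name: str = RAY_POOL_LOGGER) -> logging.Filter:
    """Attach the filter to the named logger and return it for later removal."""
    flt = DropContextWarning()
    logging.getLogger(logger_name).addFilter(flt)
    return flt
```

Calling `silence_context_warning()` before entering `joblib.parallel_backend("ray")` would install the filter. It only affects the process in which it is installed, and it has no effect if the warning is actually logged under a different logger name.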

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (10 by maintainers)

Top GitHub Comments

1 reaction
AmeerHajAli commented, Feb 25, 2022

Hi @xwu99, the warning should appear only once. In this case, joblib uses Ray's Pool under the hood. The standard multiprocessing pool class supports a `context` argument that Ray's Pool does not, but this is not significant, and in most cases you can ignore the warning.
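To make the mechanism concrete, here is a minimal sketch of how a drop-in pool could accept the `context` keyword that callers of `multiprocessing.pool.Pool` may pass, yet warn about it only once. `SketchPool` is a hypothetical stand-in written for illustration. It is not Ray's Pool and does not reproduce the fix made in Ray.

```python
import inspect
import logging
import multiprocessing.pool

logger = logging.getLogger("sketch_pool")

# The standard library pool really does take a `context` keyword, which is why
# code written against it may pass one to a replacement pool.
STDLIB_POOL_ACCEPTS_CONTEXT = (
    "context" in inspect.signature(multiprocessing.pool.Pool.__init__).parameters
)


class SketchPool:
    """Hypothetical pool that cannot honor `context` and says so once."""

    _warned = False  # class-level flag shared by every instance

    def __init__(self, processes=None, context=None):
        self.processes = processes
        # Warn only when a context was actually supplied, and only the first time.
        if context is not None and not SketchPool._warned:
            logger.warning("The 'context' argument is not supported; ignoring it.")
            SketchPool._warned = True
```

Creating many such pools, as happens when `cross_val_score` runs once per Optuna trial, would then log the message a single time instead of once per pool.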

1 reaction
HideakiImamura commented, Feb 3, 2021

I made a fix here, here is how the new output looks like:

Looks perfect! Thanks for your swift actions.
