Got a warning when I use `Scikit-learn` and `Ray` with the `joblib` backend
What is the problem?
- Ray: 1.1.0
- Python: 3.8.5
- OS: macOS Catalina 10.15.7
- scikit-learn: 0.22.2.post1
- optuna: 2.5.0
Thanks for the great project. I ran the sklearn and joblib example from here and got the following warning.
2021-02-02 16:50:39,061 WARNING pool.py:340 -- The 'context' argument is not supported using ray. Please refer to the documentation for how to control ray initialization.
I did not provide a `context` argument. The following screenshot shows the top part of the log when I run the example.
# quickstart_sklearn_and_ray_with_joblib.py
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

digits = load_digits()
param_space = {
    'C': np.logspace(-6, 6, 30),
    'gamma': np.logspace(-8, 8, 30),
    'tol': np.logspace(-4, -1, 30),
    'class_weight': [None, 'balanced'],
}
model = SVC(kernel='rbf')
search = RandomizedSearchCV(model, param_space, cv=5, n_iter=300, verbose=10)

import joblib
from ray.util.joblib import register_ray

register_ray()
with joblib.parallel_backend('ray'):
    search.fit(digits.data, digits.target)

This warning is annoying when I use Ray as the `joblib` backend together with Optuna. Running the following code snippet prints a large number of these warnings, as shown in the following screenshot.
import joblib
import optuna
import ray
from ray.util.joblib import register_ray
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm

ray.init()
register_ray()


def objective(trial):
    iris = sklearn.datasets.load_iris()
    x, y = iris.data, iris.target
    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="auto")
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=10
        )
    score = sklearn.model_selection.cross_val_score(classifier_obj, x, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy


if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    with joblib.parallel_backend("ray", n_jobs=-1):
        study.optimize(objective, n_trials=100)

This warning can be disabled with `ray.init(logging_level=logging.ERROR)`, but can it be controlled by modifying Ray?
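For reference, a narrower workaround than raising the global logging level might be a `logging.Filter` that drops only this one message. The sketch below uses the standard library only. The logger name `ray.util.multiprocessing.pool` is an assumption inferred from the `pool.py` in the log line, and the class name `DropContextWarning` is made up for illustration, so check both against the installed Ray version before relying on this.

```python
import logging

# Assumption: the warning is emitted on the module logger of Ray's pool.py.
# Verify the real logger name for your Ray version.
LOGGER_NAME = "ray.util.multiprocessing.pool"

# Substring taken from the warning text quoted in this issue.
CONTEXT_WARNING_FRAGMENT = "'context' argument is not supported"


class DropContextWarning(logging.Filter):
    """Hypothetical filter that discards only the 'context' warning."""

    def filter(self, record):
        # Returning False drops the record; everything else passes through.
        return CONTEXT_WARNING_FRAGMENT not in record.getMessage()


# Attach the filter to the (assumed) logger; other warnings stay visible.
logging.getLogger(LOGGER_NAME).addFilter(DropContextWarning())
```

A filter attached to a logger only affects records logged on that exact logger in the current process, so it would have to be installed in the driver script before the joblib backend creates its pool.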
Hi @xwu99, the warning should appear once. In this case, joblib is using the Ray `Pool` under the hood, which does not support the `context` argument.

The multiprocessing `Pool` class supports a `context` argument that Ray does not, but this is not significant and in most cases you can ignore the warning.

Looks perfect! Thanks for your swift actions.
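To illustrate the maintainer's explanation: the standard `multiprocessing.pool.Pool` constructor accepts a `context` keyword, so a library that builds the pool on the user's behalf can pass one even though the user never wrote it. The toy code below is only a sketch of that mechanism. `SketchPool` and `library_creates_pool` are hypothetical stand-ins, not Ray's or joblib's actual code.

```python
import logging
import multiprocessing

logger = logging.getLogger("sketch_pool")


class SketchPool:
    """Hypothetical pool replacement that accepts but ignores `context`."""

    def __init__(self, processes=None, context=None):
        # Mirror the stdlib Pool signature, but warn when a context is passed
        # because this replacement has no use for it.
        if context is not None:
            logger.warning("The 'context' argument is not supported; ignoring it.")
        self.processes = processes


def library_creates_pool(pool_cls):
    """Hypothetical stand-in for a library that builds the pool for the user."""
    # The library, not the user, supplies the context argument.
    return pool_cls(processes=2, context=multiprocessing.get_context())
```

Calling `library_creates_pool(SketchPool)` logs the warning even though the caller passed no `context`, while `SketchPool(processes=2)` on its own stays silent.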