Reproducibility problem with optuna.integration.OptunaSearchCV
I cannot get reproducible results with a specified seed. Across consecutive runs, the printed model_grid.best_params_ differ each time, even though a seed is passed to the random_state argument.
I expect the results to be reproducible.
- Optuna version: ‘2.0.0’
- Python version: Python 3.8.8
- OS: PRETTY_NAME=“Ubuntu 18.04.3 LTS”, VERSION_ID=“18.04”
- (Optional) Other libraries and their versions:
Steps to reproduce
- see script below
Reproducible examples (optional)
import optuna
import xgboost as xgb
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)

# Search settings
seed = 7
n_iter = 3                     # number of trials
optimize = 'r2'                # scoring metric
tuner_verbose = 0
n_jobs = -1                    # run trials in parallel on all available cores
early_stopping_max_iters = 10
prune = False
fold = 3                       # number of cross-validation folds

regressor = xgb.XGBRegressor()

# Seeded sampler and a study that maximizes the score
sampler = optuna.samplers.TPESampler(seed=seed)
study = optuna.create_study(
    direction="maximize", sampler=sampler
)

# Search space
param_grid = {
    'eta': optuna.distributions.LogUniformDistribution(high=0.5, low=1e-06),
    'depth': optuna.distributions.IntUniformDistribution(high=11, low=1, step=1),
    'n_estimators': optuna.distributions.IntUniformDistribution(high=300, low=10, step=1),
    'random_strength': optuna.distributions.UniformDistribution(high=0.8, low=0),
    'l2_leaf_reg': optuna.distributions.IntLogUniformDistribution(high=200, low=1, step=1),
}

model_grid = optuna.integration.OptunaSearchCV(
    estimator=regressor,
    param_distributions=param_grid,
    cv=fold,
    enable_pruning=prune,
    max_iter=early_stopping_max_iters,
    n_jobs=n_jobs,
    n_trials=n_iter,
    random_state=seed,
    scoring=optimize,
    study=study,
    refit=False,
    verbose=tuner_verbose,
    error_score="raise",
)

model_grid.fit(X, y)
print(model_grid.best_params_)  # differs between runs despite the seed
Additional context (optional)
After some trial and error, I noticed that if I comment out the n_jobs, study, and random_state arguments and call numpy.random.seed(7) instead, I get reproducible results. However, I would like to use n_jobs=-1 rather than the default n_jobs=1.
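One plausible reason parallel trials break reproducibility, sketched as a toy model rather than Optuna's actual internals: when several trials draw parameters from one seeded random generator, the values each trial receives depend on the order in which the trials ask for them. The draw_params helper and the schedules below are hypothetical and only simulate two fixed orderings.

```python
import random


def draw_params(seed, schedule):
    """Hand out draws from one seeded RNG to trials in the given order.

    `schedule` lists which trial asks for its next parameter at each step.
    """
    rng = random.Random(seed)
    params = {}
    for trial_id in schedule:
        params.setdefault(trial_id, []).append(rng.random())
    return params


# Sequential: trial 0 draws both of its parameters before trial 1 starts.
sequential = draw_params(7, [0, 0, 1, 1])
sequential_again = draw_params(7, [0, 0, 1, 1])

# Interleaved: the two trials alternate, as concurrent workers might.
interleaved = draw_params(7, [0, 1, 0, 1])

print(sequential == sequential_again)  # same seed, same order
print(sequential == interleaved)       # same seed, different order
```

With a fixed order the seed pins down every value. With n_jobs=-1 the order is decided by the scheduler at run time, so the same seed can still produce different parameter sets.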
Issue Analytics
- State:
- Created 2 years ago
- Comments:7
I can't seem to reproduce the inconsistent behaviour when n_jobs == 1 with a study and random_state defined, so I assume I did something wrong when I used them before. In other words, I get the expected reproducibility when n_jobs == 1. Thank you for your responses, @nzw0301.
Awesome! I’m glad to hear that. Thanks to your feedback, I’m sending a pull request to improve the documentation of n_jobs.