Reproducibility problem with optuna.integration.OptunaSearchCV

I cannot get reproducible results with a specified seed. On consecutive runs, the printed model_grid.best_params_ differ each time, even though I pass a seed to the random_state argument.

I expect the results to be reproducible.

  • Optuna version: 2.0.0
  • Python version: 3.8.8
  • OS: Ubuntu 18.04.3 LTS (VERSION_ID="18.04")
  • (Optional) Other libraries and their versions:

Error messages, stack traces, or logs

# error messages, stack traces, or logs

Steps to reproduce

  1. Run the script below.

Reproducible examples (optional)


import optuna
import xgboost as xgb
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)

seed = 7
n_iter = 3
optimize = 'r2'
tuner_verbose = 0
n_jobs = -1
early_stopping_max_iters = 10
prune = False
fold = 3

regressor = xgb.XGBRegressor()

sampler = optuna.samplers.TPESampler(seed=seed)
study = optuna.create_study(direction="maximize", sampler=sampler)

param_grid = {
    'eta': optuna.distributions.LogUniformDistribution(high=0.5, low=1e-06),
    'depth': optuna.distributions.IntUniformDistribution(high=11, low=1, step=1),
    'n_estimators': optuna.distributions.IntUniformDistribution(high=300, low=10, step=1),
    'random_strength': optuna.distributions.UniformDistribution(high=0.8, low=0),
    'l2_leaf_reg': optuna.distributions.IntLogUniformDistribution(high=200, low=1, step=1),
}


model_grid = optuna.integration.OptunaSearchCV(
    estimator=regressor,
    param_distributions=param_grid,
    cv=fold,
    enable_pruning=prune,
    max_iter=early_stopping_max_iters,
    n_jobs=n_jobs,
    n_trials=n_iter,
    random_state=seed,
    scoring=optimize,
    study=study,
    refit=False,
    verbose=tuner_verbose,
    error_score="raise",
)

model_grid.fit(X, y)

print(model_grid.best_params_)

Additional context (optional)

After some trial and error, I noticed that if I comment out the n_jobs, study, and random_state arguments and call numpy.random.seed(7) instead, I get reproducible results. However, I would like to use n_jobs=-1 rather than the default n_jobs=1.
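
One way to picture why a sequential run can reproduce while a parallel one may not: a seeded generator returns the same values only if they are requested in the same order. The sketch below uses only the standard library and is not Optuna's sampler code; assign_params and the worker names are hypothetical, chosen purely for illustration. The Optuna FAQ likewise describes parallel optimization as inherently non-deterministic and recommends sequential execution when reproducibility matters.

```python
import random


def assign_params(seed, worker_order):
    """Hand out one sampled value per worker, in the order the workers ask.

    Hypothetical helper: a stand-in for trials pulling values from one
    shared, seeded random generator.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return {worker: rng.uniform(0.0, 0.8) for worker in worker_order}


# Same seed, same request order: identical assignments.
run_a = assign_params(7, ["w0", "w1", "w2"])
run_b = assign_params(7, ["w0", "w1", "w2"])

# Same seed, but two workers ask in the opposite order, as can happen
# when threads are scheduled differently from one run to the next.
swapped = assign_params(7, ["w1", "w0", "w2"])

print(run_a == run_b)
print(run_a == swapped)
```

The seed fixes the stream of values, not which trial receives which value, so any change in scheduling order changes the parameters each trial sees.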

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
cconstantinou73 commented, Mar 31, 2021

I can't seem to reproduce the inconsistency when n_jobs == 1 and a study and random_state are defined, so I assume I did something wrong when I used them before. In other words, I get the expected reproducible behaviour when n_jobs == 1. Thank you for your responses, @nzw0301.

0 reactions
nzw0301 commented, Mar 31, 2021

Awesome! I’m glad to hear that. Thanks to your feedback, I’m sending a pull request to improve the documentation of n_jobs.
