Reproducibility problem with optuna.integration.OptunaSearchCV

I cannot get reproducible results with a specified seed. Across consecutive runs, the printed model_grid.best_params_ differ each time, even though I pass a seed as the random_state argument.

I expect the results to be reproducible.

Optuna version: 2.0.0
Python version: Python 3.8.8
OS: PRETTY_NAME="Ubuntu 18.04.3 LTS", VERSION_ID="18.04"

Steps to reproduce

See the script below.

Reproducible examples (optional)


import optuna
import xgboost as xgb
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)

seed = 7
n_iter = 3
optimize = 'r2'
tuner_verbose = 0
n_jobs = -1
early_stopping_max_iters = 10
prune = False
fold = 3

regressor = xgb.XGBRegressor()

sampler = optuna.samplers.TPESampler(seed=seed)
study = optuna.create_study(direction="maximize", sampler=sampler)

# Search space passed to OptunaSearchCV as param_distributions
param_grid = {
    'eta': optuna.distributions.LogUniformDistribution(low=1e-06, high=0.5),
    'depth': optuna.distributions.IntUniformDistribution(low=1, high=11, step=1),
    'n_estimators': optuna.distributions.IntUniformDistribution(low=10, high=300, step=1),
    'random_strength': optuna.distributions.UniformDistribution(low=0, high=0.8),
    'l2_leaf_reg': optuna.distributions.IntLogUniformDistribution(low=1, high=200, step=1),
}


model_grid = optuna.integration.OptunaSearchCV(
    estimator=regressor,
    param_distributions=param_grid,
    cv=fold,
    enable_pruning=prune,
    max_iter=early_stopping_max_iters,
    n_jobs=n_jobs,
    n_trials=n_iter,
    random_state=seed,
    scoring=optimize,
    study=study,
    refit=False,
    verbose=tuner_verbose,
    error_score="raise",
)

model_grid.fit(X, y)

print(model_grid.best_params_)

Additional context (optional)

After some trial and error, I noticed that if I comment out the n_jobs, study, and random_state arguments and call numpy.random.seed(7) instead, I get reproducible results. However, I want to use n_jobs=-1 rather than the default n_jobs=1.
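
One plausible reading of this observation, sketched as a toy model rather than Optuna's actual internals: when several workers draw from a single seeded generator, the value each trial receives depends on the order in which the workers ask for it. The draw_params function and the trial orderings below are hypothetical and use only Python's standard random module.

```python
import random


def draw_params(seed, ask_order):
    """Give each trial one value from a single seeded generator.

    `ask_order` lists trial ids in the order they request a value.
    Assumption: with parallel workers this order can change between runs.
    """
    rng = random.Random(seed)
    return {trial_id: rng.uniform(0.0, 1.0) for trial_id in ask_order}


# Two sequential runs: trials ask in the same order both times.
sequential = draw_params(7, [0, 1, 2])
sequential_again = draw_params(7, [0, 1, 2])

# A simulated parallel run in which trial 1 happened to ask first.
interleaved = draw_params(7, [1, 0, 2])

print(sequential == sequential_again)
print(sequential == interleaved)
```

The first comparison holds because the seed and the ask order are both fixed. The second does not, because trials 0 and 1 swap the values they receive even though the seed is unchanged.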

Top GitHub Comments

cconstantinou73 commented, Mar 31, 2021

I can't seem to replicate the inconsistent behaviour when n_jobs == 1 and a study and random_state are defined, so I assume I did something wrong when I used them before. In other words, I get the expected reproducible behaviour when n_jobs == 1. Thank you for your responses, @nzw0301.

nzw0301 commented, Mar 31, 2021

Awesome! I’m glad to hear that. Thanks to your feedback, I’m sending a pull request to improve the documentation of n_jobs.
