AssertionError using multivariate TPE sampler

Expected behavior

The script was running normally and completed trial 18, but then raised an AssertionError. This had never happened before with the same script: an earlier run lasted 36 trials before stopping on a CUDA out-of-memory error, which is unrelated to this issue.

Environment

  • Optuna version: 2.5.0
  • Python version: 3.8.5
  • OS: Red Hat Enterprise Linux
  • (Optional) Other libraries and their versions:
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
alembic                   1.5.4                     <pip>
attrs                     20.3.0                    <pip>
blas                      1.0                         mkl
ca-certificates           2020.12.5            ha878542_0    conda-forge
certifi                   2020.12.5        py38h578d9bd_1    conda-forge
cliff                     3.6.0                     <pip>
cmaes                     0.8.0                     <pip>
cmd2                      1.5.0                     <pip>
colorama                  0.4.4                     <pip>
colorlog                  4.7.2                     <pip>
cudatoolkit               10.2.89              hfd86e86_1
freetype                  2.10.4               h5ab3b9f_0
intel-openmp              2020.2                      254
joblib                    1.0.0              pyhd8ed1ab_0    conda-forge
jpeg                      9b                   h024ee3a_2
lcms2                     2.11                 h396b838_0
ld_impl_linux-64          2.33.1               h53a641e_7
libblas                   3.9.0           1_h86c2bf4_netlib    conda-forge
libcblas                  3.9.0           3_h92ddd45_netlib    conda-forge
libedit                   3.1.20191231         h14c3975_1
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.3.0               h5dbcf3e_17    conda-forge
libgfortran-ng            9.3.0               he4bcb1c_17    conda-forge
libgfortran5              9.3.0               he4bcb1c_17    conda-forge
libgomp                   9.3.0               h5dbcf3e_17    conda-forge
liblapack                 3.9.0           3_h92ddd45_netlib    conda-forge
libpng                    1.6.37               hbc83047_0
libstdcxx-ng              9.3.0               h2ae2ef3_17    conda-forge
libtiff                   4.1.0                h2733197_1
libuv                     1.40.0               h7b6447c_0
lz4-c                     1.9.2                heb0550a_3
Mako                      1.1.4                     <pip>
MarkupSafe                1.1.1                     <pip>
mkl                       2020.2                      256
mkl-service               2.3.0            py38he904b0f_0
mkl_fft                   1.2.0            py38h23d657b_0
mkl_random                1.1.1            py38h0573a6f_0
ncurses                   6.2                  he6710b0_1
ninja                     1.10.2           py38hff7bd54_0
numpy                     1.19.2           py38h54aff64_0
numpy-base                1.19.2           py38hfa32c7d_0
olefile                   0.46                       py_0
openssl                   1.1.1i               h7f98852_0    conda-forge
optuna                    2.5.0                     <pip>
packaging                 20.9                      <pip>
pandas                    1.2.0            py38ha9443f7_0
pbr                       5.5.1                     <pip>
pillow                    8.1.0            py38he98fc37_0
pip                       20.3.3           py38h06a4308_0
prettytable               0.7.2                     <pip>
pyparsing                 2.4.7                     <pip>
pyperclip                 1.8.1                     <pip>
python                    3.8.5                h7579374_1
python-dateutil           2.8.1                      py_0
python-editor             1.0.4                     <pip>
python_abi                3.8                      1_cp38    conda-forge
pytorch                   1.7.1           py3.8_cuda10.2.89_cudnn7.6.5_0    pytorch
pytz                      2020.5             pyhd3eb1b0_0
PyYAML                    5.4.1                     <pip>
readline                  8.0                  h7b6447c_0
scikit-learn              0.24.0           py38h658cfdd_0    conda-forge
scipy                     1.6.0            py38hb2138dd_0    conda-forge
setuptools                51.0.0           py38h06a4308_2
six                       1.15.0           py38h06a4308_0
SQLAlchemy                1.3.23                    <pip>
sqlite                    3.33.0               h62c20be_0
stevedore                 3.3.0                     <pip>
threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
tk                        8.6.10               hbc83047_0
torchaudio                0.7.2                      py38    pytorch
torchvision               0.8.2                py38_cu102    pytorch
tqdm                      4.56.0                    <pip>
typing_extensions         3.7.4.3                    py_0
wcwidth                   0.2.5                     <pip>
wheel                     0.36.2             pyhd3eb1b0_0
xz                        5.2.5                h7b6447c_0
zlib                      1.2.11               h7b6447c_3
zstd                      1.4.5                h9ceee32_0

Error messages, stack traces, or logs

Here is everything starting from trial 15:

[I 2021-02-05 07:29:35,802] Trial 15 pruned.

Average Validation Loss: 2594.3800

Epoch 1/5

Average Training loss: 11902.6517

Average Validation Loss: 10585.8529

Epoch 2/5

Average Training loss: 8536.1782
[I 2021-02-05 07:44:59,435] Trial 16 pruned.

Average Validation Loss: 6230.1825

Epoch 1/5

Average Training loss: 605.1005

Average Validation Loss: 532.8016

Epoch 2/5

Average Training loss: 526.2966

Average Validation Loss: 524.1667

Epoch 3/5

Average Training loss: 524.6881

Average Validation Loss: 526.1336

Epoch 4/5

Average Training loss: 524.3555
[I 2021-02-05 08:16:04,120] Trial 17 pruned.

Average Validation Loss: 523.4189

Epoch 1/5

Average Training loss: 1805.4577

Average Validation Loss: 1617.5258

Epoch 2/5

Average Training loss: 1586.4583

Average Validation Loss: 1567.8128

Epoch 3/5

Average Training loss: 1567.9229

Average Validation Loss: 1565.4937

Epoch 4/5

Average Training loss: 1566.8196

Average Validation Loss: 1564.4235

Epoch 5/5

Average Training loss: 1565.9849
[I 2021-02-05 08:54:46,640] Trial 18 finished with value: 1565.4634710145326 and parameters: {'num_channels': 64, 'optimizer': 'RMSprop', 'lr': 2.618149026594371e-05, 'momentum': 0.8016042722596308, 'rmsprop_alpha': 0.902569807347318, 'batch_size': 8, 'loss_func_alpha': 3.0}. Best is trial 9 with value: 519.5251754050875.

Average Validation Loss: 1565.4635
Traceback (most recent call last):
  File "recon_hp_tuning.py", line 258, in <module>
    study.optimize(objective,
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/study.py", line 376, in optimize
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/_optimize.py", line 63, in _optimize
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/_optimize.py", line 164, in _optimize_sequential
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/_optimize.py", line 191, in _run_trial
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/study.py", line 421, in ask
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/trial/_trial.py", line 57, in __init__
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/trial/_trial.py", line 66, in _init_relative_params
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/samplers/_tpe/sampler.py", line 238, in sample_relative
  File "/afs/unity.ncsu.edu/users/m/maabdelk/.conda/envs/local/lib/python3.8/site-packages/optuna/samplers/_tpe/sampler.py", line 830, in _get_multivariate_observation_pairs
AssertionError

The AssertionError happens on line 830 here:

https://github.com/optuna/optuna/blob/1d3b3955fa41181b0a59f4e7b7be21a841ee9153/optuna/samplers/_tpe/sampler.py#L827-L834

Steps to reproduce

I don’t think I can reproduce this, since it occurred at random.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

jeromepatel commented, Mar 11, 2021 (1 reaction)

I think this issue can be closed as the corresponding PR is merged.

y0z commented, Feb 5, 2021 (1 reaction)

In brief, you have three options.

  1. Turn off multivariate sampling (the easiest workaround).
  2. Turn off pruning.
  3. Cherry-pick the commits that fix this bug from #2055 (if the code works).
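
The first two options might look roughly like the sketch below. This is not taken from the reporter's script: the helper names `workaround_settings` and `make_study` are hypothetical, `direction="minimize"` is assumed because the logged objective is a validation loss, and `MedianPruner` only stands in for whichever pruner the original script used. The Optuna calls themselves (`TPESampler(multivariate=...)`, `NopPruner`, `create_study`) are the library's public API.

```python
from typing import Any, Dict


def workaround_settings(option: str) -> Dict[str, Any]:
    """Map a workaround name (hypothetical labels) to sampler/pruner settings."""
    if option == "no_multivariate":
        # Option 1: keep pruning, fall back to the default independent TPE.
        return {"multivariate": False, "prune": True}
    if option == "no_pruning":
        # Option 2: keep multivariate TPE, but never prune a trial.
        return {"multivariate": True, "prune": False}
    raise ValueError(f"unknown option: {option!r}")


def make_study(option: str = "no_multivariate"):
    """Create a study configured with one of the two workarounds."""
    # Imported here so workaround_settings() can be used without Optuna installed.
    import optuna

    settings = workaround_settings(option)
    sampler = optuna.samplers.TPESampler(multivariate=settings["multivariate"])
    if settings["prune"]:
        # Assumption: MedianPruner is a placeholder for the script's own pruner.
        pruner = optuna.pruners.MedianPruner()
    else:
        # NopPruner never prunes, so trial.should_prune() always returns False.
        pruner = optuna.pruners.NopPruner()
    # Assumption: the objective is a loss, so the study minimizes it.
    return optuna.create_study(direction="minimize", sampler=sampler, pruner=pruner)
```

Calling `make_study("no_multivariate")` and then `study.optimize(objective, ...)` as before would apply the first workaround; passing `"no_pruning"` applies the second. Either one gives up a feature, so the third option (picking up the fix itself) is preferable once it is available in the installed version.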
