TPE performs poorly compared to the TPE in BOHB
Environment and experiments
- Python 3.8
- Ubuntu 18.04
- Optuna 2.8.0
First, I show a comparison between the TPE in my repo, which is based on the BOHB implementation, and the TPE in Optuna v2.8.0.
| Method | 10D Rosenbrock | 10D Sphere |
|---|---|---|
| Optuna TPE | | |
| BOHB TPE | | |
| Random Search | | |
Each experiment is performed 10 times with different random seeds, and each run uses 100 evaluations, including 10 random initial evaluations.
NOTE
I roughly checked the performance on Ackley as well, and BOHB outperformed Optuna. I think it is worth trying Griewank, Michalewicz, Rastrigin, Schwefel, Xin-She Yang, and Styblinski-Tang as well. All the functions are already available here, and the search domain is defined here.
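For reference, two of the benchmarks mentioned above can be sketched in a few lines of NumPy. These are the standard textbook definitions; the exact search domains used in my repo may differ:

```python
import numpy as np

def griewank(x):
    # Griewank: many shallow local minima; global minimum f(0) = 0.
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

def rastrigin(x):
    # Rastrigin: highly multimodal; global minimum f(0) = 0.
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

print(griewank(np.zeros(10)))   # 0.0
print(rastrigin(np.zeros(10)))  # 0.0
```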
Experiment code
For Optuna:

```python
import optuna
from optuna.samplers import TPESampler

V, dim = 5, 10

def sphere(**kwargs):
    # Sum of squares; global minimum 0 at the origin.
    val = 0
    for x in kwargs.values():
        val += x ** 2
    return val

def rosen(**kwargs):
    # 10D Rosenbrock; global minimum 0 at (1, ..., 1).
    val = 0
    xs = list(kwargs.values())
    for d in range(dim - 1):
        t1 = 100 * (xs[d + 1] - xs[d] ** 2) ** 2
        t2 = (xs[d] - 1) ** 2
        val += t1 + t2
    return val

def func(trial):
    xs = {f'x{d}': trial.suggest_uniform(f'x{d}', -V, V) for d in range(dim)}
    return rosen(**xs)  # or sphere(**xs)

study = optuna.create_study(sampler=TPESampler(multivariate=True))
study.optimize(func, n_trials=100)
```
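The Random Search row of the table can be reproduced without Optuna. A minimal sketch, assuming the same domain `[-5, 5]^10`, 100 evaluations per seed, and 10 seeds:

```python
import numpy as np

V, dim, n_trials, n_seeds = 5, 10, 100, 10

def rosen(xs):
    # 10D Rosenbrock on an array input.
    return sum(100 * (xs[d + 1] - xs[d] ** 2) ** 2 + (xs[d] - 1) ** 2
               for d in range(dim - 1))

curves = []
for seed in range(n_seeds):
    rng = np.random.RandomState(seed)
    vals = [rosen(rng.uniform(-V, V, size=dim)) for _ in range(n_trials)]
    curves.append(np.minimum.accumulate(vals))  # cumulative minimum per seed

mean_curve = np.mean(curves, axis=0)
print(mean_curve[-1])  # mean best value after 100 evaluations
```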
For BOHB:

```shell
# Repeat 10 times with different seeds
$ python mvtpe_main.py -fuc sphere -dim 10 -eva 100 -ini 10
$ python mvtpe_main.py -fuc rosenbrock -dim 10 -eva 100 -ini 10
```
Why?
Intrinsically, TPE is a local search method, and its performance is highly sensitive to the bandwidth selection. If I understand correctly, the Optuna implementation fixes the bandwidth factor `sigma0` across all dimensions. However, I am not sure this is a good strategy here, for two reasons:
- Each dimension has a different density of observations (some dimensions may be packed densely while others are not)
- Low intrinsic dimensionality (which tends to yield a less packed density in unimportant dimensions)
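To illustrate the first point, here is a toy sketch (not Optuna's actual code) contrasting one shared bandwidth with a Scott-style bandwidth scaled per dimension; dimensions whose observations are tightly packed get a smaller bandwidth:

```python
import numpy as np

rng = np.random.RandomState(0)
n, d = 30, 3
# Observations: dimension 0 is tightly packed, dimension 2 is spread out.
X = rng.uniform(0, 1, size=(n, d)) * np.array([0.1, 1.0, 10.0])

# Shared: one bandwidth for every dimension (what a fixed sigma0 amounts to).
shared_bw = np.full(d, X.std() * n ** (-1.0 / (d + 4)))

# Per-dimension: scale the Scott factor by each dimension's own spread.
per_dim_bw = X.std(axis=0) * n ** (-1.0 / (d + 4))

print(shared_bw)
print(per_dim_bw)  # differs between dimensions, roughly by the ratio of spreads
```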
Based on my observations, the bandwidth in the Optuna TPE is somewhat large, and thus the search behaves much like random search. Although the KDE ratio tells you which set is best among the sampled sets, the choice of sets is combinatorial, so it is usually hard to cover good sets with a small number of samples.
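As a rough 1D illustration of why the bandwidth matters: TPE draws candidates from the "good" KDE l(x) and ranks them by the ratio l(x)/g(x). With an overly wide bandwidth, both densities flatten out and the ratio barely discriminates between candidates, so the argmax is close to a random pick. A toy version with a hand-rolled Gaussian KDE (not Optuna's implementation):

```python
import numpy as np

def kde(x, centers, bw):
    # Average of Gaussian kernels centred on the observations.
    z = (x[None, :] - centers[:, None]) / bw
    return np.mean(np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2 * np.pi)), axis=0)

rng = np.random.RandomState(0)
good = rng.normal(0.0, 0.1, size=10)   # observations with low objective values
bad = rng.normal(2.0, 0.5, size=30)    # the remaining observations

# As in TPE, candidates are drawn from the "good" model l(x).
cands = rng.choice(good, size=64) + rng.normal(0.0, 0.1, size=64)

spread = {}
for bw in (0.1, 3.0):                  # narrow vs. overly wide bandwidth
    ratio = kde(cands, good, bw) / (kde(cands, bad, bw) + 1e-300)
    spread[bw] = ratio.max() / ratio.min()

# With the wide bandwidth the ratio is nearly flat across candidates,
# so maximizing it conveys little information.
print(spread)
```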
Note that since a smaller bandwidth leads to more exploitative search, it is often effective to introduce a regularizer such as mutation, as in genetic algorithms.
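The kind of mutation regularizer I mean could be sketched as follows (a hypothetical add-on, not part of either implementation): after drawing a sample from the narrow KDE, each coordinate is resampled uniformly with a small probability, restoring some global exploration:

```python
import numpy as np

def mutate(x, low, high, rate, rng):
    # GA-style mutation: each coordinate is replaced by a fresh uniform draw
    # with probability `rate`, keeping some exploration even when the KDE
    # bandwidth is small and sampling is exploitative.
    x = np.array(x, dtype=float)
    mask = rng.uniform(size=x.size) < rate
    x[mask] = rng.uniform(low, high, size=x.size)[mask]
    return x

rng = np.random.RandomState(0)
cand = np.zeros(10)                  # e.g. a sample from the narrow good-KDE
mutated = mutate(cand, -5, 5, 0.1, rng)
print(mutated)
```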
The lines in question
In this report, I only focused on multivariate TPE.
Issue Analytics
- Created 2 years ago
- Reactions: 8
- Comments: 13 (3 by maintainers)
Hi @not522 ,
I checked those papers before:
But please keep in mind that those methods assume the observations are i.i.d., which is not the case for TPE; that is why the bandwidth is typically underestimated, owing to the local-search nature of TPE.
Also, as you might have noticed in the v3.0 release, the multivariate TPE does not work well in high dimensions or on benchmark functions. There are several reasons for this:
Other things I noticed from prior experiences are that:
I might add more comments later. Please feel free to ask about anything unclear, and I will answer your questions.
@not522 I checked my implementation and the Optuna implementation on some high-dimensional (50D) benchmark functions with `n_trials = 200`. I describe my findings here:
NOTE Unlike the Optuna implementation, mine uses the same bandwidth selection for both the univariate and multivariate cases, so we only know that multivariate could be better unless the objective function is highly multimodal.
My implementation has several differences from the Optuna implementation, but I found that separate bandwidth selection for each dimension made a difference in performance. This point was already mentioned in the very first post, but since I expected mine to also perform worse in higher dimensions, I report the results here.
For example, the results on 50D Griewank (the mean of the cumulative minimum over seeds):

| Method | 50D Griewank |
|---|---|
| Mine with univariate | |
| Mine with multivariate | |
| Optuna with univariate | |
| Optuna with multivariate | |