TPE performs poorly compared to the TPE in BOHB
Environment and experiments
- Python 3.8
- Ubuntu 18.04
- Optuna 2.8.0
First, I show a comparison between the TPE in my repo, which is based on the BOHB implementation, and the TPE in Optuna v2.8.0.
| Method | 10D Rosenbrock | 10D Sphere |
|---|---|---|
| Optuna TPE | | |
| BOHB TPE | | |
| Random Search | | |
Each experiment is performed 10 times with different random seeds, and each run uses 100 evaluations, including 10 random initial evaluations.
NOTE
I roughly checked the performance on Ackley as well, and BOHB outperformed Optuna. I think it is worth trying Griewank, Michalewicz, Rastrigin, Schwefel, Xin-She Yang, and Styblinski-Tang as well. All the functions are already available here, and the search domain is defined here.
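For reference, two of the benchmarks mentioned above can be sketched in a few lines of NumPy. These are the standard textbook definitions; the exact search domains used in my repo may differ:

```python
import numpy as np

def griewank(x):
    # Griewank: many shallow local minima; global minimum f(0) = 0.
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

def rastrigin(x):
    # Rastrigin: highly multimodal; global minimum f(0) = 0.
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

print(griewank(np.zeros(10)))   # 0.0
print(rastrigin(np.zeros(10)))  # 0.0
```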
Experiment code
For Optuna:

```python
import optuna
from optuna.samplers import TPESampler

V, dim = 5, 10

def sphere(**kwargs):
    # Sum of squares; global minimum 0 at the origin.
    val = 0
    for x in kwargs.values():
        val += x ** 2
    return val

def rosen(**kwargs):
    # 10D Rosenbrock; global minimum 0 at (1, ..., 1).
    val = 0
    xs = list(kwargs.values())
    for d in range(dim - 1):
        t1 = 100 * (xs[d + 1] - xs[d] ** 2) ** 2
        t2 = (xs[d] - 1) ** 2
        val += t1 + t2
    return val

def func(trial):
    xs = {f'x{d}': trial.suggest_uniform(f'x{d}', -V, V) for d in range(dim)}
    return rosen(**xs)  # or sphere(**xs)

study = optuna.create_study(sampler=TPESampler(multivariate=True))
study.optimize(func, n_trials=100)
```
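The Random Search row of the table can be reproduced without Optuna. A minimal sketch, assuming the same domain `[-5, 5]^10`, 100 evaluations per seed, and 10 seeds:

```python
import numpy as np

V, dim, n_trials, n_seeds = 5, 10, 100, 10

def rosen(xs):
    # 10D Rosenbrock on an array input.
    return sum(100 * (xs[d + 1] - xs[d] ** 2) ** 2 + (xs[d] - 1) ** 2
               for d in range(dim - 1))

curves = []
for seed in range(n_seeds):
    rng = np.random.RandomState(seed)
    vals = [rosen(rng.uniform(-V, V, size=dim)) for _ in range(n_trials)]
    curves.append(np.minimum.accumulate(vals))  # cumulative minimum per seed

mean_curve = np.mean(curves, axis=0)
print(mean_curve[-1])  # mean best value after 100 evaluations
```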
For BOHB:

```shell
# Repeat 10 times with different seeds
$ python mvtpe_main.py -fuc sphere -dim 10 -eva 100 -ini 10
$ python mvtpe_main.py -fuc rosenbrock -dim 10 -eva 100 -ini 10
```
Why?
Intrinsically, TPE is a local search method, and its performance is highly sensitive to the bandwidth selection. If I understand correctly, the Optuna implementation fixes the bandwidth factor `sigma0` across all dimensions. However, I am not sure this is a good strategy here, for two reasons:
- Each dimension has a different density of observations (some dimensions may be packed densely while others are not)
- Low intrinsic dimensionality (which tends to yield a less packed density in unimportant dimensions)
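To illustrate the first point, here is a toy sketch (not Optuna's actual code) contrasting one shared bandwidth with a Scott-style bandwidth scaled per dimension; dimensions whose observations are tightly packed get a smaller bandwidth:

```python
import numpy as np

rng = np.random.RandomState(0)
n, d = 30, 3
# Observations: dimension 0 is tightly packed, dimension 2 is spread out.
X = rng.uniform(0, 1, size=(n, d)) * np.array([0.1, 1.0, 10.0])

# Shared: one bandwidth for every dimension (what a fixed sigma0 amounts to).
shared_bw = np.full(d, X.std() * n ** (-1.0 / (d + 4)))

# Per-dimension: scale the Scott factor by each dimension's own spread.
per_dim_bw = X.std(axis=0) * n ** (-1.0 / (d + 4))

print(shared_bw)
print(per_dim_bw)  # differs between dimensions, roughly by the ratio of spreads
```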
Based on my observations, the bandwidth in the Optuna TPE is somewhat large, and thus the search behaves much like random search. Although the KDE ratio tells you which set is best among the sampled sets, the choice of sets is combinatorial, so it is usually hard to cover good sets with a small number of samples.
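As a rough 1D illustration of why the bandwidth matters: TPE draws candidates from the "good" KDE l(x) and ranks them by the ratio l(x)/g(x). With an overly wide bandwidth, both densities flatten out and the ratio barely discriminates between candidates, so the argmax is close to a random pick. A toy version with a hand-rolled Gaussian KDE (not Optuna's implementation):

```python
import numpy as np

def kde(x, centers, bw):
    # Average of Gaussian kernels centred on the observations.
    z = (x[None, :] - centers[:, None]) / bw
    return np.mean(np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2 * np.pi)), axis=0)

rng = np.random.RandomState(0)
good = rng.normal(0.0, 0.1, size=10)   # observations with low objective values
bad = rng.normal(2.0, 0.5, size=30)    # the remaining observations

# As in TPE, candidates are drawn from the "good" model l(x).
cands = rng.choice(good, size=64) + rng.normal(0.0, 0.1, size=64)

spread = {}
for bw in (0.1, 3.0):                  # narrow vs. overly wide bandwidth
    ratio = kde(cands, good, bw) / (kde(cands, bad, bw) + 1e-300)
    spread[bw] = ratio.max() / ratio.min()

# With the wide bandwidth the ratio is nearly flat across candidates,
# so maximizing it conveys little information.
print(spread)
```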
Note that since a smaller bandwidth leads to more exploitative search, it is often effective to introduce a regularizer such as mutation, as in genetic algorithms.
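The kind of mutation regularizer I mean could be sketched as follows (a hypothetical add-on, not part of either implementation): after drawing a sample from the narrow KDE, each coordinate is resampled uniformly with a small probability, restoring some global exploration:

```python
import numpy as np

def mutate(x, low, high, rate, rng):
    # GA-style mutation: each coordinate is replaced by a fresh uniform draw
    # with probability `rate`, keeping some exploration even when the KDE
    # bandwidth is small and sampling is exploitative.
    x = np.array(x, dtype=float)
    mask = rng.uniform(size=x.size) < rate
    x[mask] = rng.uniform(low, high, size=x.size)[mask]
    return x

rng = np.random.RandomState(0)
cand = np.zeros(10)                  # e.g. a sample from the narrow good-KDE
mutated = mutate(cand, -5, 5, 0.1, rng)
print(mutated)
```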
The lines in question
In this report, I only focused on multivariate TPE.
Issue Analytics
- Created 2 years ago
- Reactions: 8
- Comments: 13 (3 by maintainers)
Hi @not522 ,
I checked those papers before:
But please keep in mind that those methods assume the observations are i.i.d., which is not the case for TPE; that is why the bandwidth is typically underestimated, owing to the local-search nature of TPE.
Also, as you might have noticed in the v3.0 release, the multivariate TPE does not work well in high dimensions or on benchmark functions. There are several reasons for this:
Other things I noticed from prior experiences are that:
I might add more comments later. Please feel free to ask about anything unclear, and I will answer your questions.
@not522 I checked my implementation and the Optuna implementation on some high-dimensional (50D) benchmark functions with `n_trials = 200`. I describe my findings here:
NOTE Unlike the Optuna implementation, mine uses the same bandwidth selection for both the univariate and multivariate cases, so we only know that multivariate could be better unless the objective function is highly multimodal.
My implementation has several differences from the Optuna implementation, but I found that separate bandwidth selection for each dimension made a difference in performance. This point was already mentioned in the very first post, but since I expected mine to also perform worse in higher dimensions, I report the results here.
For example, the results on 50D Griewank (the mean of the cumulative minimum over seeds):

| Method | 50D Griewank |
|---|---|
| Mine with univariate | |
| Mine with multivariate | |
| Optuna with univariate | |
| Optuna with multivariate | |