TPE not working with a categorical variable with different choice types
Hi. Thanks for this amazing library!
When I try to optimize a model with a categorical variable whose choices have different types, I get IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (9,) (4,). This error always happens after 10 trials.
I would be very grateful for an explanation of why that is the case. I guess it has to do with how TPE makes categorical variables continuous, but I wasn't able to find how that is done.
Also, I think it would be very useful for people relatively new to Optuna if the error were more self-explanatory. In my case, it took over 10 hours to find out why my optimization failed.
Environment
- Optuna version: 2.10.0
- Python version: 3.6.10
- OS: Linux-4.18.0-305.10.2.el8_4.ppc64le-ppc64le-with-redhat-8.4-Ootpa
Error messages, stack traces, or logs
/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py:266: ExperimentalWarning: ``multivariate`` option is an experimental feature. The interface can change in the future.
ExperimentalWarning,
/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py:285: ExperimentalWarning: ``constant_liar`` option is an experimental feature. The interface can change in the future.
ExperimentalWarning,
[I 2021-11-28 07:31:46,317] Trial 0 finished with value: 51.25318418441549 and parameters: {'x1': 7.159132921270249, 'x2': None}. Best is trial 0 with value: 51.25318418441549.
[I 2021-11-28 07:31:46,661] Trial 1 finished with value: 0.0 and parameters: {'x1': -6.523347451025641, 'x2': 1}. Best is trial 0 with value: 51.25318418441549.
[I 2021-11-28 07:31:46,857] Trial 2 finished with value: 77.13131244750303 and parameters: {'x1': 8.782443421252598, 'x2': None}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,067] Trial 3 finished with value: 71.35511499232025 and parameters: {'x1': -8.447195688056496, 'x2': None}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,283] Trial 4 finished with value: 0.0 and parameters: {'x1': 7.448216057409198, 'x2': 1}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,537] Trial 5 finished with value: 42.52100156846517 and parameters: {'x1': 6.520812953034703, 'x2': None}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,757] Trial 6 finished with value: 0.0 and parameters: {'x1': 3.2600618681215856, 'x2': 1}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,905] Trial 7 finished with value: 88.15753251520141 and parameters: {'x1': 9.389224276541775, 'x2': None}. Best is trial 7 with value: 88.15753251520141.
[I 2021-11-28 07:31:48,029] Trial 8 finished with value: 39.329878182295595 and parameters: {'x1': 6.271353775884087, 'x2': None}. Best is trial 7 with value: 88.15753251520141.
[I 2021-11-28 07:31:48,144] Trial 9 finished with value: 0.0 and parameters: {'x1': -7.077110913961812, 'x2': 1}. Best is trial 7 with value: 88.15753251520141.
Traceback (most recent call last):
  File "optuna_test.py", line 23, in <module>
    study.optimize(train, n_trials=300)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/_optimize.py", line 194, in _run_trial
    trial = study.ask()
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/study.py", line 486, in ask
    trial = trial_module.Trial(self, trial_id)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/trial/_trial.py", line 56, in __init__
    self._init_relative_params()
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/trial/_trial.py", line 66, in _init_relative_params
    study, trial, self.relative_search_space
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py", line 349, in sample_relative
    return self._sample_relative(study, trial, search_space)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py", line 386, in _sample_relative
    mpe_above = _ParzenEstimator(above, search_space, self._parzen_estimator_parameters)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/parzen_estimator.py", line 87, in __init__
    param_observations, param_name
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/parzen_estimator.py", line 366, in _calculate_categorical_params
    weights[np.arange(n_observations), observations] += 1
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (9,) (4,)
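For anyone else puzzled by the message itself: the last frame indexes a 2-D array with two index arrays of different lengths, which NumPy cannot broadcast. The snippet below is a minimal sketch that reproduces only that indexing pattern. It is not Optuna's code; n_choices and the array contents are made up for illustration, and only the lengths 9 and 4 are taken from the error message.

```python
import numpy as np

n_observations = 9  # number of rows allocated (length taken from the error message)
n_choices = 2       # hypothetical: one column per categorical choice, e.g. [None, 1]

weights = np.zeros((n_observations, n_choices))

# Only four choice indices are available (made-up values), although nine rows exist.
observations = np.array([1, 1, 1, 1])

try:
    # Same indexing pattern as the failing line in the traceback:
    # row indices have length 9, column indices have length 4.
    weights[np.arange(n_observations), observations] += 1
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

So the message means that the number of recorded values for the categorical parameter did not match the number of observations the estimator expected, rather than that the choice types themselves could not be indexed.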
Steps to reproduce
- Create an Optuna study with:
  optuna create-study --study-name '1234' --storage 'sqlite:///test.db' --direction 'maximize'
- Run optuna_test.py, defined below.
Reproducible examples (optional)
optuna_test.py:

import optuna


def train(trial):
    x1 = trial.suggest_uniform('x1', -10, 10)
    # Categorical choices of different types: None and an int.
    x2 = trial.suggest_categorical('x2', [None, 1])
    if x2:
        score = 0
    else:
        score = x1 ** 2
    return score


if __name__ == "__main__":
    sampler = optuna.samplers.TPESampler(multivariate=True,
                                         group=False,
                                         constant_liar=True)
    # Load the study created with the CLI command above.
    study = optuna.load_study(study_name='1234',
                              sampler=sampler,
                              storage="sqlite:///test.db")
    study.optimize(train, n_trials=300)
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (4 by maintainers)
I figured out that these lines are buggy: the check raw_param_value is not None is not equivalent to param_name in trial.params.
https://github.com/optuna/optuna/blob/828593078770607555a10eccb06d70d68afa05b8/optuna/samplers/_tpe/sampler.py#L632-L640

A test for #3190 is added by #3447.
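To illustrate why the two checks differ when None is itself a valid choice, here is a minimal sketch in plain Python. It is not the sampler's actual code: the helper names observed_by_value and observed_by_key are hypothetical, and the parameter dictionaries are rounded from the log in the report. Leaving out trial 7 (the best one) is an assumption about how the trials were grouped.

```python
# Parameters of trials 0-6, 8 and 9 from the log (x1 rounded).
trials_params = [
    {"x1": 7.16, "x2": None},
    {"x1": -6.52, "x2": 1},
    {"x1": 8.78, "x2": None},
    {"x1": -8.45, "x2": None},
    {"x1": 7.45, "x2": 1},
    {"x1": 6.52, "x2": None},
    {"x1": 3.26, "x2": 1},
    {"x1": 6.27, "x2": None},
    {"x1": -7.08, "x2": 1},
]


def observed_by_value(params_list, name):
    """Treat a parameter as missing whenever its value is None."""
    return [p[name] for p in params_list if p.get(name) is not None]


def observed_by_key(params_list, name):
    """Treat a parameter as missing only when the key is absent."""
    return [p[name] for p in params_list if name in p]


x1_count = len(observed_by_key(trials_params, "x1"))
by_value = observed_by_value(trials_params, "x2")
by_key = observed_by_key(trials_params, "x2")

# The value-based check silently drops every trial that chose None.
print(x1_count, len(by_value), len(by_key))  # 9 4 9
```

Under these assumptions the value-based check keeps four observations of x2 while x1 has nine, which matches the shapes (9,) and (4,) in the error message. The key-based check keeps all nine, because a trial that sampled None still has the key in its params.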