TPE not working with a categorical variable with different choice types

Hi. Thanks for this amazing library!

When I optimize a model that has a categorical variable whose choices have different types (here None and 1), the optimization fails with IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (9,) (4,). The error always occurs after 10 trials.

I would be very grateful for an explanation of why this happens. I guess it has to do with how TPE makes categorical variables continuous, but I wasn’t able to find out how that is done.
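Judging from the failing line in _calculate_categorical_params (see the traceback below), TPE in this version does not make categorical parameters continuous. Each choice is stored as its index in the choices list, and the Parzen estimator builds a weight vector over those indices for every observed trial. The sketch below is a simplified illustration, not Optuna's code: the helper name categorical_kernel_weights and the prior_weight smoothing are assumptions, and only the indexing line mirrors the one in the traceback.

```python
import numpy as np


def categorical_kernel_weights(observations, n_choices, prior_weight=1.0):
    """Simplified Parzen-style weights for a categorical parameter (hypothetical helper).

    observations: choice *indices*, one per observed trial.
    Returns an (n_observations, n_choices) array whose rows sum to 1.
    """
    observations = np.asarray(observations, dtype=int)
    n_observations = len(observations)
    # Give every choice a small uniform mass so unseen choices keep some probability.
    weights = np.full((n_observations, n_choices), prior_weight / n_choices)
    # Add one unit of mass at the observed choice of each trial
    # (same indexing pattern as in the traceback).
    weights[np.arange(n_observations), observations] += 1
    # Normalize each row into a probability distribution over the choices.
    return weights / weights.sum(axis=1, keepdims=True)


choices = [None, 1]
# Each choice is represented by its index: None -> 0, 1 -> 1.
observed = [choices.index(v) for v in [None, 1, None]]
print(categorical_kernel_weights(observed, n_choices=len(choices)))
```

Each row puts most of its mass (0.75 against 0.25 here) on the choice that trial actually used. Note that the index array must have exactly one entry per row; the bug below breaks that invariant.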

Also, I think it would be very useful for people relatively new to Optuna if the error were more self-explanatory. In my case, it took over 10 hours to find out why my optimization failed.

Environment

Optuna version: 2.10.0
Python version: 3.6.10
OS: Linux-4.18.0-305.10.2.el8_4.ppc64le-ppc64le-with-redhat-8.4-Ootpa

Error messages, stack traces, or logs

/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py:266: ExperimentalWarning: ``multivariate`` option is an experimental feature. The interface can change in the future.
  ExperimentalWarning,
/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py:285: ExperimentalWarning: ``constant_liar`` option is an experimental feature. The interface can change in the future.
  ExperimentalWarning,
[I 2021-11-28 07:31:46,317] Trial 0 finished with value: 51.25318418441549 and parameters: {'x1': 7.159132921270249, 'x2': None}. Best is trial 0 with value: 51.25318418441549.
[I 2021-11-28 07:31:46,661] Trial 1 finished with value: 0.0 and parameters: {'x1': -6.523347451025641, 'x2': 1}. Best is trial 0 with value: 51.25318418441549.
[I 2021-11-28 07:31:46,857] Trial 2 finished with value: 77.13131244750303 and parameters: {'x1': 8.782443421252598, 'x2': None}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,067] Trial 3 finished with value: 71.35511499232025 and parameters: {'x1': -8.447195688056496, 'x2': None}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,283] Trial 4 finished with value: 0.0 and parameters: {'x1': 7.448216057409198, 'x2': 1}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,537] Trial 5 finished with value: 42.52100156846517 and parameters: {'x1': 6.520812953034703, 'x2': None}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,757] Trial 6 finished with value: 0.0 and parameters: {'x1': 3.2600618681215856, 'x2': 1}. Best is trial 2 with value: 77.13131244750303.
[I 2021-11-28 07:31:47,905] Trial 7 finished with value: 88.15753251520141 and parameters: {'x1': 9.389224276541775, 'x2': None}. Best is trial 7 with value: 88.15753251520141.
[I 2021-11-28 07:31:48,029] Trial 8 finished with value: 39.329878182295595 and parameters: {'x1': 6.271353775884087, 'x2': None}. Best is trial 7 with value: 88.15753251520141.
[I 2021-11-28 07:31:48,144] Trial 9 finished with value: 0.0 and parameters: {'x1': -7.077110913961812, 'x2': 1}. Best is trial 7 with value: 88.15753251520141.
Traceback (most recent call last):
  File "optuna_test.py", line 23, in <module>
    study.optimize(train, n_trials=300)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/_optimize.py", line 194, in _run_trial
    trial = study.ask()
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/study/study.py", line 486, in ask
    trial = trial_module.Trial(self, trial_id)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/trial/_trial.py", line 56, in __init__
    self._init_relative_params()
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/trial/_trial.py", line 66, in _init_relative_params
    study, trial, self.relative_search_space
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py", line 349, in sample_relative
    return self._sample_relative(study, trial, search_space)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/sampler.py", line 386, in _sample_relative
    mpe_above = _ParzenEstimator(above, search_space, self._parzen_estimator_parameters)
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/parzen_estimator.py", line 87, in __init__
    param_observations, param_name
  File "/home/chledj01/.local/lib/python3.6/site-packages/optuna/samplers/_tpe/parzen_estimator.py", line 366, in _calculate_categorical_params
    weights[np.arange(n_observations), observations] += 1
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (9,) (4,)
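The last frame fails on NumPy advanced indexing: np.arange(n_observations) and observations are used together as row and column indices, so NumPy must broadcast them to a common shape. TPESampler samples randomly during its first n_startup_trials (10 by default), which is consistent with the error appearing only after 10 trials. The snippet below reproduces the same IndexError with plain NumPy; the sizes 9 and 4 come from the error message, while the array contents are made up.

```python
import numpy as np

n_observations = 9   # rows the estimator expects (one per trial in the group)
n_choices = 2        # x2 has two choices: None and 1
weights = np.zeros((n_observations, n_choices))

# Only 4 choice indices are supplied, so the index arrays have shapes (9,) and (4,).
observations = np.array([1, 1, 1, 1])

error_message = None
try:
    weights[np.arange(n_observations), observations] += 1
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # shape mismatch: indexing arrays could not be broadcast together ...
```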

Steps to reproduce

Create an Optuna study with: optuna create-study --study-name '1234' --storage 'sqlite:///test.db' --direction 'maximize'
Run optuna_test.py, which is defined below.

Reproducible examples (optional)

optuna_test.py:

import optuna


def train(trial):
    x1 = trial.suggest_uniform('x1', -10, 10)
    x2 = trial.suggest_categorical('x2', [None, 1])
    if x2:
        score = 0
    else:
        score = x1 ** 2
    return score


if __name__ == "__main__":

    sampler = optuna.samplers.TPESampler(multivariate=True,
                                         group=False,
                                         constant_liar=True)
    study = optuna.load_study(study_name='1234',
                              sampler=sampler,
                              storage="sqlite:///test.db")

    study.optimize(train, n_trials=300)
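Until a fixed release is available, one possible workaround, assuming the failure is triggered specifically by None being one of the choices, is to keep None out of the values Optuna stores: suggest plain string keys and map them back inside the objective. X2_CHOICES is a hypothetical name; suggest_uniform and suggest_categorical are the same calls used in optuna_test.py.

```python
# Hypothetical workaround: Optuna only ever sees the string keys, never None.
X2_CHOICES = {"none": None, "one": 1}


def train(trial):
    x1 = trial.suggest_uniform("x1", -10, 10)
    # Suggest a key, then translate it back to the real value (None or 1).
    x2 = X2_CHOICES[trial.suggest_categorical("x2", list(X2_CHOICES))]
    if x2:
        score = 0
    else:
        score = x1 ** 2
    return score
```

The rest of optuna_test.py (the sampler, load_study and study.optimize(train, n_trials=300)) stays unchanged. The stored parameter values become the strings "none" and "one", so values read back from the study, such as study.best_params, need the same mapping.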

Top GitHub Comments

not522 commented, Nov 29, 2021

I found that these lines are buggy: "raw_param_value is not None" is not equivalent to "param_name in trial.params". For a trial where x2 was None, x2 is present in trial.params but its value is None, so the two checks disagree. https://github.com/optuna/optuna/blob/828593078770607555a10eccb06d70d68afa05b8/optuna/samplers/_tpe/sampler.py#L632-L640
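The mismatch can be traced with the x2 values logged for trials 0 to 9 in the report. x2 is present in every trial, but its value is None in six of them, and the 4 in the error matches the four trials (1, 4, 6 and 9) in which x2 was 1. The sketch uses plain dicts as hypothetical stand-ins for trial.params; it does not reproduce Optuna's sampler code. Treating the best trial (7) as the only member of the "below" group is an assumption based on TPE's default split for 10 trials, which would leave the nine remaining trials for the "above" estimator that fails in the traceback.

```python
# x2 values logged for trials 0-9 in the report (list index = trial number).
logged_x2 = [None, 1, None, None, 1, None, 1, None, None, 1]
trial_params = [{"x2": v} for v in logged_x2]  # stand-ins for trial.params

# Assumption: the best trial (7) forms the "below" group; the other nine are "above".
above = [p for i, p in enumerate(trial_params) if i != 7]

# Counting trials by presence of the parameter keeps all nine...
n_observations = sum(1 for p in above if "x2" in p)
# ...but filtering on the raw value drops every trial where x2 was None.
observations = [p["x2"] for p in above if p["x2"] is not None]

print(n_observations, len(observations))  # 9 4
```

These two counts are exactly the shapes (9,) and (4,) that NumPy refuses to broadcast.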

contramundum53 commented, Apr 22, 2022

A test for #3190 was added in #3447.
