Do the numeric values of ordinal ChoiceParameter choices affect the model?
See original GitHub issue

TL;DR
The docs seem to suggest that ordinal parameters use a Matérn-5/2 kernel, so I assume the answer is "yes," it does affect the model. Is there a way to change this to a Hamming distance so that order constraints can be used with categorical variables? See the toy problem below.
Toy Problem
Take some data based on choices, which are used to construct choice parameters (slots) and constraints. The choices can go into any of the slots, and the choices are constrained to populate the slots in a particular order (e.g. BAC is not allowed, only ABC). The script version is given in ordinal_example.py.
Imports
import numpy as np
import pandas as pd
Choices and Data
data = [["A", "B", "C"], ["D", "C", "A"], ["C", "A", "B"], ["C", "B", "A"]]
choices = list(np.unique(np.array(data).flatten()))
n_choices = len(choices)
Ordinal Encoding
df = pd.DataFrame(data)
choice_lookup = {
choice: choice_num for (choice, choice_num) in zip(choices, range(n_choices))
}
encoded_df = df.replace(choice_lookup)
encoded_choices = pd.DataFrame(choices)[0].map(choice_lookup).values
encoded_data = encoded_df.values
print(encoded_data)
[[0 1 2]
 [3 2 0]
 [2 0 1]
 [2 1 0]]
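As a side note, the encoding above round-trips cleanly, but it also bakes numeric distances into the choices. A quick sanity check (illustrative only, not part of the original script):

```python
# Illustrative check: the ordinal encoding is reversible, but it imposes
# numeric distances between choices that a Matern-style kernel would
# treat as meaningful.
import numpy as np
import pandas as pd

data = [["A", "B", "C"], ["D", "C", "A"], ["C", "A", "B"], ["C", "B", "A"]]
choices = list(np.unique(np.array(data).flatten()))
choice_lookup = {choice: i for i, choice in enumerate(choices)}
encoded_df = pd.DataFrame(data).replace(choice_lookup)

# Decode back to letters to confirm the mapping round-trips.
inverse_lookup = {v: k for k, v in choice_lookup.items()}
decoded = encoded_df.replace(inverse_lookup).values.tolist()
assert decoded == data

# Under the encoding, "A" (0) and "D" (3) are three times farther apart
# than "A" (0) and "B" (1), even though categorically both pairs simply differ.
print(abs(choice_lookup["A"] - choice_lookup["D"]))  # 3
print(abs(choice_lookup["A"] - choice_lookup["B"]))  # 1
```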
Choice Parameters
nslots = 3
slot_names = ["slot_" + str(i) for i in range(nslots)]
slots = [
{
"name": slot_name,
"type": "choice",
"values": encoded_choices,
}
for slot_name in slot_names
]
print(slots) # then format via black
[ {"name": "slot_0", "type": "choice", "values": [0, 1, 2, 3]}, {"name": "slot_1", "type": "choice", "values": [0, 1, 2, 3]}, {"name": "slot_2", "type": "choice", "values": [0, 1, 2, 3]}, ]
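For reference, Ax choice-parameter dicts also accept an `is_ordered` flag, which is what steers the treatment toward ordinal (Matérn) vs. categorical (Hamming). A hedged sketch of how the slots could be marked explicitly (flipping the flag to `False` would get the Hamming kernel, but then the order constraints below no longer apply, which is exactly the tension raised in this issue):

```python
# Hedged sketch: the same slot dicts with an explicit "is_ordered" flag.
# Setting it to False marks the parameter as an unordered categorical.
nslots = 3
slot_names = ["slot_" + str(i) for i in range(nslots)]
encoded_choices = [0, 1, 2, 3]  # from the ordinal encoding above

slots = [
    {
        "name": slot_name,
        "type": "choice",
        "values": encoded_choices,
        "value_type": "int",
        "is_ordered": True,  # flip to False for a Hamming-distance treatment
    }
    for slot_name in slot_names
]
print(slots)
```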
Ordered Constraints
constraints = [
lhs + " <= " + rhs for (lhs, rhs) in list(zip(slot_names[:-1], slot_names[1:]))
]
print(constraints)
["slot_0 <= slot_1", "slot_1 <= slot_2"]
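Applying these ordering constraints to the encoded data shows how restrictive they are (illustrative check, not in the original script):

```python
# Which of the observed rows satisfy slot_0 <= slot_1 <= slot_2?
import numpy as np

encoded_data = np.array([[0, 1, 2], [3, 2, 0], [2, 0, 1], [2, 1, 0]])
feasible = np.all(encoded_data[:, :-1] <= encoded_data[:, 1:], axis=1)
print(feasible)  # only the first row ("A", "B", "C") satisfies the ordering
```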
Docs suggest ordinal parameters use Matern 5/2 kernel
Based on Support for mixed search spaces and categorical variables (docs):
The most common way of dealing with categorical variables in Bayesian optimization is to one-hot encode the categories to allow fitting a GP model in a continuous space. In this setting, a categorical variable with categories [“red”, “blue”, “green”] is represented by three new variables (one for each category). While this is a convenient choice, it can drastically increase the dimensionality of the search space. In addition, the acquisition function is often optimized in the corresponding continuous space and the final candidate is selected by rounding back to the original space, which may result in selecting sub-optimal points according to the acquisition function.
Our new approach uses separate kernels for the categorical and ordinal (continuous/integer) variables. In particular, we use a kernel of the form:
k(x, y) = k_cat(x_cat, y_cat) × k_ord(x_ord, y_ord) + k_cat(x_cat, y_cat) + k_ord(x_ord, y_ord)
For the ordinal variables we can use a standard kernel such as Matérn-5/2, but for the categorical variables we need a way to compute distances between the different categories. A natural choice is to set the distance to 0 if two categories are equal and 1 otherwise, similar to the idea of Hamming distances. This approach can be combined with the idea of automatic relevance determination (ARD), where each categorical variable has its own lengthscale. Rather than optimizing the acquisition function in a continuously relaxed space, we optimize it separately over each combination of the categorical variables. While this is likely to result in better optimization performance, it may lead to slow optimization of the acquisition function when there are many categorical variables.
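The distinction can be made concrete with a toy comparison (hypothetical lengthscales and an RBF stand-in for Matérn-5/2; this is not Ax's actual implementation):

```python
# Toy comparison (not Ax's actual kernels): an ordinal kernel sees the
# encoded values 0 and 3 as far apart, while a Hamming-style categorical
# kernel only asks "equal or not".
import numpy as np

def ordinal_kernel(x, y, lengthscale=1.0):
    # RBF stand-in for Matern-5/2: similarity decays with numeric distance.
    return np.exp(-((x - y) ** 2) / (2 * lengthscale**2))

def hamming_kernel(x, y, lengthscale=1.0):
    # Distance is 0 if equal, 1 otherwise, regardless of the numeric codes.
    return np.exp(-float(x != y) / lengthscale)

# "A"=0 vs "B"=1 and "A"=0 vs "D"=3 under each kernel:
print(ordinal_kernel(0, 1), ordinal_kernel(0, 3))  # unequal similarities
print(hamming_kernel(0, 1), hamming_kernel(0, 3))  # identical similarities
```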
It seems like ordinal variables will use a Matérn-5/2 kernel by default, in which case I'd assume the numeric values of the ordinal parameters play a significant role. Is this the case? How do I replace this with a Hamming distance instead? Is this a flag that could be incorporated into e.g. ax_client.create_experiment() or the other APIs?
Issue Analytics
- Created: 2 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
So this is for ordered categorical variables. For unordered ones we do indeed use the Hamming distance.
We pass down some minimal representation (a SearchSpaceDigest) into the modelbridge layer, based on which we choose what kind of model/kernel to use. E.g., if there are unordered categoricals we will end up in this branch, which chooses a model that uses both a Matérn and a Hamming distance kernel: https://github.com/facebook/Ax/blob/65dc4945d2988bc67b47320cb4d769c09f150811/ax/models/torch/botorch_modular/utils.py#L105-L115
Currently this happens automatically based on the parameter type, i.e., there isn't an easy way right now to use a Hamming distance kernel for an integer parameter.
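The automatic dispatch described here can be caricatured as follows (a hypothetical helper for illustration, not Ax source code):

```python
# Hypothetical sketch of the dispatch described above (not Ax source):
# the modelbridge inspects the search-space digest and picks a kernel
# setup based on whether any categorical parameters are unordered.
def choose_kernel_setup(categorical_features, ordinal_features):
    """categorical_features: indices of unordered parameters;
    ordinal_features: indices of ordered/continuous parameters."""
    if categorical_features:
        # Mixed case: Matern for ordinals plus Hamming for categoricals.
        return "matern+hamming"
    return "matern"

print(choose_kernel_setup([], [0, 1, 2]))   # all ordered
print(choose_kernel_setup([0, 1], [2]))     # some unordered
```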
Since these are rather complex constraints, I wanted to resurface my previous comment:
It may make the most sense to do something custom here, e.g. using a heuristic mixed-discrete optimization strategy operating directly on the set of feasible orderings in the slots. Something like this could potentially be achieved by passing in some callable that just serves as a feasibility check for whether a given slot configuration is feasible, or it may itself generate the set of feasible orderings.
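A minimal sketch of that idea, enumerating the feasible orderings directly (hypothetical helper name; real search spaces may need a smarter strategy than brute-force enumeration):

```python
# Sketch of the suggested custom strategy: enumerate slot configurations
# and keep only the feasible (sorted) orderings, rather than relying on
# the model's parameter constraints.
from itertools import permutations

def feasible_orderings(choices, nslots):
    """All ways to fill the slots with distinct choices in sorted order."""
    return [p for p in permutations(choices, nslots) if list(p) == sorted(p)]

print(feasible_orderings(["A", "B", "C", "D"], 3))
# ("A", "B", "C") is kept, ("B", "A", "C") is not
```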
cc @qingfeng10 (as modopt oncall) and @Balandat