Initializing experiment with data points outside the experiment search space (a.k.a. out-of-design points)
I'm following the tutorial for implementing multi-objective optimization. With 15 input variables and 10 output variables, the search space is vast. However, we have some known data points that are relatively close to optimal (e.g. 9 of the 10 output variables are satisfied).
I would like to initialize the experiment with these known data points before running the quasi-random Sobol initialization. Searching through the API, I can see there is some functionality for attaching trials, but I can't get it to work within the current tutorial. I'm looking to do something like the snippet below.
Can you provide some guidance on how to attach any number of initial trials for multi-objective optimization? Ideally, I would like to be able to programmatically add these data points before the experiment begins. Thanks!
from ax import Experiment
from ax.modelbridge.registry import Models
from ax.runners.synthetic import SyntheticRunner

# search_space, optimization_config, and N_INIT are defined as in the tutorial.

def build_experiment():
    experiment = Experiment(
        name="pareto_experiment",
        search_space=search_space,
        optimization_config=optimization_config,
        runner=SyntheticRunner(),
    )
    return experiment

## Initialize with Sobol samples
def initialize_experiment(experiment):
    #### HOW DO I ATTACH A TRIAL TO THE EXPERIMENT WITH INITIAL DATA POINTS??
    start_data = {
        "a": 4.5e-07, "b": 2.3e-10, "c": 2e-10, "d": 2e-10, "e": 1e-12,
        "f": 0.7, "g": 15.0, "h": 15.0, "i": 3.0e-11, "j": 1e-12,
        "k": 7.4, "l": 10, "m": 10, "n": 20, "o": 46,
    }
    experiment._attach_trial(start_data)

    sobol = Models.SOBOL(search_space=experiment.search_space, seed=1234)
    for _ in range(N_INIT):
        experiment.new_trial(sobol.gen(1)).run()
    return experiment.fetch_data()

ehvi_experiment = build_experiment()
ehvi_data = initialize_experiment(ehvi_experiment)
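For reference, one possible way to do this with the developer API is to create a trial around a manually specified Arm and then attach the already-observed outcomes via Experiment.attach_data. The sketch below is not an official recipe and has not been validated against any particular Ax version; the helper name attach_known_point, the placeholder metric names, and the observed values in the usage line are assumptions, so adapt them to the metrics in your optimization_config.

    # A minimal sketch, assuming the experiment built by build_experiment() above.
    import pandas as pd
    from ax import Arm, Data

    def attach_known_point(experiment, parameters, observed_means):
        # Create a trial holding the known parameterization (it must lie
        # inside the search space, or Ax's validation will reject it).
        trial = experiment.new_trial()
        trial.add_arm(Arm(parameters=parameters))
        trial.run()            # SyntheticRunner just marks the trial as deployed
        trial.mark_completed()

        # Attach the previously observed outcomes as an Ax Data object.
        df = pd.DataFrame(
            [
                {
                    "arm_name": trial.arm.name,
                    "metric_name": metric_name,
                    "mean": mean,
                    "sem": 0.0,  # assumed noiseless; use a real SEM if known
                    "trial_index": trial.index,
                }
                for metric_name, mean in observed_means.items()
            ]
        )
        experiment.attach_data(Data(df=df))
        return trial

    # Hypothetical usage: metric names and values are placeholders.
    attach_known_point(ehvi_experiment, start_data, {"metric_1": 0.12, "metric_2": 3.4})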
Top GitHub Comments
@dczaretsky are you using any Ax functions to validate that the observed points are feasible? Are you able to share the data so we can take a closer look?
One thing to note is that in the function that infers the objective thresholds, feasibility is computed on the points predicted by the model, not on the raw observations. So if the model does a good amount of smoothing / regularization, the observed measurements at the training points can be feasible while the model-predicted values are not. This usually shouldn't happen if the model fit is good, but it could explain the error here.
One thing we should consider is not filtering those observations out by default. There are situations in which it is quite reasonable to have observations outside the search space: for example, a bunch of data may already have been collected, but the experimenter has the domain knowledge to restrict the search space to something smaller than the full support of the data. Including observations beyond the boundaries of the search space should help improve the model, even if those observations are deemed infeasible and would therefore never be suggested by the generation strategy.
With the developer API we have the ability to do this: essentially, use a large search space for the modeling and then pass a new search space to gen. However, I am not sure if this is possible for the service API (or whether it should be…). Curious what @lena-kashtelyan's thoughts are on this.
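To make the "large search space for modeling, smaller search space for generation" pattern concrete, a rough sketch of how it could look with the developer API is below. It has not been verified against a specific Ax version; Models.MOO is assumed to be the appropriate multi-objective model here, and restricted_search_space is a hypothetical SearchSpace the experimenter would build from domain knowledge.

    # Sketch only: fit on the experiment whose search space covers all the
    # collected data, then restrict candidate generation via gen().
    from ax.modelbridge.registry import Models

    model = Models.MOO(
        experiment=ehvi_experiment,        # search space wide enough for all observed data
        data=ehvi_experiment.fetch_data(),
    )
    generator_run = model.gen(
        n=1,
        search_space=restricted_search_space,  # hypothetical narrower region to search
    )
    ehvi_experiment.new_trial(generator_run=generator_run).run()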