Automate selection of appropriate parameters for BoTorch components in Ax based on experiment and data size


Hello again (I hope I am not causing too much trouble for the team 😃),

I am here to report a possible bug: attaching many trials through the Service API consumes a lot of memory. Here is an example (warning: a couple of machines started thrashing and then froze while running this code):

from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties
import itertools

def evaluate(args):
    return {
        'a': (5_000, 0.0),
    }

ax_client = AxClient(random_seed=64)
ax_client.create_experiment(
    name="ax_err",
    parameters=[
        {'name': 'p1', 'type': 'range', 'bounds': [0, 5000], 'value_type': 'int'},
        {'name': 'p2', 'type': 'range', 'bounds': [0, 6000], 'value_type': 'int'},
        {'name': 'p3', 'type': 'range', 'bounds': [0, 7000], 'value_type': 'int'},
    ],
    objectives={
        'a': ObjectiveProperties(minimize=True, threshold=10_000)
    },
)

def range_float(stop, percent=10/100):
    # Integer grid from 0 up to (but not including) `stop`,
    # stepping by `percent * stop` — smaller `percent` means more points.
    l = []
    c = 0
    while c < stop:
        l.append(int(c))
        c += percent * stop
    return l

r_p1 = range_float(5000)
r_p2 = range_float(6000)
r_p3 = range_float(7000)

force_trials = []
for p1, p2, p3 in itertools.product(r_p1, r_p2, r_p3):
    config, trial_index = ax_client.attach_trial({
        'p1': p1,
        'p2': p2,
        'p3': p3
    })
    evaluations = evaluate(config)
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluations)


for _ in range(15):
    (config, trial_index) = ax_client.get_next_trial()
    evaluations = evaluate(config)
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluations)

I’ve replicated this issue with MOO. Increasing the percentage (e.g. to 20/100, which attaches far fewer trials) does not trigger this behavior. If I had to guess, it does not appear to be a memory leak; rather, some internal operation of Ax/BoTorch seems to have very steep memory complexity.
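To get a feel for how fast the attached-trial count grows as `percent` shrinks, here is a small standalone sketch that mirrors the `range_float` helper from the reproduction above. Since the trials are the product of three per-dimension grids, the count grows cubically. (The exact counts can differ by a trial or two from the numbers reported later in the thread, presumably due to floating-point accumulation in the loop.)

```python
import itertools

def range_float(stop, percent=10 / 100):
    # Integer grid from 0 up to (but not including) `stop`,
    # stepping by `percent * stop`.
    values = []
    c = 0.0
    while c < stop:
        values.append(int(c))
        c += percent * stop
    return values

def n_attached_trials(percent):
    # Number of trials the reproduction script attaches for a given `percent`.
    grids = [range_float(stop, percent) for stop in (5000, 6000, 7000)]
    return len(list(itertools.product(*grids)))

for pct in (20 / 100, 15 / 100, 12 / 100, 10 / 100):
    print(f"percent={pct:.2f} -> {n_attached_trials(pct)} trials")
```

Halving `percent` roughly doubles the points per dimension, and therefore multiplies the training-set size the surrogate model must handle by about eight.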

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

5 reactions
lena-kashtelyan commented, Sep 15, 2021

Hi @josalhor, sorry for the delay on this! Let me split this issue into two parts:

  1. how to manually limit the memory consumption of qNEI in large-data settings (shown below),
  2. automating selection of appropriate parameters for BoTorch components in Ax based on experiment and data size (this is something we would like to do, but in the long term, so I’ll be adding it to our wishlist master issue).

To limit the memory consumption of qNEI, the easiest way is probably to use our modular BotAx setup (i.e. Models.BOTORCH_MODULAR instead of the Models.GPEI that AxClient is currently using for you under the hood). To do so, you’ll need to:

  1. Construct a generation strategy that uses Models.BOTORCH_MODULAR and pass acquisition function options to it (check out the “using Models.BOTORCH_MODULAR in generation strategies” section of the modular BotAx tutorial for instructions);
  2. As part of model_kwargs for the BoTorch generation step, set "acquisition_options" to something like {"optimizer_options": {"num_restarts": 10, "raw_samples": 256}};
  3. For more details on generation strategies and their options, check out the generation strategy tutorial;
  4. Pass the resulting generation strategy to AxClient via AxClient(generation_strategy=...).

You should end up with something like this:

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

gs = GenerationStrategy(
    steps=[
        GenerationStep(  # Initialization step
            # Which model to use for this step
            model=Models.SOBOL,
            # How many generator runs (each of which is then made a trial)
            # to produce with this step
            num_trials=5,
            # How many trials generated from this step must be `COMPLETED`
            # before moving on to the next step
            min_trials_observed=5,
        ),
        GenerationStep(  # BayesOpt step
            model=Models.BOTORCH_MODULAR,
            # No limit on how many generator runs will be produced
            num_trials=-1,
            model_kwargs={  # Kwargs to pass to `BoTorchModel.__init__`
                "acquisition_options": {
                    "optimizer_options": {"num_restarts": 10, "raw_samples": 256}
                }
            },
        ),
    ]
)

ax_client = AxClient(generation_strategy=gs)

Let us know if this doesn’t work for you! With that, I’ll consider part 1 of the issue resolved and will mark part 2 as a wishlist item.
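As a rough intuition for why these two knobs help (this is a hypothetical back-of-envelope sketch for illustration, not Ax’s actual memory accounting): qNEI evaluates the posterior over candidate points jointly with all observed points, so the tensors materialized during candidate generation and optimization both scale with the number of attached trials, multiplied by `raw_samples` and `num_restarts` respectively.

```python
def rough_qnei_memory_bytes(n_obs, q=1, raw_samples=256, num_restarts=10,
                            mc_samples=512, bytes_per_float=8):
    # Hypothetical estimate, for intuition only:
    # - initialization evaluates the acquisition at `raw_samples` candidate
    #   points, each involving a joint covariance over (q + n_obs) points;
    # - optimization draws `mc_samples` Monte Carlo posterior samples for
    #   each of `num_restarts` restarts.
    joint = q + n_obs
    init_phase = raw_samples * joint * joint       # covariance blocks
    opt_phase = num_restarts * mc_samples * joint  # MC sample tensors
    return (init_phase + opt_phase) * bytes_per_float

# With ~727 attached trials, shrinking raw_samples shrinks the
# dominant term proportionally:
print(rough_qnei_memory_bytes(727, raw_samples=1024) /
      rough_qnei_memory_bytes(727, raw_samples=256))
```

The only point is that both terms grow with the number of observed trials, so with a four-digit training set, lowering `raw_samples` and `num_restarts` directly caps the largest temporaries.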

1 reaction
josalhor commented, Aug 25, 2021

> what do you mean by a lot of memory? Can you be a bit more specific?

This is peak memory consumption when adjusting the percent variable:

| Percent  | Attached trials | Peak memory                              |
| -------- | --------------- | ---------------------------------------- |
| 20 / 100 | 123             | 1.5 GB                                   |
| 15 / 100 | 341             | 7 GB                                     |
| 12 / 100 | 727             | > 21 GB (killed manually at that point)  |

This memory consumption comes in two phases. Here is a screenshot for the 15 / 100 entry:

[Screenshot: memory usage over time for the 15 / 100 run, showing two consumption phases]

Here is one for the 12 / 100 run (manually killed):

[Screenshot: memory usage over time for the 12 / 100 run, killed before completion]

I’ve seen runs where the valley between the two phases is far less pronounced. It may be an issue that stems from a combination of high memory consumption and garbage-collection weirdness.
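For anyone trying to reproduce these measurements without an external profiler: a quick, cross-platform way to get a peak figure from inside Python is `tracemalloc`. Note that it only tracks Python-level allocations, so it will undercount native/PyTorch memory compared to the RSS numbers above; on Unix, `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` reports peak RSS instead.

```python
import tracemalloc

tracemalloc.start()

# ... run the attach_trial / complete_trial loop and
# ax_client.get_next_trial() here; a stand-in workload for illustration:
data = [list(range(1000)) for _ in range(100)]

current, peak = tracemalloc.get_traced_memory()
print(f"current = {current / 1e6:.1f} MB, peak = {peak / 1e6:.1f} MB")
tracemalloc.stop()
```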
