Automate selection of appropriate parameters for BoTorch components in Ax based on experiment and data size


Hello again (I hope I am not causing too much trouble for the team 😃),

I am here to report a possible bug: attaching many trials through the Service API consumes a lot of memory. Here is an example (warning: a couple of machines started thrashing and then froze while running this code):

from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties
import itertools

def evaluate(args):
    return {
        'a': (5_000, 0.0),
    }

ax_client = AxClient(random_seed=64)
ax_client.create_experiment(
    name="ax_err",
    parameters=[
        {'name': 'p1', 'type': 'range', 'bounds': [0, 5000], 'value_type': 'int'},
        {'name': 'p2', 'type': 'range', 'bounds': [0, 6000], 'value_type': 'int'},
        {'name': 'p3', 'type': 'range', 'bounds': [0, 7000], 'value_type': 'int'},
    ],
    objectives={
        'a': ObjectiveProperties(minimize=True, threshold=10_000)
    },
)

def range_float(stop, percent=10/100):
    # Integer grid from 0 up to (but not including) `stop`,
    # stepping by `percent * stop` — smaller `percent` means more points.
    l = []
    c = 0
    while c < stop:
        l.append(int(c))
        c += percent * stop
    return l

r_p1 = range_float(5000)
r_p2 = range_float(6000)
r_p3 = range_float(7000)

force_trials = []
for p1, p2, p3 in itertools.product(r_p1, r_p2, r_p3):
    config, trial_index = ax_client.attach_trial({
        'p1': p1,
        'p2': p2,
        'p3': p3
    })
    evaluations = evaluate(config)
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluations)


for _ in range(15):
    (config, trial_index) = ax_client.get_next_trial()
    evaluations = evaluate(config)
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluations)

I’ve replicated this issue with MOO. Increasing the percentage (e.g. to 20/100, which attaches far fewer trials) does not trigger this behavior. If I had to guess, it does not appear to be a memory leak; rather, some internal operation of Ax/BoTorch seems to have very steep memory complexity.
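To get a feel for how fast the attached-trial count grows as `percent` shrinks, here is a small standalone sketch that mirrors the `range_float` helper from the reproduction above. Since the trials are the product of three per-dimension grids, the count grows cubically. (The exact counts can differ by a trial or two from the numbers reported later in the thread, presumably due to floating-point accumulation in the loop.)

```python
import itertools

def range_float(stop, percent=10 / 100):
    # Integer grid from 0 up to (but not including) `stop`,
    # stepping by `percent * stop`.
    values = []
    c = 0.0
    while c < stop:
        values.append(int(c))
        c += percent * stop
    return values

def n_attached_trials(percent):
    # Number of trials the reproduction script attaches for a given `percent`.
    grids = [range_float(stop, percent) for stop in (5000, 6000, 7000)]
    return len(list(itertools.product(*grids)))

for pct in (20 / 100, 15 / 100, 12 / 100, 10 / 100):
    print(f"percent={pct:.2f} -> {n_attached_trials(pct)} trials")
```

Halving `percent` roughly doubles the points per dimension, and therefore multiplies the training-set size the surrogate model must handle by about eight.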

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

5 reactions
lena-kashtelyan commented, Sep 15, 2021

Hi @josalhor, sorry for the delay on this! Let me split this issue into two parts:

  1. how to manually limit the memory consumption of qNEI in large-data settings (shown below),
  2. automating selection of appropriate parameters for BoTorch components in Ax based on experiment and data size (this is something we would like to do, but in the long term, so I’ll be adding it to our wishlist master issue).

To limit the memory consumption of qNEI, the easiest way is probably to use our modular BotAx setup (i.e. Models.BOTORCH_MODULAR instead of the Models.GPEI that AxClient is currently using for you under the hood). To do so, you’ll need to:

  1. Construct a generation strategy that uses Models.BOTORCH_MODULAR and pass acquisition function options to it (check out the “using Models.BOTORCH_MODULAR in generation strategies” section of the modular BotAx tutorial for instructions);
  2. As part of model_kwargs for the BoTorch generation step, set "acquisition_options" to something like {"optimizer_options": {"num_restarts": 10, "raw_samples": 256}};
  3. For more details on generation strategies and their options, check out the generation strategy tutorial;
  4. Pass the resulting generation strategy to AxClient via AxClient(generation_strategy=...).

You should end up with something like this:

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

gs = GenerationStrategy(
    steps=[
        GenerationStep(  # Initialization step
            # Which model to use for this step
            model=Models.SOBOL,
            # How many generator runs (each of which is then made a trial)
            # to produce with this step
            num_trials=5,
            # How many trials generated from this step must be `COMPLETED`
            # before moving on to the next step
            min_trials_observed=5,
        ),
        GenerationStep(  # BayesOpt step
            model=Models.BOTORCH_MODULAR,
            # No limit on how many generator runs will be produced
            num_trials=-1,
            model_kwargs={  # Kwargs to pass to `BoTorchModel.__init__`
                "acquisition_options": {
                    "optimizer_options": {"num_restarts": 10, "raw_samples": 256}
                }
            },
        ),
    ]
)

ax_client = AxClient(generation_strategy=gs)

Let us know if this doesn’t work for you! With that, I’ll consider part 1 of the issue resolved and will mark part 2 as a wishlist item.
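As a rough intuition for why these two knobs help (this is a hypothetical back-of-envelope sketch for illustration, not Ax’s actual memory accounting): qNEI evaluates the posterior over candidate points jointly with all observed points, so the tensors materialized during candidate generation and optimization both scale with the number of attached trials, multiplied by `raw_samples` and `num_restarts` respectively.

```python
def rough_qnei_memory_bytes(n_obs, q=1, raw_samples=256, num_restarts=10,
                            mc_samples=512, bytes_per_float=8):
    # Hypothetical estimate, for intuition only:
    # - initialization evaluates the acquisition at `raw_samples` candidate
    #   points, each involving a joint covariance over (q + n_obs) points;
    # - optimization draws `mc_samples` Monte Carlo posterior samples for
    #   each of `num_restarts` restarts.
    joint = q + n_obs
    init_phase = raw_samples * joint * joint       # covariance blocks
    opt_phase = num_restarts * mc_samples * joint  # MC sample tensors
    return (init_phase + opt_phase) * bytes_per_float

# With ~727 attached trials, shrinking raw_samples shrinks the
# dominant term proportionally:
print(rough_qnei_memory_bytes(727, raw_samples=1024) /
      rough_qnei_memory_bytes(727, raw_samples=256))
```

The only point is that both terms grow with the number of observed trials, so with a four-digit training set, lowering `raw_samples` and `num_restarts` directly caps the largest temporaries.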

1 reaction
josalhor commented, Aug 25, 2021

> what do you mean by a lot of memory? Can you be a bit more specific?

This is peak memory consumption when adjusting the percent variable:

| Percent  | Attached trials | Peak memory                              |
| -------- | --------------- | ---------------------------------------- |
| 20 / 100 | 123             | 1.5 GB                                   |
| 15 / 100 | 341             | 7 GB                                     |
| 12 / 100 | 727             | > 21 GB (killed manually at that point)  |

This memory consumption comes in two phases. Here is a screenshot for the 15 / 100 entry:

[Screenshot: memory usage over time for the 15 / 100 run, showing two consumption phases]

Here is one for the 12 / 100 run (manually killed):

[Screenshot: memory usage over time for the 12 / 100 run, killed before completion]

I’ve seen runs where the valley between the two phases is far less pronounced. It may be an issue that stems from a combination of high memory consumption and garbage-collection weirdness.
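For anyone trying to reproduce these measurements without an external profiler: a quick, cross-platform way to get a peak figure from inside Python is `tracemalloc`. Note that it only tracks Python-level allocations, so it will undercount native/PyTorch memory compared to the RSS numbers above; on Unix, `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` reports peak RSS instead.

```python
import tracemalloc

tracemalloc.start()

# ... run the attach_trial / complete_trial loop and
# ax_client.get_next_trial() here; a stand-in workload for illustration:
data = [list(range(1000)) for _ in range(100)]

current, peak = tracemalloc.get_traced_memory()
print(f"current = {current / 1e6:.1f} MB, peak = {peak / 1e6:.1f} MB")
tracemalloc.stop()
```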
