
RuntimeError when run on GPU while a parameter constraint is present

See original GitHub issue

Hi Ax team,

Thanks for the quick fix for issue #676.

I am getting another error during the GPEI generation step when I define a parameter constraint for my problem and set the device to “cuda:0”. I have added both my simple code and the full backtrace below. The Ax version is ‘0.2.1’.

from ax.modelbridge.generation_strategy import GenerationStrategy, GenerationStep
from ax.service.ax_client import AxClient
from ax.modelbridge.registry import Models
import torch

def objective_function(x):
    # Simple quadratic objective; the second tuple element is the (unknown) SEM.
    y = x["x1"] ** 2 + x["x2"] ** 2
    return {"y": (y, None)}

parameters = [
    {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
    {"name": "x2", "type": "range", "bounds": [0.0, 1.0]},
]

gs = GenerationStrategy(
    steps=[
        # Quasi-random initialization.
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,
        ),
        # Bayesian optimization on the GPU.
        GenerationStep(
            model=Models.GPEI,
            num_trials=-1,
            model_kwargs={"torch_dtype": torch.double, "torch_device": torch.device("cuda:0")},
        ),
    ]
)

exp = AxClient(generation_strategy=gs, verbose_logging=False)
exp.create_experiment(
    name="dummy_exp",
    parameters=parameters,
    objective_name="y",
    minimize=True,
    parameter_constraints=["x1 + x2 <= 1.0"],
)

for i in range(20):
    parameters, trial_index = exp.get_next_trial()
    value = objective_function(parameters)
    exp.complete_trial(trial_index=trial_index, raw_data=value)

  File "C:\Anaconda3\lib\site-packages\ax\utils\common\executils.py", line 169, in handle_exceptions_in_retries
    yield  # Perform action within the context manager.

  File "C:\Anaconda3\lib\site-packages\ax\utils\common\executils.py", line 147, in actual_wrapper
    return func(*args, **kwargs)

  File "C:\Anaconda3\lib\site-packages\ax\service\ax_client.py", line 327, in get_next_trial
    generator_run=self._gen_new_generator_run(), ttl_seconds=ttl_seconds

  File "C:\Anaconda3\lib\site-packages\ax\service\ax_client.py", line 1095, in _gen_new_generator_run
    return not_none(self.generation_strategy).gen(

  File "C:\Anaconda3\lib\site-packages\ax\modelbridge\generation_strategy.py", line 405, in gen
    return self._gen_multiple(

  File "C:\Anaconda3\lib\site-packages\ax\modelbridge\generation_strategy.py", line 509, in _gen_multiple
    generator_run = model.gen(

  File "C:\Anaconda3\lib\site-packages\ax\modelbridge\base.py", line 669, in gen
    observation_features, weights, best_obsf, gen_metadata = self._gen(

  File "C:\Anaconda3\lib\site-packages\ax\modelbridge\array.py", line 274, in _gen
    X, w, gen_metadata, candidate_metadata = self._model_gen(

  File "C:\Anaconda3\lib\site-packages\ax\modelbridge\torch.py", line 207, in _model_gen
    X, w, gen_metadata, candidate_metadata = self.model.gen(

  File "C:\Anaconda3\lib\site-packages\ax\models\torch\botorch.py", line 396, in gen
    candidates, expected_acquisition_value = make_and_optimize_acqf()

  File "C:\Anaconda3\lib\site-packages\ax\models\torch\botorch.py", line 382, in make_and_optimize_acqf
    candidates, expected_acquisition_value = self.acqf_optimizer(

  File "C:\Anaconda3\lib\site-packages\ax\models\torch\botorch_defaults.py", line 322, in scipy_optimizer
    X, expected_acquisition_value = optimize_acqf(

  File "C:\Anaconda3\lib\site-packages\botorch\optim\optimize.py", line 166, in optimize_acqf
    batch_initial_conditions = ic_gen(

  File "C:\Anaconda3\lib\site-packages\botorch\optim\initializers.py", line 129, in gen_batch_initial_conditions
    get_polytope_samples(

  File "C:\Anaconda3\lib\site-packages\botorch\utils\sampling.py", line 845, in get_polytope_samples
    polytope_sampler = HitAndRunPolytopeSampler(

  File "C:\Anaconda3\lib\site-packages\botorch\utils\sampling.py", line 639, in __init__
    super().__init__(

  File "C:\Anaconda3\lib\site-packages\botorch\utils\sampling.py", line 527, in __init__
    A = torch.cat([A, A2], dim=0)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument tensors in method wrapper__cat)
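
For context, the error at the bottom is PyTorch’s standard device-mismatch failure from torch.cat when its inputs live on different devices, which suggests that one of the constraint tensors inside HitAndRunPolytopeSampler is built on the CPU while the other already sits on cuda:0. A minimal sketch outside of Ax (hypothetical tensors, just to reproduce the same message):

import torch

a_gpu = torch.zeros(2, 2, device="cuda:0")  # tensor on the GPU
a_cpu = torch.zeros(2, 2)                   # tensor on the CPU (default device)
torch.cat([a_gpu, a_cpu], dim=0)            # raises the same "Expected all tensors to be on the same device" RuntimeError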

I am not quite sure whether my simple code uses GenerationStrategy correctly, but it works when the device is set to “cpu”, and also on “cuda:0” when no parameter constraints are specified.
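
In the meantime, keeping the GPEI step on the CPU avoids the error; a minimal sketch of that variant (only the torch_device argument changes from the code above):

from ax.modelbridge.generation_strategy import GenerationStrategy, GenerationStep
from ax.modelbridge.registry import Models
import torch

gs_cpu = GenerationStrategy(
    steps=[
        # Quasi-random initialization, unchanged.
        GenerationStep(model=Models.SOBOL, num_trials=5),
        # Bayesian optimization on the CPU, which works together with the parameter constraint.
        GenerationStep(
            model=Models.GPEI,
            num_trials=-1,
            model_kwargs={"torch_dtype": torch.double, "torch_device": torch.device("cpu")},
        ),
    ]
)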

Thanks for your support.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
Balandat commented, Aug 26, 2021

Great - we’re hoping to put out new BoTorch and Ax releases soon that include this fix.

1 reaction
lena-kashtelyan commented, Sep 15, 2021

Stable version 0.2.2 is out now and takes care of this issue.
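
To confirm the fix locally after running pip install --upgrade ax-platform botorch, a quick version check (a sketch using importlib.metadata; the pip distribution name for Ax is ax-platform):

from importlib.metadata import version

# Print the installed Ax and BoTorch versions; expect Ax >= 0.2.2 for this fix.
print(version("ax-platform"), version("botorch"))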

Read more comments on GitHub >

Top Results From Across the Web

How to set specific gpu in tensorflow? - Stack Overflow
Using with tf.device('/gpu:2') and creating the graph. Then it will use GPU device 2 to run. Using config = tf.ConfigProto(device_count = {'GPU': 1})...
Read more >
6.33. Data types used by CUDA Runtime
This error indicates that a grid launch did not occur because the kernel uses file-scoped textures which are unsupported by the device runtime....
Read more >
Multi-GPU Training · Issue #475 · ultralytics/yolov5 - GitHub
If you get RuntimeError: Address already in use , it could be because you are running multiple trainings at a time.
Read more >
CUDA semantics — PyTorch 1.13 documentation
cuda is used to set up and run CUDA operations. It keeps track of the currently selected GPU, and all CUDA tensors you...
Read more >
API — ONNX Runtime 1.14.92+cpu documentation
A graph is executed on a device other than CPU, for instance CUDA. Users can use IOBinding to copy the data onto the...
Read more >
