question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Multi-GPU sweep fails to create a second agent with 400 error

See original GitHub issue

Describe the bug Multi-GPU sweep fails to create a second agent with error attached. The first agent starts as expected. The sweep ID is qnjyxjtu.

To Reproduce Steps to reproduce the behavior: CUDA_VISIBLE_DEVICES=0 wandb agent kaiwenw/CartPole-v0/qnjyxjtu CUDA_VISIBLE_DEVICES=1 wandb agent kaiwenw/CartPole-v0/qnjyxjtu

Expected behavior Expect two agents to start, and both log to the same sweep.

Operating System Distributor ID: Ubuntu Description: Ubuntu 18.04.4 LTS Release: 18.04 Codename: bionic With 2 Nvidia GPUs on CUDA 10.2 running PyTorch-Lightning v1.1, PyTorch v1.8 Note the issue persists even without using GPU.

Additional context Full trace:

(torch) ubuntu@ip-172-31-10-97:~/oppg$ CUDA_VISIBLE_DEVICES=0 wandb agent kaiwenw/CartPole-v0/qnjyxjtu
wandb: Starting wandb agent 🕵️
2020-12-14 09:13:58,062 - wandb.wandb_agent - INFO - Running runs: []
400 response executing GraphQL.
{"errors":[{"message":"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()","path":["agentHeartbeat"]}],"data":{"agentHeartbeat":null}}
wandb: ERROR Error while calling W&B API: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() (<Response [400]>)
wandb: Terminating and syncing runs. Press ctrl-c to kill.
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 1317, in agent_heartbeat
    response = self.gql(
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/old/retry.py", line 96, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 136, in execute
    six.reraise(*sys.exc_info())
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 130, in execute
    return self.client.execute(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.wandb.ai/graphql

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/torch/bin/wandb", line 8, in <module>
    sys.exit(cli())
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/cli/cli.py", line 86, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/cli/cli.py", line 806, in agent
    wandb_agent.agent(sweep_id, entity=entity, project=project, count=count)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/wandb_agent.py", line 558, in agent
    return run_agent(
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/wandb_agent.py", line 518, in run_agent
    agent.run()
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/wandb_agent.py", line 252, in run
    commands = self._api.agent_heartbeat(agent_id, {}, run_status)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/apis/internal.py", line 92, in agent_heartbeat
    return self.api.agent_heartbeat(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 1328, in agent_heartbeat
    message = ast.literal_eval(e.args[0])["message"]
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    400 Client Error: Bad Request for url: https://api.wandb.ai/graphql
        ^
SyntaxError: invalid syntax

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:5 (1 by maintainers)

github_iconTop GitHub Comments

1reaction
kengzcommented, Jan 1, 2021

Experiencing the same issue here while doing sweep - first run ok, after that it throws the error above. I have something like the following in my sweep

parameters:
  arc.linear.layers:
    values:
      - [256, 256]
      - [512, 512]
      - [1024, 1024]
  arc.linear.batch_norm:
    values:
      - true
      - false

The doc link above does not provide any clue on dealing with using list as value.

From the error it seems like the issue lies with how the call to W&B API parses the value - perhaps a guard in the API / API call that guards for list values? Or if the API could accept list?

400 response executing GraphQL.
{"errors":[{"message":"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()","path":["agentHeartbeat"]}],"data":{"agentHeartbeat":null}}
wandb: ERROR Error while calling W&B API: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() (<Response [400]>)
wandb: Terminating and syncing runs. Press ctrl-c to kill.

...

400 Client Error: Bad Request for url: https://api.wandb.ai/graphql

A temporary workaround is to put the list values as strings, then in the trainer process parse the string into list again. This is definitely not ideal tho.

0reactions
sydhollcommented, Nov 22, 2021

In the past year we’ve majorly reworked the CLI and UI for Weights & Biases. We’re closing issues older than 6 months. Please comment to reopen.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Multi-GPU sweep fails to create a second agent with 400 error
Describe the bug Multi-GPU sweep fails to create a second agent with error attached. The first agent starts as expected.
Read more >
FAQ - Documentation - Weights & Biases - Wandb
Finally, start new W&B Sweep agents with the new Sweep ID. Parameter combinations with completed W&B Runs are not re-executed.
Read more >
Problems with multi-gpus - MATLAB Answers - MathWorks
Error using Composite/subsasgn (line 103). An invalid indexing request was made. Struct contents reference from a non-struct array object.
Read more >
How do I set GPU affinity of workers - RLlib - Ray.io
Lets say I am using IMPALA with several workers and I have 4 GPUs. I want to be able to say GPUs 0,1,2...
Read more >
CUDA_ERROR_OUT_OF_MEM...
It seems the GPU memory is still allocated, and therefore cannot be allocated again. It was solved by manually ending all python processes...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found