wandb hangs experiment (10 min+): Internal Server Error for url: https://api.wandb.ai/graphql

wandb --version && python --version && uname

  • Weights and Biases version: 0.8.35
  • Python version: 3.7
  • Operating System: Linux

Description

For the past few days, I have noticed experiments hanging on wandb logging. Sometimes I have even seen crashes.

So far, downgrading to 0.8.33 seems to help. I will report back if the problem arises again.

What I Did

2020-05-05 09:25:36,056 ERROR   Thread-18 :22373 [internal.py:execute():113] 500 response executing GraphQL.
2020-05-05 09:25:36,057 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"Error 1040: Too many connections"}

2020-05-05 09:25:36,058 ERROR   Thread-18 :22373 [retry.py:__call__():108] Retry attempt failed:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/wandb/apis/internal.py", line 116, in execute
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/wandb/apis/internal.py", line 110, in execute
    return self.client.execute(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/graphql
2020-05-05 09:25:39,928 INFO    Thread-3  :22373 [run_manager.py:_on_file_modified():691] file/dir modified: <redacted>/run-20200505_072213-camelyon-16384-full-correct-loss/wandb-metadata.json
2020-05-05 09:25:40,883 ERROR   Thread-18 :22373 [internal.py:execute():113] 500 response executing GraphQL.
2020-05-05 09:25:40,884 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"Error 1040: Too many connections"}

2020-05-05 09:25:50,866 ERROR   Thread-18 :22373 [internal.py:execute():113] 500 response executing GraphQL.
2020-05-05 09:25:50,867 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"Error 1040: Too many connections"}

2020-05-05 09:25:51,143 WARNING Thread-7  :22373 [util.py:request_with_retry():614] requests_with_retry encountered retryable exception: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/files/hanspinckaers/camelyon/camelyon-16384-full-correct-loss/file_stream. args: ('https://api.wandb.ai/files/hanspinckaers/camelyon/camelyon-16384-full-correct-loss/file_stream',), kwargs: {'json': {'files': {'output.log':
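
The `{"error": ...}` bodies in the log above can be extracted to see how often, and with what message, the server rejected a request. Error 1040 is MySQL's "Too many connections" code, which suggests the failure sits on the backend rather than in the client. The following is a minimal sketch that assumes the log layout shown in this report; `extract_graphql_errors` and `LINE_RE` are hypothetical names for illustration, not part of wandb.

```python
import json
import re

# Assumed layout, taken from the excerpt in this report:
# 2020-05-05 09:25:36,057 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"..."}
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+ERROR\s+.*?\]\s+(?P<body>\{.*\})\s*$"
)


def extract_graphql_errors(lines):
    """Return (timestamp, message) pairs for ERROR lines that carry a JSON error body."""
    found = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not an ERROR line with a JSON-looking body
        try:
            payload = json.loads(match.group("body"))
        except json.JSONDecodeError:
            continue  # body looked like JSON but did not parse
        if isinstance(payload, dict) and "error" in payload:
            found.append((match.group("ts"), payload["error"]))
    return found


sample = [
    "2020-05-05 09:25:36,056 ERROR   Thread-18 :22373 [internal.py:execute():113] 500 response executing GraphQL.",
    '2020-05-05 09:25:36,057 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"Error 1040: Too many connections"}',
]
errors = extract_graphql_errors(sample)
print(errors)
```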

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
vanpelt commented, Jun 10, 2021

@richardrl we had an outage last night that caused these errors. Everything should be functioning properly now.

0 reactions
sadra-barikbin commented, Apr 25, 2022

Hi, I’m experiencing this issue in a Google Colab environment. To reproduce:

#bash
wandb login --cloud "API_KEY"

then

#python
import wandb

# `entity` and `project_name` are placeholders for your own W&B entity and project.
api = wandb.Api()
runs = api.runs(f'{entity}/{project_name}')
runs[0]

output:

Retry attempt failed:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/wandb/sdk/lib/retry.py", line 102, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/apis/public.py", line 205, in execute
    return self._client.execute(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/graphql
wandb: Network error (HTTPError), entering retry loop.
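
Because the client enters a retry loop on a 500 response, a call like the one above can block for a long time during a server-side outage. One way to keep a script from hanging is to run the call in a worker thread and stop waiting after a fixed time. The following is a minimal standard-library sketch; `run_with_timeout` is a hypothetical helper, not a wandb API, and `fast` and `slow` stand in for the real call.

```python
import concurrent.futures
import time


def run_with_timeout(fn, timeout_s):
    """Run fn() in a worker thread and stop waiting after timeout_s seconds.

    Raises concurrent.futures.TimeoutError if fn() has not finished in time.
    The worker thread is not killed: it keeps running in the background, and
    the interpreter still waits for it when the process exits.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Do not block here waiting for a call that may still be retrying.
        executor.shutdown(wait=False)


def fast():
    return "ok"


def slow():
    # Stand-in for a call stuck in a retry loop.
    time.sleep(0.5)
    return "done"


fast_result = run_with_timeout(fast, 1.0)

timed_out = False
try:
    run_with_timeout(slow, 0.05)
except concurrent.futures.TimeoutError:
    timed_out = True

print(fast_result, timed_out)
```

In the reproduction above, the wrapped callable would be something like `lambda: api.runs(f'{entity}/{project_name}')[0]`. This only bounds how long the caller waits; it does not fix the underlying server error.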

