wandb hangs experiment (10 min+): Internal Server Error for url: https://api.wandb.ai/graphql

wandb --version && python --version && uname

Weights and Biases version: 0.8.35
Python version: 3.7
Operating System: Linux

Description

For the past few days, I have noticed experiments hanging during wandb logging. Sometimes I even saw crashes.

So far, downgrading to 0.8.33 seems to help. Will report if the problem arises again.

What I Did

2020-05-05 09:25:36,056 ERROR   Thread-18 :22373 [internal.py:execute():113] 500 response executing GraphQL.
2020-05-05 09:25:36,057 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"Error 1040: Too many connections"}

2020-05-05 09:25:36,058 ERROR   Thread-18 :22373 [retry.py:__call__():108] Retry attempt failed:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/wandb/apis/internal.py", line 116, in execute
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/wandb/apis/internal.py", line 110, in execute
    return self.client.execute(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/graphql
2020-05-05 09:25:39,928 INFO    Thread-3  :22373 [run_manager.py:_on_file_modified():691] file/dir modified: <redacted>/run-20200505_072213-camelyon-16384-full-correct-loss/wandb-metadata.json
2020-05-05 09:25:40,883 ERROR   Thread-18 :22373 [internal.py:execute():113] 500 response executing GraphQL.
2020-05-05 09:25:40,884 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"Error 1040: Too many connections"}

2020-05-05 09:25:50,866 ERROR   Thread-18 :22373 [internal.py:execute():113] 500 response executing GraphQL.
2020-05-05 09:25:50,867 ERROR   Thread-18 :22373 [internal.py:execute():114] {"error":"Error 1040: Too many connections"}

2020-05-05 09:25:51,143 WARNING Thread-7  :22373 [util.py:request_with_retry():614] requests_with_retry encountered retryable exception: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/files/hanspinckaers/camelyon/camelyon-16384-full-correct-loss/file_stream. args: ('https://api.wandb.ai/files/hanspinckaers/camelyon/camelyon-16384-full-correct-loss/file_stream',), kwargs: {'json': {'files': {'output.log':

Issue Analytics

Created 3 years ago
Comments: 9 (3 by maintainers)

Top GitHub Comments

vanpelt commented, Jun 10, 2021

@richardrl we had an outage last night that caused these errors. Everything should be functioning properly now.

sadra-barikbin commented, Apr 25, 2022

Hi, I’m experiencing this issue in a Google Colab environment. To reproduce:

#bash
wandb login --cloud "API_KEY"

then

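#python
import wandb

# entity and project_name must already hold your W&B entity and project names
api = wandb.Api()
runs = api.runs(f'{entity}/{project_name}')
runs[0]  # indexing triggers the GraphQL request that fails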

output:

Retry attempt failed:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/wandb/sdk/lib/retry.py", line 102, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/apis/public.py", line 205, in execute
    return self._client.execute(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/wandb/vendor/gql-0.2.0/wandb_gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/graphql
wandb: Network error (HTTPError), entering retry loop.
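
When a call enters a retry loop like the one above, the calling script can block for a long time. One possible workaround is to run the call in a daemon thread and stop waiting after a deadline. The following is only a minimal sketch under that assumption: `call_with_deadline` is a hypothetical helper written for illustration, not part of wandb, and it does not cancel the underlying request, it only stops waiting for it.

```python
import threading


def call_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread and wait at most timeout_s seconds.

    Hypothetical helper, not a wandb API.
    Returns (True, result) on success, (False, exception) if fn raised,
    and (False, None) if the deadline passed while fn was still running.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    # A daemon thread does not keep the interpreter alive if it is still retrying.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return False, None  # deadline reached; the call is still running in the background
    if "error" in outcome:
        return False, outcome["error"]
    return True, outcome["value"]
```

With the snippet from the reproduction, this could be used as `ok, first_run = call_with_deadline(lambda: api.runs(f'{entity}/{project_name}')[0], 30)`, treating `ok == False` as "skip the W&B lookup and carry on". The 30-second value is an arbitrary choice for illustration.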

Top Results From Across the Web

wandb hangs experiment (10 min+): Internal Server Error for url: https://api.wandb.ai/graphql #1016. Closed. HansPinckaers opened this issue on ...
