Read timed out when total combination of tunable parameters exceed about 15 million


wandb --version && python --version && uname

wandb, version 0.8.21
Python 3.6.9
Linux

What I Did

wandb sweep sweep.yaml

method: grid
metric:
  name: val_acc
  goal: minimize
parameters:
  setting:
    distribution: categorical
    values:
      - stack_ffn
      - act_pkm
      - stack_encdec_ffn
  q_linear:
    distribution: categorical
    values:
      - true
      - false
  k_linear:
    distribution: categorical
    values:
      - true
      - false
  v_linear:
    distribution: categorical
    values:
      - true
      - false
  o_linear:
    distribution: categorical
    values:
      - true
      - false
  q_norm:
    distribution: categorical
    values:
      - true
      - false
  k_norm:
    distribution: categorical
    values:
      - true
      - false
  v_norm:
    distribution: categorical
    values:
      - true
      - false
  inner_norm:
    distribution: categorical
    values:
      - true
      - false
  norm_way:
    distribution: categorical
    values:
      - C
      - CL
  q_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  k_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  v_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  inner_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  proj_share:
    distribution: categorical
    values:
      - qk
      - qv
      - kv
      - qkv
      - no
  proj_way:
    distribution: categorical
    values:
      - ->head
      - head->
      - head->_share
  relative:
    distribution: categorical
    values:
      - true
      - false
  q_downscale:
    distribution: categorical
    values:
      - true
      - false
  k_downscale:
    distribution: categorical
    values:
      - true
      - false
  v_downscale:
    distribution: categorical
    values:
      - true
      - false
  inner_downscale:
    distribution: categorical
    values:
      - true
      - false
  inner_mul:
    distribution: categorical
    values:
      - QK
      - KV

and I get a timeout:

Network error (ReadTimeout), entering retry loop. See /home/shulie8518/Workspace/Review_Attention/wandb/debug.log for full traceback.

debug.log

2020-01-19 16:01:03,860 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:15,154 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:27,885 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:38,038 ERROR   MainThread:31362 [retry.py:__call__():108] Retry attempt failed:
Traceback (most recent call last):
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/util/retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 389, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 309, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/apis/internal.py", line 110, in execute
    return self.client.execute(*args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/transport/requests.py", line 38, in execute
    request = requests.post(self.url, **post_args)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 521, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
2020-01-19 16:01:42,484 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:00,755 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:27,699 DEBUG   MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 34 (11 by maintainers)

Top GitHub Comments

4 reactions
hongliny commented, Feb 23, 2021

The issue still persists on my side

2 reactions
raubitsj commented, Jan 19, 2020

Thanks for the report. We will look into this and figure out if we can handle this many combinations or if we have to set some limits.
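Until a limit or fix is in place, one way to keep each grid small is to split a large grid into several smaller ones by pinning some parameters to a single value. The sketch below is only an illustration of that idea in plain Python: `split_grid` and `max_size` are hypothetical names, it does not call wandb, and nothing in this thread confirms that smaller sweeps avoid the timeout.

```python
from itertools import product


def split_grid(parameters, max_size):
    """Split a grid into sub-grids of at most max_size combinations each.

    `parameters` maps a parameter name to its list of values. Parameters are
    pinned one at a time (those with the most values first) until the
    remaining grid is small enough. Each returned dict has the same shape as
    `parameters`, with pinned parameters reduced to a one-element list.
    """
    remaining = 1
    for values in parameters.values():
        remaining *= len(values)

    pinned = []
    by_count = sorted(parameters, key=lambda n: len(parameters[n]), reverse=True)
    for name in by_count:
        if remaining <= max_size:
            break
        pinned.append(name)
        remaining //= len(parameters[name])

    sub_grids = []
    # One sub-grid per combination of the pinned parameters' values.
    for combo in product(*(parameters[n] for n in pinned)):
        sub = dict(parameters)
        for name, value in zip(pinned, combo):
            sub[name] = [value]
        sub_grids.append(sub)
    return sub_grids


# Toy example: 3 * 2 * 2 = 12 combinations, split into grids of at most 4.
toy = {"a": [1, 2, 3], "b": [True, False], "c": ["x", "y"]}
parts = split_grid(toy, max_size=4)
print(len(parts))  # 3 sub-grids, one per value of "a"
```

Each sub-grid would then be written out as its own sweep config. Together the sub-grids cover exactly the same combinations as the original, so no grid point is lost or repeated.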


