Read timed out when the total number of tunable parameter combinations exceeds about 15 million
wandb --version && python --version && uname
wandb, version 0.8.21
Python 3.6.9
Linux
What I Did
wandb sweep sweep.yaml
method: grid
metric:
  name: val_acc
  goal: minimize
parameters:
  setting:
    distribution: categorical
    values:
      - stack_ffn
      - act_pkm
      - stack_encdec_ffn
  q_linear:
    distribution: categorical
    values:
      - true
      - false
  k_linear:
    distribution: categorical
    values:
      - true
      - false
  v_linear:
    distribution: categorical
    values:
      - true
      - false
  o_linear:
    distribution: categorical
    values:
      - true
      - false
  q_norm:
    distribution: categorical
    values:
      - true
      - false
  k_norm:
    distribution: categorical
    values:
      - true
      - false
  v_norm:
    distribution: categorical
    values:
      - true
      - false
  inner_norm:
    distribution: categorical
    values:
      - true
      - false
  norm_way:
    distribution: categorical
    values:
      - C
      - CL
  q_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  k_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  v_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  inner_activ:
    distribution: categorical
    values:
      - no
      - softmax
      - sparsemax
  proj_share:
    distribution: categorical
    values:
      - qk
      - qv
      - kv
      - qkv
      - no
  proj_way:
    distribution: categorical
    values:
      - ->head
      - head->
      - head->_share
  relative:
    distribution: categorical
    values:
      - true
      - false
  q_downscale:
    distribution: categorical
    values:
      - true
      - false
  k_downscale:
    distribution: categorical
    values:
      - true
      - false
  v_downscale:
    distribution: categorical
    values:
      - true
      - false
  inner_downscale:
    distribution: categorical
    values:
      - true
      - false
  inner_mul:
    distribution: categorical
    values:
      - QK
      - KV
and got a timeout:
Network error (ReadTimeout), entering retry loop. See /home/shulie8518/Workspace/Review_Attention/wandb/debug.log for full traceback.
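For reference, the size of the grid described by this sweep.yaml is the product of the number of values listed for each parameter. The snippet below is only a minimal sketch of that arithmetic: the counts are transcribed by hand from the config in this report, and `grid_size` is a hypothetical helper, not part of wandb.

```python
from functools import reduce
from operator import mul

# Number of listed values per parameter, transcribed from sweep.yaml.
value_counts = {
    "setting": 3,
    "q_linear": 2, "k_linear": 2, "v_linear": 2, "o_linear": 2,
    "q_norm": 2, "k_norm": 2, "v_norm": 2, "inner_norm": 2,
    "norm_way": 2,
    "q_activ": 3, "k_activ": 3, "v_activ": 3, "inner_activ": 3,
    "proj_share": 5,
    "proj_way": 3,
    "relative": 2,
    "q_downscale": 2, "k_downscale": 2, "v_downscale": 2, "inner_downscale": 2,
    "inner_mul": 2,
}


def grid_size(counts):
    """Return the number of points in a full grid over the given parameters."""
    return reduce(mul, counts.values(), 1)


total = grid_size(value_counts)
print(total)  # 2**15 * 3**6 * 5 = 119439360
```

With all 22 parameters the grid has roughly 119 million points, well above the threshold of about 15 million mentioned in the title.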
debug.log
2020-01-19 16:01:03,860 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:15,154 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:27,885 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:01:38,038 ERROR MainThread:31362 [retry.py:__call__():108] Retry attempt failed:
Traceback (most recent call last):
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.7/http/client.py", line 1344, in getresponse
response.begin()
File "/usr/lib/python3.7/http/client.py", line 306, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/util/retry.py", line 357, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 389, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 309, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
result = self._call_fn(*args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/wandb/apis/internal.py", line 110, in execute
return self.client.execute(*args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 52, in execute
result = self._get_result(document, *args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
return self.transport.execute(document, *args, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/gql/transport/requests.py", line 38, in execute
request = requests.post(self.url, **post_args)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/home/shulie8518/VirtualEnvironment/py37/lib/python3.7/site-packages/requests/adapters.py", line 521, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Read timed out. (read timeout=10)
2020-01-19 16:01:42,484 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:00,755 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
2020-01-19 16:02:27,699 DEBUG MainThread:31362 [connectionpool.py:_new_conn():824] Starting new HTTPS connection (1): api.wandb.ai
Issue Analytics
- State:
- Created 4 years ago
- Comments:34 (11 by maintainers)
The issue still persists on my side
Thanks for the report. We will look into this and figure out if we can handle this many combinations or if we have to set some limits.
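As a general illustration of how a grid of this size can be handled without listing every combination up front, an integer index can be decoded into a single grid point using mixed-radix arithmetic. This is only a sketch under assumptions: `grid_point` is a hypothetical helper, the example uses a small subset of the parameters from the report, and nothing here describes how wandb implements grid sweeps or where the timeout actually arises.

```python
def grid_point(index, parameters):
    """Decode an integer index into one grid combination (mixed-radix)."""
    if index < 0:
        raise IndexError("index must be non-negative")
    point = {}
    for name, values in parameters.items():
        # Peel off one "digit" whose base is the number of values.
        index, position = divmod(index, len(values))
        point[name] = values[position]
    if index:
        # Leftover quotient means the index is past the last grid point.
        raise IndexError("index out of range for this grid")
    return point


# Small subset of the parameters from the report (3 * 2 * 2 = 12 points).
subset = {
    "setting": ["stack_ffn", "act_pkm", "stack_encdec_ffn"],
    "q_linear": [True, False],
    "inner_mul": ["QK", "KV"],
}

print(grid_point(0, subset))
print(grid_point(11, subset))
```

Because each point is computed from its index alone, only the per-parameter value lists need to be held in memory, regardless of the total grid size.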