SSL error when cleaning up pods
Hi,
I'm having a small issue running dask-kubernetes on a local Kubernetes 1.14.3 cluster (the one provided by the latest Docker Desktop): the job runs fine and I get back the correct results, but it looks like there is an SSL issue when it tries to clean up the pods in _cleanup_pods (dask_kubernetes/core.py, line 544).
This is the error I get when I run my script:
root@69feaef9d947:/app# python process_receipts_dask2_kubernetes.py --testrun
/usr/local/lib/python3.7/site-packages/distributed/client.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import defaultdict, Iterator
/usr/local/lib/python3.7/site-packages/distributed/publish.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import MutableMapping
/usr/local/lib/python3.7/site-packages/distributed/scheduler.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import defaultdict, deque, OrderedDict, Mapping, Set
dasboard: {'dashboard': 8787}
file data_gt ... partita_iva_ocr partita_iva_match
0 MV_trfktqxxci_20180830070237.jpg nan ... xxxxxxxxxxx True
0 MV_tentyvsblx_20180903070230.jpg 29/08/2018 12.08 ... None False
0 MV_qrmiwvlyio_20180908100205.jpg nan ... None False
0 MV_jxavynnpmd_20180903070227.jpg nan ... None False
0 MV_dpaztptfio_20180908100214.jpg 06/09/2018 ... None False
[5 rows x 10 columns]
{'data': {'n_tot': 5, 'n_gt': 2, 'n_true_pos': 2, 'n_false_pos': 0, 'n_true_neg': 3, 'n_false_neg': 0, 'precision': 1.0, 'recall': 1.0, 'f1_score': 1.0}, 'importo': {'n_tot': 5, 'n_gt': 5, 'n_true_pos': 1, 'n_false_pos': 0, 'n_true_neg': 0, 'n_false_neg': 4, 'precision': 1.0, 'recall': 0.2, 'f1_score': 0.33333333333333337}, 'partita_iva': {'n_tot': 5, 'n_gt': 4, 'n_true_pos': 1, 'n_false_pos': 0, 'n_true_neg': 1, 'n_false_neg': 3, 'precision': 1.0, 'recall': 0.25, 'f1_score': 0.4}}
2019-08-13 14:43:28,926 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
2019-08-13 14:43:28,932 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
2019-08-13 14:43:28,937 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 322, in ssl_wrap_socket
context.load_verify_locations(ca_certs, ca_cert_dir)
FileNotFoundError: [Errno 2] No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 344, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 843, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 370, in connect
ssl_context=context)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
raise SSLError(e)
urllib3.exceptions.SSLError: [Errno 2] No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/weakref.py", line 648, in _exitfunc
f()
File "/usr/local/lib/python3.7/weakref.py", line 572, in __call__
return info.func(*info.args, **(info.kwargs or {}))
File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 544, in _cleanup_pods
pods = api.list_namespaced_pod(namespace, label_selector=format_labels(labels))
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12372, in list_namespaced_pod
(data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12472, in list_namespaced_pod_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 355, in request
headers=headers)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in GET
query_params=query_params)
File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 205, in request
headers=headers)
File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 68, in request
**urlopen_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 89, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 326, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 670, in urlopen
**response_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 670, in urlopen
**response_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 670, in urlopen
**response_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 641, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='kubernetes.docker.internal', port=6443): Max retries exceeded with url: /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker (Caused by SSLError(FileNotFoundError(2, 'No such file or directory')))
The code is running inside a container built from the same image as the one specified in worker-spec.yml:
kind: Pod
spec:
  restartPolicy: Never
  containers:
    - image: sroie_app
      imagePullPolicy: IfNotPresent
      command: [/usr/local/bin/python]
      # args: [/usr/local/bin/dask-worker, --nthreads, '2', --no-dashboard, --memory-limit, 1GB, --death-timeout, '60']
      args: [/usr/local/bin/dask-worker, --nthreads, '2', --no-dashboard, --memory-limit, 1GB, --death-timeout, '3600']
      name: dask
      resources:
        limits:
          cpu: "1"
          memory: 1G
        requests:
          cpu: "1"
          memory: 512M
It seems related to #113, but in my case the container is running inside the k8s cluster and I can reach all of the workers correctly.
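For what it's worth, one plausible explanation (an assumption on my part, not confirmed in this issue) is a shutdown-ordering race: the kubernetes client can materialise certificate data into a temporary file whose removal is registered with atexit, while dask-kubernetes registers pod cleanup via weakref.finalize (note the weakref._exitfunc frame in the traceback). Since atexit callbacks run in LIFO order, the CA file can already be deleted by the time _cleanup_pods runs. A pure-stdlib simulation of that ordering:

```python
import os
import tempfile

# Simulation of the suspected shutdown race. The real actors are the
# kubernetes client's temporary CA-certificate file (removed by an atexit
# hook) and dask-kubernetes' weakref.finalize cleanup. atexit runs
# callbacks in LIFO order, so a removal registered later runs earlier.
registered = []  # stands in for atexit's callback stack
events = []

fd, ca_path = tempfile.mkstemp(suffix=".crt")
os.close(fd)

def run_cleanup():
    # Stands in for _cleanup_pods: it needs the CA file to reach the API.
    events.append("cleanup sees CA file: %s" % os.path.exists(ca_path))

def remove_ca_file():
    # Stands in for the client's temp-file removal hook.
    os.remove(ca_path)
    events.append("CA temp file deleted")

registered.append(run_cleanup)      # hook registered first
registered.append(remove_ca_file)   # hook registered later

for cb in reversed(registered):     # interpreter exit: LIFO order
    cb()

print(events)
# -> ['CA temp file deleted', 'cleanup sees CA file: False']
```

By the time the cleanup callback fires, the certificate file it needs is gone, which matches the FileNotFoundError wrapped in the SSLError above.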
Thanks,
Giordano
Issue Analytics
- Created 4 years ago
- Comments: 12 (5 by maintainers)
Top GitHub Comments
I’m hitting this error again when running with the following versions:
dask-kubernetes==0.10.1 kubernetes==11.0.0 kubernetes-asyncio==11.2.0
Applying a change similar to what @giordyb suggested on #172 makes the problem go away:
- passing self.core_api down to self._cleanup_resources when calling finalize, instead of creating a new CoreV1Api instance;
- adding a yield statement to the calls that get the pods and services lists.
I'll happily open a new Issue or a PR if requested.
@jacobtomlinson thanks, it worked like a charm, I guess I should have rtfm 😃