SSL error when cleaning up pods

Hi, I’m having a small issue running dask-kubernetes on a local Kubernetes 1.14.3 cluster (the one provided by the latest Docker Desktop). The job runs fine and I get back the correct results, but there seems to be an SSL issue when it tries to clean up the pods in _cleanup_pods (File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 544). This is the error I get when I run my script:

root@69feaef9d947:/app# python process_receipts_dask2_kubernetes.py --testrun
/usr/local/lib/python3.7/site-packages/distributed/client.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import defaultdict, Iterator
/usr/local/lib/python3.7/site-packages/distributed/publish.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/usr/local/lib/python3.7/site-packages/distributed/scheduler.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import defaultdict, deque, OrderedDict, Mapping, Set
dasboard: {'dashboard': 8787}
                               file           data_gt  ... partita_iva_ocr partita_iva_match
0  MV_trfktqxxci_20180830070237.jpg               nan  ...     xxxxxxxxxxx              True
0  MV_tentyvsblx_20180903070230.jpg  29/08/2018 12.08  ...            None             False
0  MV_qrmiwvlyio_20180908100205.jpg               nan  ...            None             False
0  MV_jxavynnpmd_20180903070227.jpg               nan  ...            None             False
0  MV_dpaztptfio_20180908100214.jpg        06/09/2018  ...            None             False

[5 rows x 10 columns]
{'data': {'n_tot': 5, 'n_gt': 2, 'n_true_pos': 2, 'n_false_pos': 0, 'n_true_neg': 3, 'n_false_neg': 0, 'precision': 1.0, 'recall': 1.0, 'f1_score': 1.0}, 'importo': {'n_tot': 5, 'n_gt': 5, 'n_true_pos': 1, 'n_false_pos': 0, 'n_true_neg': 0, 'n_false_neg': 4, 'precision': 1.0, 'recall': 0.2, 'f1_score': 0.33333333333333337}, 'partita_iva': {'n_tot': 5, 'n_gt': 4, 'n_true_pos': 1, 'n_false_pos': 0, 'n_true_neg': 1, 'n_false_neg': 3, 'precision': 1.0, 'recall': 0.25, 'f1_score': 0.4}}
2019-08-13 14:43:28,926 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
2019-08-13 14:43:28,926 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
2019-08-13 14:43:28,932 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
2019-08-13 14:43:28,932 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
2019-08-13 14:43:28,937 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
2019-08-13 14:43:28,937 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 322, in ssl_wrap_socket
    context.load_verify_locations(ca_certs, ca_cert_dir)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 344, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 843, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 370, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    raise SSLError(e)
urllib3.exceptions.SSLError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/weakref.py", line 648, in _exitfunc
    f()
  File "/usr/local/lib/python3.7/weakref.py", line 572, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 544, in _cleanup_pods
    pods = api.list_namespaced_pod(namespace, label_selector=format_labels(labels))
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12372, in list_namespaced_pod
    (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12472, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 334, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 355, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 205, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 68, in request
    **urlopen_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 89, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 326, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    **response_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    **response_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    **response_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='kubernetes.docker.internal', port=6443): Max retries exceeded with url: /api/v1/namespaces/default/pods?labelSelector=dask.org%2Fcluster-name%3Dreceipts%2Cuser%3Droot%2Capp%3Ddask%2Ccomponent%3Ddask-worker (Caused by SSLError(FileNotFoundError(2, 'No such file or directory')))
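
The innermost frame of the traceback is `context.load_verify_locations(ca_certs, ca_cert_dir)` raising `FileNotFoundError`. In other words, the CA certificate path handed to urllib3 did not exist when the cleanup request was sent. The stdlib-only sketch below reproduces that failure mode by pointing an `ssl` context at a CA file that has been deleted. The helper name `ca_bundle_loadable` is hypothetical. The temporary file only stands in for whatever CA path the Kubernetes client was actually configured with.

```python
import os
import ssl
import tempfile


def ca_bundle_loadable(path):
    """Return True if an SSL context can load CA certificates from `path`.

    A missing file raises FileNotFoundError, the same error that urllib3
    wraps in SSLError in the traceback. Other failures (for example a file
    that is not valid PEM) raise ssl.SSLError and are deliberately not caught.
    """
    ctx = ssl.create_default_context()
    try:
        ctx.load_verify_locations(cafile=path)
    except FileNotFoundError:
        return False
    return True


# Simulate a CA file that existed when the client was configured
# but was removed before the cleanup request was made.
fd, ca_path = tempfile.mkstemp(suffix=".crt")
os.close(fd)
os.remove(ca_path)

print(ca_bundle_loadable(ca_path))  # prints False
```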

The code is running inside a container built from the same image as the one specified in worker-spec.yml:

kind: Pod
spec:
  restartPolicy: Never
  
  containers:
  - image: sroie_app
    imagePullPolicy: IfNotPresent
    command: [/usr/local/bin/python]
    #args: [/usr/local/bin/dask-worker, --nthreads, '2', --no-dashboard, --memory-limit, 1GB, --death-timeout, '60']
    args: [/usr/local/bin/dask-worker, --nthreads, '2', --no-dashboard, --memory-limit, 1GB, --death-timeout, '3600']
    name: dask
    resources:
      limits:
        cpu: "1"
        memory: 1G
      requests:
        cpu: "1"
        memory: 512M

It seems related to #113, but in my case the container is running inside the k8s cluster and I can reach all of the workers correctly.

Thanks,

Giordano
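
The failing call runs from the exit hook of `weakref.finalize` (the `_exitfunc` frame in the traceback), which means it runs while the interpreter is shutting down. This thread does not confirm the cause, but one plausible explanation is an exit-time ordering problem. Suppose the client library wrote its CA certificate to a temporary file and deletes that file in its own `atexit` handler. If that handler runs first, the file is already gone when `_cleanup_pods` connects. `atexit` runs handlers in reverse registration order. In CPython, `weakref.finalize` registers its single exit hook when the first finalizer is created.

The sketch below simulates this assumed scenario in a child interpreter, using only the standard library. `Cluster`, the temporary file and the cleanup lambdas are hypothetical stand-ins, not dask-kubernetes or Kubernetes client code.

```python
import subprocess
import sys
import textwrap

# Child script. Its first argument selects whether cleanup runs explicitly
# before shutdown ("explicit") or is left to weakref's exit hook.
CHILD = textwrap.dedent("""
    import atexit, os, sys, tempfile, weakref

    class Cluster:  # hypothetical stand-in for a cluster object
        pass

    # A finalizer created early makes weakref register its exit hook first.
    early = Cluster()
    weakref.finalize(early, lambda: None)

    # Hypothetical client library: writes a CA file to a temp path and
    # removes it in an atexit handler registered later (so it runs earlier).
    fd, ca_path = tempfile.mkstemp()
    os.close(fd)
    atexit.register(os.remove, ca_path)

    # The cluster's cleanup needs the CA file to still exist.
    cluster = Cluster()
    cleanup = weakref.finalize(
        cluster, lambda: print("ca file present:", os.path.exists(ca_path))
    )

    if sys.argv[1] == "explicit":
        cleanup()  # run cleanup now, before interpreter shutdown
""")


def run_child(mode):
    """Run the child script in a fresh interpreter and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(run_child("at-exit"))   # prints "ca file present: False"
print(run_child("explicit"))  # prints "ca file present: True"
```

The second run shows that invoking the finalizer explicitly, before shutdown begins, still finds the file. This holds only under the assumption above.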

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

csala commented, May 3, 2020

I’m hitting this error again when running with the following versions:

dask-kubernetes==0.10.1 kubernetes==11.0.0 kubernetes-asyncio==11.2.0

Applying a change similar to what @giordyb suggested in #172 makes the problem go away:

  1. Pass self.core_api down to self._cleanup_resources when calling finalize instead of creating a new CoreV1Api instance.
  2. Add a yield statement to the calls that get the pods and services lists:
def _cleanup_resources(namespace, labels, core_api):
    """ Remove all pods with these labels in this namespace """

    pods = yield core_api.list_namespaced_pod(namespace, label_selector=format_labels(labels))
    ...

    services = yield core_api.list_namespaced_service(
        namespace, label_selector=format_labels(labels)
    )

I’ll happily open a new Issue or a PR if requested.
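
The change described above comes down to two things. Cleanup reuses the cluster's already-configured API client, and each list call is waited on before its result is used. Below is a minimal `asyncio` sketch of that pattern, with `await` playing the role that `yield` plays in a generator-based (for example Tornado-style) coroutine. It rests on these assumptions:

- `FakeCoreApi` is a hypothetical in-memory stand-in for an asynchronous `CoreV1Api`. The real methods return `V1PodList`/`V1ServiceList` objects with an `.items` attribute, not plain name lists.
- `format_labels` is a hypothetical re-implementation that builds a `key=value,...` selector like the one in the logged URL.
- Selector matching is simplified to exact string equality.

```python
import asyncio


def format_labels(labels):
    """Hypothetical helper: build a Kubernetes label selector string."""
    return ",".join(f"{key}={value}" for key, value in labels.items())


class FakeCoreApi:
    """Hypothetical in-memory stand-in for an async CoreV1Api client."""

    def __init__(self, pods, services):
        # Each resource is stored as name -> (namespace, label selector).
        self.pods = dict(pods)
        self.services = dict(services)

    async def list_namespaced_pod(self, namespace, label_selector):
        # Simplification: exact selector match instead of label subset match.
        return [n for n, (ns, sel) in self.pods.items()
                if ns == namespace and sel == label_selector]

    async def list_namespaced_service(self, namespace, label_selector):
        return [n for n, (ns, sel) in self.services.items()
                if ns == namespace and sel == label_selector]

    async def delete_namespaced_pod(self, name, namespace):
        del self.pods[name]

    async def delete_namespaced_service(self, name, namespace):
        del self.services[name]


async def cleanup_resources(namespace, labels, core_api):
    """Remove pods and services with these labels, reusing core_api."""
    selector = format_labels(labels)
    # Without await, these calls would return coroutine objects, not lists.
    pods = await core_api.list_namespaced_pod(namespace, label_selector=selector)
    for name in pods:
        await core_api.delete_namespaced_pod(name, namespace)
    services = await core_api.list_namespaced_service(namespace, label_selector=selector)
    for name in services:
        await core_api.delete_namespaced_service(name, namespace)
    return pods, services


labels = {"dask.org/cluster-name": "receipts", "app": "dask"}
selector = format_labels(labels)
api = FakeCoreApi(
    pods={"worker-1": ("default", selector), "web-1": ("default", "app=web")},
    services={"scheduler": ("default", selector)},
)
print(asyncio.run(cleanup_resources("default", labels, api)))
# prints (['worker-1'], ['scheduler'])
```

Passing the existing client in, rather than constructing a new one inside the cleanup function, means cleanup uses the same connection settings the cluster was created with.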

giordyb commented, Aug 16, 2019

@jacobtomlinson thanks, it worked like a charm. I guess I should have RTFM'd 😃
