In k8s worker does not resolve master hostname until killed

Alluxio Version: 2.0.0-RC3

Describe the bug

If the worker cannot resolve the master hostname on startup (because the master service has not booted yet), it fails to connect to the master even after the master comes up. Worker retries keep failing until the worker dies and is restarted by k8s.

To Reproduce

1. Start the worker.
2. Start the master.

The worker fails for a long time in an error loop and then dies.

Expected behavior

The error loop should terminate and the worker should connect to the master successfully once it is up.

Urgency

Not urgent. The workaround is to set a short worker retry timeout or to wait for the master to be up before starting the worker.
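The second workaround could be sketched as a small pre-start check that blocks until the master's hostname resolves. This is only an illustration: the class `WaitForMaster`, the method `waitForHost`, and the timeout values are hypothetical and not part of Alluxio. Only the hostname `alluxio-master` comes from the logs in this issue.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Duration;

// Hypothetical helper, not part of Alluxio: waits for a hostname to resolve.
public class WaitForMaster {

    /**
     * Polls DNS until the host resolves or the timeout elapses.
     * Returns true if the host resolved, false on timeout or interruption.
     */
    static boolean waitForHost(String host, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                InetAddress.getByName(host); // throws if the name cannot be resolved
                return true;
            } catch (UnknownHostException e) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // gave up: host never resolved within the timeout
                }
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass the real master service name as the first argument.
        String host = args.length > 0 ? args[0] : "alluxio-master";
        boolean resolved = waitForHost(host, Duration.ofMinutes(5), Duration.ofSeconds(5));
        System.exit(resolved ? 0 : 1); // non-zero exit lets the caller decide what to do
    }
}
```

Note that the JVM caches failed lookups for a short time (the `networkaddress.cache.negative.ttl` security property, 10 seconds by default), so polling more often than that will not trigger more real DNS queries.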

Additional context

Worker logs:

2019-06-12 21:32:45,544 WARN  RetryUtils - Failed to load cluster default configuration with master (attempt 687): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:50,606 WARN  RetryUtils - Failed to load cluster default configuration with master (attempt 688): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:55,943 WARN  RetryUtils - Failed to load cluster default configuration with master (attempt 689): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:58,069 WARN  RetryUtils - Failed to load cluster default configuration with master (attempt 690): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:58,070 ERROR AlluxioWorker - Fatal error: Failed to load cluster default configuration for worker: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN

Here alluxio-master:19998 is the unresolved address; the resolved address should look like alluxio-master/172.31.1.114:19998.
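The `hostname/ip:port` form matches how a resolved `java.net.InetSocketAddress` prints. The issue does not say how the worker builds its master address, so the following is a sketch under the assumption that an `InetSocketAddress` is created once at startup and then reused. Such an object attempts resolution only in its constructor, so an instance that failed to resolve stays unresolved until a new one is created. The class `AddressRefresh` and the method `refreshIfUnresolved` are hypothetical names for illustration, not Alluxio code.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration, not Alluxio code.
public class AddressRefresh {

    /**
     * Returns the address unchanged if it is already resolved; otherwise
     * builds a new InetSocketAddress, which triggers a fresh DNS lookup.
     * The result may still be unresolved if the lookup fails again.
     */
    static InetSocketAddress refreshIfUnresolved(InetSocketAddress addr) {
        if (!addr.isUnresolved()) {
            return addr;
        }
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // Simulate an address whose lookup failed at startup.
        // "localhost" is used only so the demo resolves without a cluster.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("localhost", 19998);
        InetSocketAddress fresh = refreshIfUnresolved(stale);

        System.out.println("stale unresolved: " + stale.isUnresolved());
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

Calling a helper like this at the start of every retry attempt, rather than reusing the address built at startup, would let a retry loop pick up the master's DNS record as soon as it appears.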

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

madanadit commented, Jun 13, 2019 (1 reaction)

madanadit commented, Jun 14, 2019 (0 reactions)

please do. thx
