In k8s worker does not resolve master hostname until killed
Alluxio Version: 2.0.0-RC3
Describe the bug If the worker cannot resolve the master hostname at startup (because the master service has not booted yet), it keeps failing to connect to the master even after the master comes up. The worker's retries keep failing until the worker dies and is restarted by k8s.
To Reproduce
1. Start the worker.
2. Start the master.
3. The worker fails in an error loop for a long time and then dies.
Expected behavior The error loop should terminate and the worker should connect to the master successfully once it is up.
Urgency Not urgent. The workaround is to set a short worker retry timeout or to wait for the master to be up before starting the worker.
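The "wait for the master" workaround could be sketched roughly as below: a small pre-start check that polls DNS until the master hostname resolves. The class name `MasterHostWaiter`, the method `waitUntilResolvable`, and the attempt count and sleep interval are all hypothetical and not part of Alluxio. This only checks name resolution, not that the master is actually serving on port 19998.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Alluxio: blocks until a hostname resolves.
class MasterHostWaiter {

  // Returns true as soon as the host resolves, false if every attempt fails
  // or the thread is interrupted while sleeping between attempts.
  static boolean waitUntilResolvable(String host, int maxAttempts, long sleepMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // Throws UnknownHostException when the name cannot be resolved.
        // Note: the JVM may cache failed lookups for a short time
        // (security property networkaddress.cache.negative.ttl).
        InetAddress.getByName(host);
        return true;
      } catch (UnknownHostException e) {
        if (attempt == maxAttempts) {
          break;
        }
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt flag
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // "alluxio-master" is the service name taken from the logs in this issue.
    String host = args.length > 0 ? args[0] : "alluxio-master";
    // Assumed values: up to 60 attempts, 5 seconds apart.
    boolean ok = waitUntilResolvable(host, 60, 5000L);
    System.out.println(host + (ok ? " resolved" : " did not resolve"));
    System.exit(ok ? 0 : 1);
  }
}
```

A check like this could run before the worker process is launched, so the worker only starts once the master service name is resolvable.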
Additional context
Worker logs:
2019-06-12 21:32:45,544 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 687): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:50,606 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 688): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:55,943 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 689): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:58,069 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 690): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
2019-06-12 21:32:58,070 ERROR AlluxioWorker - Fatal error: Failed to load cluster default configuration for worker: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNKNOWN
The address alluxio-master:19998 shown in these logs is unresolved. A resolved address should look like: alluxio-master/172.31.1.114:19998
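To illustrate why an unresolved address can stay unresolved across retries: a `java.net.InetSocketAddress` is immutable, so an instance created while the hostname was unresolvable never picks up an IP later, and a new instance has to be constructed to trigger another lookup. The sketch below shows that general idea only. It is not the change made in the linked PR, and `AddressRefresher` and `refresh` are hypothetical names.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not Alluxio code: re-resolves a stale unresolved address.
class AddressRefresher {

  // Returns the same instance if it is already resolved. Otherwise builds a new
  // InetSocketAddress, whose constructor attempts a fresh lookup. The result is
  // still flagged unresolved if that lookup fails.
  static InetSocketAddress refresh(InetSocketAddress addr) {
    if (!addr.isUnresolved()) {
      return addr;
    }
    return new InetSocketAddress(addr.getHostString(), addr.getPort());
  }

  public static void main(String[] args) {
    // Simulates an address captured before the master service existed.
    InetSocketAddress stale = InetSocketAddress.createUnresolved("alluxio-master", 19998);
    System.out.println("before refresh, unresolved: " + stale.isUnresolved());

    // Calling refresh() before each retry gives DNS another chance.
    InetSocketAddress fresh = refresh(stale);
    System.out.println("after refresh, unresolved: " + fresh.isUnresolved());
  }
}
```

Under this assumption, a retry loop that reuses one address object would keep failing, while one that calls something like `refresh` on each attempt could recover once the master's service name becomes resolvable.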
Issue Analytics
- Created 4 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Fix: https://github.com/Alluxio/alluxio/pull/9286
please do. thx