Slow re-election when elected master pod is deleted


First of all - thank you guys for the chart!

I was playing around with the multi-node example and experienced some odd behavior. Here’s how I’m reproducing the issue.

After the multi-node example is deployed, forward the multi-data service to your local machine in one terminal:

$ kubectl port-forward service/multi-data 9200

Watch the call to /_cat/master in another terminal:

$ watch -n1 time curl -s http://localhost:9200/_cat/master?v

In a third terminal, delete whichever pod is currently the elected master:

$ kubectl delete pod multi-master-0

The API call in the second terminal will now hang. After 30 seconds the request times out, and we might see the following error for a split second:

{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}

Soon after, the cluster recovers and the API call from the second window starts responding again. Here are the logs from another master node before and after the re-election:

[2019-02-17T01:49:03,736][INFO ][o.e.d.z.ZenDiscovery     ] [multi-master-1] master_left [{multi-master-0}{KZPjmKZtSf2LGV-IyvtfOg}{Es2wrbyiQdWz5CnT4V5wkA}{10.40.1.14}{10.40.1.14:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [shut_down]
[2019-02-17T01:49:03,736][WARN ][o.e.d.z.ZenDiscovery     ] [multi-master-1] master left (reason = shut_down), current nodes: nodes:
   {multi-master-0}{KZPjmKZtSf2LGV-IyvtfOg}{Es2wrbyiQdWz5CnT4V5wkA}{10.40.1.14}{10.40.1.14:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, master
   {multi-data-0}{Ndz2WGGiSz6Y1XO1tWyWRw}{nPoZagPdRr2Tq45BPAkh_g}{10.40.2.13}{10.40.2.13:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   {multi-data-2}{MmkHmP6XRTibriDLazv_1A}{iS1mfA0JQ-yURqZ4-ng2zQ}{10.40.1.8}{10.40.1.8:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   {multi-master-2}{0PQvO9UhT--kvv2mEuJyBg}{WVb0wCALRP6KIt2oHNTmcQ}{10.40.0.20}{10.40.0.20:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   {multi-master-1}{uetiuhetRbasFRNLqI6ixg}{AwsDmF2aTZS1JesCE0HZ0A}{10.40.2.15}{10.40.2.15:9300}{ml.machine_memory=2147483648, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local
   {multi-data-1}{Uh9aucMmS6uLRBeG5559_w}{tlxqIDoZRjCLMfMWxU0IyQ}{10.40.0.11}{10.40.0.11:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}

[2019-02-17T01:49:03,830][WARN ][o.e.t.TcpTransport       ] [multi-master-1] send message failed [channel: Netty4TcpChannel{localAddress=0.0.0.0/0.0.0.0:9300, remoteAddress=/10.40.1.14:36958}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2019-02-17T01:49:07,107][INFO ][o.e.c.s.ClusterApplierService] [multi-master-1] detected_master {multi-master-2}{0PQvO9UhT--kvv2mEuJyBg}{WVb0wCALRP6KIt2oHNTmcQ}{10.40.0.20}{10.40.0.20:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, reason: apply cluster state (from master [master {multi-master-2}{0PQvO9UhT--kvv2mEuJyBg}{WVb0wCALRP6KIt2oHNTmcQ}{10.40.0.20}{10.40.0.20:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [95]])
[2019-02-17T01:49:34,832][WARN ][o.e.c.NodeConnectionsService] [multi-master-1] failed to connect to node {multi-master-0}{KZPjmKZtSf2LGV-IyvtfOg}{Es2wrbyiQdWz5CnT4V5wkA}{10.40.1.14}{10.40.1.14:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [multi-master-0][10.40.1.14:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:1576) ~[elasticsearch-6.6.0.jar:6.6.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:660) ~[elasticsearch-6.6.0.jar:6.6.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
[2019-02-17T01:49:37,200][WARN ][o.e.c.s.ClusterApplierService] [multi-master-1] cluster state applier task [apply cluster state (from master [master {multi-master-2}{0PQvO9UhT--kvv2mEuJyBg}{WVb0wCALRP6KIt2oHNTmcQ}{10.40.0.20}{10.40.0.20:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [95]])] took [30s] above the warn threshold of 30s
[2019-02-17T01:49:40,315][INFO ][o.e.c.s.ClusterApplierService] [multi-master-1] removed {{multi-master-0}{KZPjmKZtSf2LGV-IyvtfOg}{Es2wrbyiQdWz5CnT4V5wkA}{10.40.1.14}{10.40.1.14:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {multi-master-2}{0PQvO9UhT--kvv2mEuJyBg}{WVb0wCALRP6KIt2oHNTmcQ}{10.40.0.20}{10.40.0.20:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [96]])
[2019-02-17T01:49:55,945][WARN ][o.e.t.TransportService   ] [multi-master-1] Received response for a request that has timed out, sent [47893ms] ago, timed out [17870ms] ago, action [internal:discovery/zen/fd/master_ping], node [{multi-master-2}{0PQvO9UhT--kvv2mEuJyBg}{WVb0wCALRP6KIt2oHNTmcQ}{10.40.0.20}{10.40.0.20:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [579]
[2019-02-17T01:50:03,784][INFO ][o.e.c.s.ClusterApplierService] [multi-master-1] added {{multi-master-0}{KZPjmKZtSf2LGV-IyvtfOg}{LqkyAGeTSUW2mtNmt49HIA}{10.40.1.15}{10.40.1.15:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {multi-master-2}{0PQvO9UhT--kvv2mEuJyBg}{WVb0wCALRP6KIt2oHNTmcQ}{10.40.0.20}{10.40.0.20:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [98]])

I figured Kubernetes might be killing the pods too abruptly, so I followed the instructions at https://www.elastic.co/guide/en/elasticsearch/reference/6.6/stopping-elasticsearch.html for stopping Elasticsearch. Sure enough, if we send SIGTERM to the Elasticsearch process in the elected master pod directly, the re-election is quick!

Assuming multi-master-2 is the new master:

$ kubectl exec multi-master-2 -- kill -SIGTERM 1

Notice how the API call from the second terminal only hangs for around 3 seconds this time!

Reading through the docs on pod termination (https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods), Kubernetes does in fact send a SIGTERM to the container, so I'm guessing that deleting a pod does something beyond just sending a SIGTERM, and that extra something is what Elasticsearch doesn't like.
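
For what it's worth, one plausible difference is everything else that deletion triggers: the pod is removed from its Service endpoints and its IP eventually disappears entirely, so the other nodes' connections to the old master time out (the connect_timeout[30s] in the logs above) rather than being refused. A rough way to watch this while repeating the test (the headless service name below is a guess based on the chart's naming convention, so adjust it to whatever your release actually created):

# Confirm the grace period Kubernetes allows between SIGTERM and SIGKILL (default 30s)
$ kubectl get pod multi-master-0 -o jsonpath='{.spec.terminationGracePeriodSeconds}'

# Watch the endpoints behind the masters' headless service while deleting the elected
# master in another terminal; the pod's IP is dropped almost immediately
$ kubectl get endpoints multi-master-headless --watch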

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

2 reactions
DaveWHarvey commented, Mar 9, 2019

Here is what worked for me, based on the above suggestion. I mounted a script into the container and run it instead of the Docker entrypoint (shown below). It still takes 30s to time out the old master, but the new master seemed to be operational within about 4 seconds of the old master shutting down.
Note: I had already made another fix that seems conceptually necessary: a pre-shutdown hook on the master that delays termination a bit if there is not a quorum + 1 of master-eligible nodes (a rough sketch of that idea follows the script below). On a k8s rolling upgrade, a restarted master node is considered "ready" as soon as it has opened port 9200, i.e. before it has been added to the cluster. That allows the rolling upgrade to terminate the existing master before the new master-eligible node has fully joined, so the master election might not have a quorum.

if [[ -z $NODE_MASTER || "$NODE_MASTER" = "true" ]]; then

  # Run ES as a background task, forward SIGTERM to it, then wait for it to exit
  trap 'kill $(jobs -p)' SIGTERM

  /usr/local/bin/docker-entrypoint.sh elasticsearch &

  wait

  # Now keep the pod alive for 30s after ES dies so that we will refuse connections
  # from the new master rather than them needing to time out
  sleep 30

else

  exec /usr/local/bin/docker-entrypoint.sh elasticsearch

fi
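
For illustration only, the pre-shutdown check mentioned at the top of this comment might look roughly like the sketch below. It is a sketch under assumptions, not something the chart ships: MIN_MASTERS is a made-up variable, the script assumes it runs inside the master container with an unauthenticated HTTP port on localhost:9200, and it assumes each node's name matches its pod hostname.

#!/usr/bin/env bash
# Hypothetical preStop sketch: wait until enough *other* master-eligible nodes are
# visible before letting this master-eligible pod terminate.
MIN_MASTERS="${MIN_MASTERS:-2}"   # e.g. quorum for a 3-master cluster

for _ in $(seq 1 30); do
  # _cat/nodes lists each node's name and roles; "m" in node.role means master-eligible.
  others=$(curl -s 'http://localhost:9200/_cat/nodes?h=name,node.role' \
             | awk -v self="$(hostname)" '$1 != self && $2 ~ /m/' | wc -l)
  if [ "${others:-0}" -ge "$MIN_MASTERS" ]; then
    exit 0
  fi
  sleep 2
done

exit 0   # give up after ~60s rather than blocking pod deletion forever

Whether something like this lives in a lifecycle preStop hook or gets folded into the wrapper above is a separate choice; the point is simply not to take down a master-eligible node while the remaining ones could not form a quorum.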

1 reaction
Crazybus commented, May 3, 2019

This has been merged into master but not yet released. I'm leaving this open until it is released and others have confirmed that this solution resolves the issue properly.
