
Hazelcast issue after enabling Istio and injecting Envoy


Hi,

I'm having an issue with Hazelcast after enabling Istio, and I wonder how I can address it.

I have a K8s cluster and I've recently installed Istio. When I inject the Envoy sidecar into a deployment that runs Hazelcast, I see a strange issue: many connection errors during the rolling upgrade. I should mention that the deployment eventually completes fine, but these errors indicate that something is wrong.
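
For context, a Hazelcast member in this kind of deployment is typically started with the bundled Kubernetes discovery plugin. The sketch below is an assumption about the setup, not taken from the issue; in particular the service name is a placeholder:

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;

    public class MemberStartup {
        public static void main(String[] args) {
            Config config = new Config();
            config.setClusterName("dev"); // matches the [dev] cluster name in the logs below

            // Disable multicast and use the Kubernetes discovery plugin instead.
            config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
            config.getNetworkConfig().getJoin().getKubernetesConfig()
                  .setEnabled(true)
                  // Placeholder: the headless service fronting the Hazelcast pods.
                  .setProperty("service-name", "hazelcast-service");

            Hazelcast.newHazelcastInstance(config);
        }
    }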

I've noticed that without Envoy, when I do a rolling upgrade of the deployment, I see the following:

[10.16.17.72]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.17.72:45025 and /10.16.5.8:5701
[10.16.5.8]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.8:5701 and /10.16.17.72:45025

[10.16.17.72]:5701 [dev] [4.0.1] Connection[id=1, /10.16.17.72:45025->/10.16.5.8:5701, qualifier=null, endpoint=[10.16.5.8]:5701, alive=false, connectionType=MEMBER] closed. Reason: Connection closed by the other side
[10.16.17.72]:5701 [dev] [4.0.1] Could not connect to: /10.16.5.8:5701. Reason: SocketException[Connection refused to address /10.16.5.8:5701]
.......
[10.16.17.72]:5701 [dev] [4.0.1] Removing connection to endpoint [10.16.5.8]:5701 Cause => java.net.SocketException {Connection refused to address /10.16.5.8:5701}, Error-Count: 5
[10.16.17.72]:5701 [dev] [4.0.1] Member [10.16.5.8]:5701 - 945ec2c8-fc56-4624-aab3-de9823d4886a is suspected to be dead for reason: No connection

What happens here is:

  • the new pod starts and joins the cluster;
  • a cluster connection is initialized between old-pod:5701 and new-pod:xxx (in both directions);
  • the new pod complains it cannot reach the old pod (connectionType=MEMBER) and, after 5 failed attempts, considers it dead and removes it from the cluster;
  • the old pod is removed once the rolling upgrade completes.

Now, when I do the same with Envoy injected, so each pod in the deployment has 2 containers, I notice the following:

[10.16.3.244]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.3.244:5701 and **/127.0.0.6:48287**
[10.16.5.16]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.16:59827 and /10.16.3.244:5701

[10.16.5.16]:5701 [dev] [4.0.1] Connection[id=1, /10.16.5.16:59827->/10.16.3.244:5701, qualifier=null, endpoint=[10.16.3.244]:5701, alive=false, connectionType=MEMBER] closed. Reason: Connection closed by the other side

But then I get millions of messages like the following:

[10.16.5.16]:5701 [dev] [4.0.1] Connection[id=2, /10.16.5.16:33659->/10.16.3.244:5701, qualifier=null, endpoint=[10.16.3.244]:5701, alive=false, connectionType=NONE] closed. Reason: Connection closed by the other side

The first 'Connection closed' message was of MEMBER type and referred to the same connection we saw in the initialization message (10.16.5.16:59827 --> 10.16.3.244:5701), but the rest of the messages are from random ports on 10.16.5.16 to the old pod. I assume the reason is the initialization message that reads 'Initialized new cluster connection between /10.16.3.244:5701 and /127.0.0.6:48287': the connection got registered against the loopback address instead of 10.16.5.16:59827. (127.0.0.6 is the source address the Envoy sidecar uses for inbound passthrough traffic, so to Hazelcast the peer looks like localhost.)

The rolling upgrade completes the same way, but the log is full of millions of messages of that kind.

How can I prevent this? How can I make sure Hazelcast registers the connections against the pods' IPs and not the loopback address?

Thanks, Chen
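
On the loopback-vs-pod-IP question, one knob worth knowing about: Hazelcast lets a member advertise an explicit public address. A minimal sketch, assuming the pod IP is exposed to the container via the Kubernetes Downward API as a POD_IP environment variable (the variable name and wiring are assumptions); note this controls the address the member advertises to others, not how Envoy rewrites a peer's source address, so it may not remove the loopback entries by itself:

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;

    public class PublicAddressMember {
        public static void main(String[] args) {
            Config config = new Config();

            // POD_IP is assumed to be injected via the Downward API
            // (fieldRef: status.podIP) on the Hazelcast container.
            String podIp = System.getenv("POD_IP");
            if (podIp != null) {
                // Advertise the pod IP explicitly so other members address
                // this member by it rather than by a proxied address.
                config.getNetworkConfig().setPublicAddress(podIp + ":5701");
            }

            Hazelcast.newHazelcastInstance(config);
        }
    }

A workaround often suggested for Hazelcast under Istio is to exclude the member port from sidecar interception via the traffic.sidecar.istio.io/excludeInboundPorts and traffic.sidecar.istio.io/excludeOutboundPorts pod annotations, at the cost of bypassing mTLS for that traffic.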

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 19

Top GitHub Comments

2 reactions
alparslanavci commented, Mar 2, 2021

@lechen26, I checked the Hazelcast internals and extracted a bug report. Please follow it up here: https://github.com/hazelcast/hazelcast/issues/18320

As far as I tested, the issue doesn't cause any problems with Hazelcast's cluster formation or regular operations, but it needs to be fixed since the member opens unnecessary connections when running behind a proxy.

Thanks for your contribution, and also for your patience.

1 reaction
alparslanavci commented, Jun 22, 2021

I can verify that this issue is fixed by https://github.com/hazelcast/hazelcast/issues/18320.

However, when using Hazelcast behind a proxy, users can still face some connection errors on member disconnection. These are caused by the startup latencies of the proxy and Hazelcast containers: the proxy (Envoy) starts listening on the port and accepting TCP connections even when the Hazelcast instance is not ready yet, which makes other Hazelcast members connect and disconnect repeatedly until the Hazelcast instance is up and running. This is expected and does not affect Hazelcast's lifecycle directly.
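
One way to shrink that connect-and-disconnect window is to gate readiness on Hazelcast itself rather than on the TCP port that Envoy holds open. A sketch for Hazelcast 4.x that enables only the built-in health-check REST group; wiring the Kubernetes readinessProbe to the resulting endpoint (e.g. /hazelcast/health/ready on port 5701) is deployment-specific and left out:

    import com.hazelcast.config.Config;
    import com.hazelcast.config.RestApiConfig;
    import com.hazelcast.config.RestEndpointGroup;
    import com.hazelcast.core.Hazelcast;

    public class HealthCheckMember {
        public static void main(String[] args) {
            Config config = new Config();

            // Expose only the health-check REST endpoints; all other
            // REST groups stay disabled.
            RestApiConfig restApi = new RestApiConfig()
                    .setEnabled(true)
                    .enableGroups(RestEndpointGroup.HEALTH_CHECK);
            config.getNetworkConfig().setRestApiConfig(restApi);

            Hazelcast.newHazelcastInstance(config);
        }
    }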

Closing this one as the main issue is fixed.


Top Results From Across the Web

Hazelcast issues after injecting istio envoy - Stack Overflow
i'm having issue with hazelcast after enabling istio and i wonder how can i address this. i have K8s cluster and i've recently...

Hazelcast with Istio Service Mesh
Summary. This tutorial demonstrates how to use Hazelcast Embedded and client/server topology in an mTLS-enabled Istio environment with Automatic Sidecar...

hazelcast/hazelcast - Gitter
My guess it's an issue with the (istio) networking, something like the envoy proxy at first does not resolve IP addresses correctly, and...

Sidecar Injection Problems - Istio
The sidecar model assumes that the iptables changes required for Envoy to intercept traffic are within the pod. For pods on the host...

Implementing Microservicilities with Istio - InfoQ
The Envoy proxy sidecar container implements the following features: ... kubectl label namespace default istio-injection=enabled.
