[BUG] Ignite Nodes disconnecting every now and then

Describe the bug

Our Ignite nodes occasionally disconnect on the Dev cluster. So far this has happened twice on May 19, 2020.

Servers going from 2->1:

[23:03:23] Data Regions Configured:
[23:03:23]   ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:36] Topology snapshot [ver=15376, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]
[23:03:36]   ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:36] Data Regions Configured:
[23:03:36]   ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:48] Topology snapshot [ver=15377, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]
[23:03:48]   ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:48] Data Regions Configured:
[23:03:48]   ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:48] Topology snapshot [ver=15378, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]
[23:03:48]   ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:48] Data Regions Configured:
[23:03:48]   ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:51] Topology snapshot [ver=15379, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]
[23:03:51]   ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:51] Data Regions Configured:
[23:03:51]   ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:04:07] Topology snapshot [ver=15380, servers=2, clients=16, CPUs=32, offheap=25.0GB, heap=23.0GB]
[23:04:07]   ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:04:07] Data Regions Configured:
[23:04:07]   ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
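The server-count flapping in the log above can be picked out mechanically. The following is a minimal sketch, assuming the console log keeps the `Topology snapshot [ver=..., servers=..., clients=...]` layout shown here; the function name `find_server_drops` and the regular expression are illustrative and not part of Ignite or epicli.

```python
import re

# Matches lines such as:
# [23:03:36] Topology snapshot [ver=15376, servers=1, clients=17, ...]
# The pattern is an assumption based on the log excerpt in this issue.
SNAPSHOT_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\] Topology snapshot "
    r"\[ver=(?P<ver>\d+), servers=(?P<servers>\d+), clients=(?P<clients>\d+)"
)


def find_server_drops(lines):
    """Return (time, ver, previous_servers, servers) for every snapshot
    whose server count is lower than in the preceding snapshot."""
    drops = []
    previous = None
    for line in lines:
        match = SNAPSHOT_RE.search(line)
        if match is None:
            continue  # skip "Data Regions Configured" and other lines
        servers = int(match.group("servers"))
        if previous is not None and servers < previous:
            drops.append(
                (match.group("time"), int(match.group("ver")), previous, servers)
            )
        previous = servers
    return drops


# Snapshot lines copied from the log excerpt in this issue.
SAMPLE_LOG = [
    "[23:03:36] Topology snapshot [ver=15376, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]",
    "[23:03:48] Topology snapshot [ver=15377, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]",
    "[23:03:48] Topology snapshot [ver=15378, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]",
    "[23:03:51] Topology snapshot [ver=15379, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]",
    "[23:04:07] Topology snapshot [ver=15380, servers=2, clients=16, CPUs=32, offheap=25.0GB, heap=23.0GB]",
]

if __name__ == "__main__":
    for time_of_day, ver, before, after in find_server_drops(SAMPLE_LOG):
        print(f"{time_of_day} ver={ver}: servers {before} -> {after}")
```

Run against a full node log, this would list each topology version at which a server left, which can then be correlated with the timestamps of discovery errors.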

[11:59:15,728][SEVERE][tcp-disco-ip-finder-cleaner-#5][TcpDiscoverySpi] Failed to clean IP finder up.
class org.apache.ignite.spi.IgniteSpiException: Failed to retrieve Ignite pods IP addresses.
at org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder.getRegisteredAddresses(TcpDiscoveryKubernetesIpFinder.java:172)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1828)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.cleanIpFinder(ServerImpl.java:1938)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.body(ServerImpl.java:1913)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.net.UnknownHostException: kubernetes.default.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder.getRegisteredAddresses(TcpDiscoveryKubernetesIpFinder.java:153)
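The root cause in the trace above is a `java.net.UnknownHostException` for `kubernetes.default.svc.cluster.local`, which means the name of the Kubernetes API service could not be resolved from the Ignite pod at that moment. One way to check whether resolution fails intermittently is to probe the name repeatedly from inside a pod. The sketch below is a diagnostic aid only, not Ignite code; the function name `resolve_with_retry`, the port 443, and the retry parameters are assumptions chosen for illustration.

```python
import socket
import time


def resolve_with_retry(host, attempts=3, delay_seconds=1.0,
                       resolver=socket.getaddrinfo):
    """Try to resolve `host` up to `attempts` times.

    Returns a sorted list of unique IP addresses, or an empty list if
    every attempt fails with a name-resolution error.
    `resolver` is injectable so the function can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, 443)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror:
            if attempt < attempts:
                time.sleep(delay_seconds)
    return []


if __name__ == "__main__":
    # Host name taken from the stack trace in this issue.
    addresses = resolve_with_retry("kubernetes.default.svc.cluster.local")
    if addresses:
        print("resolved:", ", ".join(addresses))
    else:
        print("resolution failed on every attempt")
```

If the probe fails only occasionally, cluster DNS or pod networking is a more likely suspect than the Ignite configuration itself.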

To Reproduce

Steps to reproduce the behavior:

1. Execute epicli init … (with params).
2. Edit the config file so that it includes PostgreSQL and Ignite.
3. Execute epicli apply …

OS (please complete the following information):

OS: ???

Cloud Environment (please complete the following information):

Cloud Provider: MS Azure

Additional context

Issue based on Jira request: EP-108.

Issue Analytics

State:
Created 3 years ago
Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction

toszo commented, Dec 8, 2020

This request is external, so I have no information about the CNI used. I assume flannel (most use cases use flannel). The problem may be related to the K8s service discovery used in Ignite. @pyrkamarcin configured Ignite with Zookeeper and it worked without problems.

0 reactions

rafzei commented, Apr 26, 2021

@atsikham Could you please create a new gh issue to implement such functionality?
