[BUG] Ignite Nodes disconnecting every now and then
Describe the bug
We have noticed our Ignite nodes occasionally disconnecting on the Dev cluster. So far this has happened twice, both on May 19, 2020.
Server count dropping from 2 to 1 (node log excerpt):
[23:03:23] Data Regions Configured:
[23:03:23] ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:36] Topology snapshot [ver=15376, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]
[23:03:36] ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:36] Data Regions Configured:
[23:03:36] ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:48] Topology snapshot [ver=15377, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]
[23:03:48] ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:48] Data Regions Configured:
[23:03:48] ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:48] Topology snapshot [ver=15378, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]
[23:03:48] ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:48] Data Regions Configured:
[23:03:48] ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:03:51] Topology snapshot [ver=15379, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]
[23:03:51] ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:03:51] Data Regions Configured:
[23:03:51] ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
[23:04:07] Topology snapshot [ver=15380, servers=2, clients=16, CPUs=32, offheap=25.0GB, heap=23.0GB]
[23:04:07] ^-- Node [id=D53D63B2-3AD3-47E9-B4CE-3E7B9EB2BB79, clusterState=ACTIVE]
[23:04:07] Data Regions Configured:
[23:04:07] ^-- default [initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false]
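In a longer log these flaps are easy to miss, so one option is to scan only the Topology snapshot lines and report every topology version at which the server count went down. The sketch below is a hypothetical helper (TopologyFlapScanner is not part of Ignite or epicli). It assumes the log line format shown in the excerpt above, and it only compares consecutive snapshots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopologyFlapScanner {
    // Assumed line format (taken from the excerpt above):
    // [23:03:48] Topology snapshot [ver=15378, servers=1, clients=17, ...]
    private static final Pattern SNAPSHOT = Pattern.compile(
            "^\\[(\\d{2}:\\d{2}:\\d{2})\\] Topology snapshot \\[ver=(\\d+), servers=(\\d+)");

    /** Returns one entry per snapshot whose server count is lower than the previous snapshot's. */
    public static List<String> findServerDrops(List<String> logLines) {
        List<String> drops = new ArrayList<>();
        int previousServers = -1; // -1 means no snapshot has been seen yet
        for (String line : logLines) {
            Matcher m = SNAPSHOT.matcher(line);
            if (!m.find()) {
                continue; // not a topology snapshot line
            }
            int servers = Integer.parseInt(m.group(3));
            if (previousServers >= 0 && servers < previousServers) {
                drops.add(m.group(1) + " ver=" + m.group(2)
                        + " servers " + previousServers + "->" + servers);
            }
            previousServers = servers;
        }
        return drops;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[23:03:36] Topology snapshot [ver=15376, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]",
                "[23:03:48] Topology snapshot [ver=15377, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]",
                "[23:03:48] Topology snapshot [ver=15378, servers=1, clients=17, CPUs=24, offheap=13.0GB, heap=22.0GB]",
                "[23:03:51] Topology snapshot [ver=15379, servers=2, clients=17, CPUs=32, offheap=25.0GB, heap=23.0GB]");
        // Prints one line per detected drop.
        findServerDrops(sample).forEach(System.out::println);
    }
}
```

On the sample above this reports a single drop, at ver=15378. To find every occurrence, feed it the full node log, for example via java.nio.file.Files.readAllLines.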
[11:59:15,728][SEVERE][tcp-disco-ip-finder-cleaner-#5][TcpDiscoverySpi] Failed to clean IP finder up.
class org.apache.ignite.spi.IgniteSpiException: Failed to retrieve Ignite pods IP addresses.
at org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder.getRegisteredAddresses(TcpDiscoveryKubernetesIpFinder.java:172)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.registeredAddresses(TcpDiscoverySpi.java:1828)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.cleanIpFinder(ServerImpl.java:1938)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$IpFinderCleaner.body(ServerImpl.java:1913)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.net.UnknownHostException: kubernetes.default.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder.getRegisteredAddresses(TcpDiscoveryKubernetesIpFinder.java:153)
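The root cause in this trace is java.net.UnknownHostException: the pod could not resolve kubernetes.default.svc.cluster.local, the in-cluster DNS name of the Kubernetes API service that TcpDiscoveryKubernetesIpFinder calls over HTTPS to list the Ignite pod addresses. A cluster DNS problem should be reproducible from inside an Ignite pod while the flapping occurs. The sketch below is a hypothetical diagnostic (DnsProbe is not part of Ignite or epicli). It resolves a host name repeatedly with the JDK resolver and counts the failures. The default host, the number of attempts, and the delay are assumptions. Note that the JVM caches lookup results according to the networkaddress.cache.* security properties, so attempts spaced only a few seconds apart may be served from that cache.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsProbe {
    /** Returns true if the JDK resolver can resolve the host name (IP literals always succeed). */
    public static boolean canResolve(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** Resolves the host `attempts` times, sleeping `delayMillis` between tries; returns the failure count. */
    public static int countFailures(String host, int attempts, long delayMillis) throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            if (!canResolve(host)) {
                failures++;
                System.err.println("Attempt " + (i + 1) + ": cannot resolve " + host);
            }
            if (i < attempts - 1) {
                Thread.sleep(delayMillis);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed default: the Kubernetes API service name taken from the stack trace.
        String host = args.length > 0 ? args[0] : "kubernetes.default.svc.cluster.local";
        int attempts = 60; // assumption: about five minutes at a 5-second interval
        int failures = countFailures(host, attempts, 5_000);
        System.out.println(failures + " of " + attempts + " lookups of " + host + " failed");
    }
}
```

Any non-zero failure count inside the pod would point at cluster DNS (or the CNI network path to it) rather than at Ignite itself.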
To Reproduce
Steps to reproduce the behavior:
- Run epicli init … (with parameters).
- Edit the config file; it should include PostgreSQL and Ignite.
- Run epicli apply …
OS:
- OS: not provided
Cloud Environment:
- Cloud Provider: MS Azure
Additional context
This issue is based on Jira request EP-108.
Comments
This request is external, so I do not have information about the CNI plugin used; I assume flannel (it is used in most cases). The problem may be related to the Kubernetes service discovery used by Ignite. @pyrkamarcin configured Ignite to use Zookeeper instead, and it worked without problems.
@atsikham Could you please create a new GitHub issue to implement this functionality?