NPE observed after scaling up Zookeeper on PKS cluster
Curator throws a NullPointerException in the controller logs after scaling up Zookeeper on a PKS cluster.

In a PKS cluster, started moderate IO with the Pravega Benchmark tool and scaled Zookeeper up from 3 to 5 using kubectl edit zk nautilus-pravega-zookeeper. Afterwards, the controller log showed ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up java.lang.NullPointerException: null, and no further logging happened after that point.
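For context, the controller reaches ZooKeeper through Curator. The sketch below is only an illustration of a typical Curator client setup; the connect string, timeouts, and class name are assumptions and are not taken from the Pravega controller source. The relevant point is that client.start() also starts Curator's internal EnsembleTracker, which watches the /zookeeper/config znode and is the code path that fails in the stack trace below.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Minimal sketch of a Curator client comparable to what the controller creates.
// Connect string, timeouts, and class name are illustrative assumptions.
public class ZkClientSketch {
    public static void main(String[] args) {
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("nautilus-pravega-zookeeper-client:2181")
                .sessionTimeoutMs(10_000)
                .connectionTimeoutMs(10_000)
                .retryPolicy(new ExponentialBackoffRetry(1_000, 3))
                .build();

        // start() also spins up Curator's internal EnsembleTracker, which watches
        // the /zookeeper/config znode and rebuilds the connection string whenever
        // the ensemble is reconfigured (for example after a 3 -> 5 scale-up).
        client.start();
    }
}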
Steps Followed
- Created a PKS cluster and deployed Pravega build 0.4.0-rc1 using the PravegaOperator method
- Started moderate IO with the Pravega Benchmark tool
- Used kubectl edit PravegaCluster <cluster-name> to scale up Pravega components: Segment Store from 3 to 10 and Bookies from 3 to 10
- Scaled up Zookeeper from 3 to 5 using kubectl edit zk nautilus-pravega-zookeeper
- Observed ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up java.lang.NullPointerException: null
Log Snippet
2018-11-14 09:17:29,614 83780488 [ControllerServiceMain-EventThread] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.lang.NullPointerException: null
at org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:179)
at org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:200)
at org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50)
at org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:144)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:852)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:629)
at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
at org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:587)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
2018-11-14 10:01:10,316 86401190 [pool-6-thread-1] INFO i.p.c.s.stream.ZKGarbageCollector - Acquired guard, starting GC iteration for completedTxnG
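The trace shows the failure inside EnsembleTracker's background callback: the client has fetched the dynamic ensemble configuration from the /zookeeper/config znode (GetConfigBuilderImpl) and then fails while converting it into a connection string (configToConnectionString). A small diagnostic sketch that reads the same znode, assuming a started CuratorFramework like the one sketched earlier, could look like this:

import java.nio.charset.StandardCharsets;
import org.apache.curator.framework.CuratorFramework;

// Diagnostic sketch, not part of the controller: dump the dynamic configuration
// stored in the /zookeeper/config znode -- the same data EnsembleTracker reads in
// the background callback above before converting it to a connection string.
public final class EnsembleConfigDump {
    static void print(CuratorFramework client) throws Exception {
        byte[] raw = client.getConfig().forEnsemble();
        System.out.println(new String(raw, StandardCharsets.UTF_8));
    }
}

Entries in that data normally have the form server.N=host:quorumPort:electionPort:role;clientAddress:clientPort. If an entry published after the reconfiguration comes back without the client-address part, older Curator 4.0.x releases reportedly do not guard against it in configToConnectionString, which would match the NPE above; treat that as a hypothesis to verify against the actual config contents.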
Issue Analytics
- Created 5 years ago
- Comments: 5 (2 by maintainers)
Top Results From Across the Web
ZooKeeper Observers
Observers: Scaling ZooKeeper Without Hurting Write Performance. Although ZooKeeper performs very well by having clients connect directly to ...

Zookeeper instances fail to start on second scale up · Issue #94
This issue occurs when zk server is "scaled up", "scaled down" and then "scaled up" again and a newly starting pod points to...

Chapter 6. Known issues - Red Hat AMQ 7.6
There is a known issue related to scaling ZooKeeper up or down. Scaling ZooKeeper up means adding servers to a ZooKeeper cluster. Scaling...

Running ZooKeeper, A Distributed System Coordinator
This tutorial demonstrates running Apache Zookeeper on Kubernetes using StatefulSets, PodDisruptionBudgets, and PodAntiAffinity.

Running ZooKeeper in Production - Confluent Documentation
Apache Kafka uses ZooKeeper to store persistent cluster metadata and is a critical component of the Confluent Platform deployment.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@adrianmo Yes, I encountered this error while scaling up ZK. I will check and update whether the problem persists with the latest ZK operator or not.
This should go to the https://github.com/pravega/zookeeper-operator repo.