NPE observed after scaling up Zookeeper on PKS cluster

Curator throws a NullPointerException in the controller logs after scaling up Zookeeper on a PKS cluster

In a PKS cluster, I started moderate IO with the Pravega Benchmark tool and tried to scale Zookeeper up from 3 to 5 using kubectl edit zk nautilus-pravega-zookeeper. The controller log file then showed ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up java.lang.NullPointerException: null, and after that no further logging happened.

Steps Followed

  1. Created a PKS cluster and deployed Pravega build 0.4.0-rc1 using the PravegaOperator method
  2. Started moderate IO with the Pravega Benchmark tool
  3. Used kubectl edit PravegaCluster <cluster-name> to scale up the Pravega components: Segmentstore from 3 to 10 and Bookies from 3 to 10
  4. Then tried to scale up Zookeeper from 3 to 5 using kubectl edit zk nautilus-pravega-zookeeper
  5. Observed ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up java.lang.NullPointerException: null in the controller log

Log Snippet

2018-11-14 09:17:29,614 83780488 [ControllerServiceMain-EventThread] ERROR o.a.c.f.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.lang.NullPointerException: null
        at org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:179)
        at org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:200)
        at org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50)
        at org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:144)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:852)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:629)
        at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
        at org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:587)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
2018-11-14 10:01:10,316 86401190 [pool-6-thread-1] INFO  i.p.c.s.stream.ZKGarbageCollector - Acquired guard, starting GC iteration for completedTxnG 
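
The top frames place the NPE in EnsembleTracker.configToConnectionString, reached from processConfigData when new ensemble config data is delivered to the client. One possible cause, which this issue does not confirm, is a server entry in the reconfigured ensemble that carries no client address. The Java sketch below is not Curator's code. It only illustrates the idea of building a connection string from ZooKeeper dynamic-config text while skipping entries that lack a client port, instead of dereferencing a missing value. The class name, method name, and sample host names are hypothetical, and IPv6 addresses are not handled.

```java
public class EnsembleConfigSketch {

    /**
     * Builds a "host:port,host:port" connection string from dynamic-config text
     * whose server lines look like:
     *   server.<id>=<host>:<quorumPort>:<electionPort>[:role];[<clientHost>:]<clientPort>
     * Entries without a client part (nothing after ';') are skipped.
     * Hypothetical helper for illustration only, not Curator's implementation.
     */
    static String toConnectionString(String configData) {
        StringBuilder sb = new StringBuilder();
        for (String rawLine : configData.split("\\R")) {
            String line = rawLine.trim();
            if (!line.startsWith("server.")) {
                continue; // e.g. the trailing "version=..." line
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String value = line.substring(eq + 1);
            int semi = value.indexOf(';');
            if (semi < 0) {
                continue; // no client address at all: skip rather than fail
            }
            String client = value.substring(semi + 1).trim();
            if (client.isEmpty()) {
                continue;
            }
            int firstColon = value.indexOf(':');
            String quorumHost = firstColon < 0 ? value.substring(0, semi) : value.substring(0, firstColon);

            String hostPort;
            int clientColon = client.lastIndexOf(':');
            if (clientColon < 0) {
                // Only a port was given: pair it with the quorum host.
                hostPort = quorumHost + ":" + client;
            } else {
                String clientHost = client.substring(0, clientColon);
                String clientPort = client.substring(clientColon + 1);
                // A wildcard bind address is not connectable; fall back to the quorum host.
                hostPort = ("0.0.0.0".equals(clientHost) ? quorumHost : clientHost) + ":" + clientPort;
            }
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(hostPort);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical config: the second entry has no client port.
        String config = String.join("\n",
                "server.1=zk-0:2888:3888:participant;2181",
                "server.2=zk-1:2888:3888:participant",
                "server.3=zk-2:2888:3888:participant;0.0.0.0:2181",
                "version=100000000");
        System.out.println(toConnectionString(config));
    }
}
```

If this is what happened here, comparing the ensemble config the operator wrote during the scale-up against the expected format would be a reasonable first check.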

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

deenav commented, Feb 4, 2019 (1 reaction)

@adrianmo Yes, I encountered this error while scaling up ZK. I will check and update whether the problem persists with the latest ZK operator or not.

RaulGracia commented, Jan 31, 2020 (0 reactions)
