Cleanup exception handling "Controller Service Main thread exited exceptionally"
The Controller fails while running the service starter after experiencing restarts. It throws an IllegalStateException because the service ControllerServiceStarter was expected to be TERMINATED but had instead FAILED, which in turn prevented the controller from restarting its service gracefully.
While running an I/O workload for 1+ days (writer: 1, reader: 1, size: 100 bytes, events/sec: 1, segments: 1000, streams: 2), both controllers restarted in a Pravega-Operator deployment with 2 controllers, 3 ZooKeeper nodes, 4 Bookies and 3 Segment Stores.
Environment details: PKS / K8s with a medium cluster:
1 master node @ large.cpu (4 CPU, 4 GB RAM, 16 GB disk)
3 worker nodes @ xlarge.cpu (4 CPU, 16 GB RAM, 32 GB disk)
Tier-1 storage is from a vSAN datastore
Tier-2 storage is carved out on the NFS client provisioner using Isilon as the backend
Pravega details:
Zookeeper Operator : pravega/zookeeper-operator:0.2.1
Pravega Operator: pravega/pravega-operator:0.3.2
The following error messages and exceptions were found:
2019-03-18 17:34:27,057 21818849 [Delegate] INFO i.p.c.e.impl.EventProcessorCell - Event processor STARTUP EventProcessor[commitStreamReaders:0], state=RUNNING
2019-03-18 17:34:27,056 21818848 [ControllerServiceMain] ERROR i.p.c.server.ControllerServiceMain - Controller Service Main thread exited exceptionally
java.lang.IllegalStateException: Expected the service ControllerServiceStarter [FAILED] to be TERMINATED, but the service has FAILED
at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:366)
at com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:329)
at com.google.common.util.concurrent.AbstractIdleService.awaitTerminated(AbstractIdleService.java:177)
at io.pravega.controller.server.ControllerServiceMain.run(ControllerServiceMain.java:146)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:66)
at com.google.common.util.concurrent.Callables$4.run(Callables.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Expected the service ControllerEventProcessors [FAILED] to be TERMINATED, but the service has FAILED
at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:366)
at com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:329)
at com.google.common.util.concurrent.AbstractIdleService.awaitTerminated(AbstractIdleService.java:177)
at io.pravega.controller.server.ControllerServiceStarter.shutDown(ControllerServiceStarter.java:335)
at com.google.common.util.concurrent.AbstractIdleService$DelegateService$2.run(AbstractIdleService.java:79)
... 2 common frames omitted
Caused by: java.lang.IllegalStateException: Expected the service EventProcessorGroup[commitStreamReaders] to be TERMINATED, but the service has FAILED
at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:366)
at com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:329)
at com.google.common.util.concurrent.AbstractIdleService.awaitTerminated(AbstractIdleService.java:177)
at io.pravega.controller.server.eventProcessor.ControllerEventProcessors.stopEventProcessors(ControllerEventProcessors.java:424)
at io.pravega.controller.server.eventProcessor.ControllerEventProcessors.shutDown(ControllerEventProcessors.java:156)
... 3 common frames omitted
Caused by: io.pravega.controller.store.host.HostStoreException: Failed to fetch segment container map from zookeeper
at io.pravega.controller.store.host.ZKHostStore.getCurrentHostMap(ZKHostStore.java:81)
at io.pravega.controller.store.host.ZKHostStore.getHostForContainer(ZKHostStore.java:106)
at io.pravega.controller.store.host.ZKHostStore.getHostForSegment(ZKHostStore.java:125)
at io.pravega.controller.server.SegmentHelper.getSegmentUri(SegmentHelper.java:70)
at io.pravega.controller.server.ControllerService.getURI(ControllerService.java:275)
at io.pravega.controller.server.eventProcessor.LocalController.getEndpointForSegment(LocalController.java:387)
at io.pravega.client.netty.impl.RawClient.<init>(RawClient.java:81)
at io.pravega.client.segment.impl.SegmentMetadataClientImpl.getConnection(SegmentMetadataClientImpl.java:94)
at io.pravega.client.segment.impl.SegmentMetadataClientImpl.getPropertyAsync(SegmentMetadataClientImpl.java:128)
at io.pravega.client.segment.impl.SegmentMetadataClientImpl.lambda$fetchProperty$6(SegmentMetadataClientImpl.java:175)
at io.pravega.common.concurrent.Futures.lambda$delayedFuture$21(Futures.java:536)
at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
at io.pravega.common.concurrent.Futures.delayedFuture(Futures.java:536)
at io.pravega.common.util.Retry$RetryAndThrowBase.lambda$runAsync$7(Retry.java:237)
at io.pravega.common.concurrent.Futures$Loop.call(Futures.java:712)
at io.pravega.common.concurrent.Futures$Loop.call(Futures.java:681)
at io.pravega.common.concurrent.Futures.runOrFail(Futures.java:572)
at io.pravega.common.concurrent.Futures$Loop.run(Futures.java:725)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 common frames omitted
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /pravega/pravega/cluster/segmentContainerHostMapping
at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2019)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316)
at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313)
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304)
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35)
at io.pravega.controller.store.host.ZKHostStore.getCurrentHostMap(ZKHostStore.java:79)
... 25 common frames omitted
2019-03-18 17:34:27,057 21818849 [Delegate] WARN i.p.c.e.impl.EventProcessorCell - Restarting event processor: EventProcessor[commitStreamReaders:0] due to exception: {}
java.lang.IllegalStateException: Reader is closed
at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
at io.pravega.client.stream.impl.EventStreamReaderImpl.readNextEvent(EventStreamReaderImpl.java:86)
at io.pravega.controller.eventProcessor.impl.EventProcessorCell$Delegate.run(EventProcessorCell.java:103)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:66)
at com.google.common.util.concurrent.Callables$4.run(Callables.java:119)
at java.lang.Thread.run(Thread.java:748)
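The trace above suggests a pattern: one child service ends up FAILED (here after a ZooKeeper session expiry), and the "expected TERMINATED" state check then throws during shutdown and propagates up through each parent. The following is a minimal sketch of one way a shutdown routine could tolerate that. It is a pure-JDK stand-in, not Pravega or Guava code: `MiniService`, `shutDownAll`, and the `RpcServer` sibling are hypothetical names invented for illustration, and `awaitTerminated` here only mimics the state check from the log message and does not block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in only; none of these types come from Pravega or Guava. */
public class ShutdownSketch {
    enum State { RUNNING, TERMINATED, FAILED }

    /** Hypothetical minimal service that mimics the "expected TERMINATED" check. */
    static class MiniService {
        final String name;
        State state = State.RUNNING;
        Throwable failureCause;

        MiniService(String name) { this.name = name; }

        /** Marks the service as failed, e.g. after a lost ZooKeeper session. */
        void fail(Throwable cause) { state = State.FAILED; failureCause = cause; }

        /** A running service terminates; a failed one stays FAILED. */
        void stop() { if (state == State.RUNNING) state = State.TERMINATED; }

        /** Simplified: checks the state immediately instead of waiting. */
        void awaitTerminated() {
            if (state != State.TERMINATED) {
                throw new IllegalStateException("Expected the service " + name
                        + " to be TERMINATED, but the service has " + state, failureCause);
            }
        }
    }

    /** Stops every child and collects failures instead of aborting on the first. */
    static List<Throwable> shutDownAll(List<MiniService> children) {
        List<Throwable> failures = new ArrayList<>();
        for (MiniService child : children) {
            try {
                child.stop();
                child.awaitTerminated();
            } catch (IllegalStateException e) {
                failures.add(e); // record it, then keep shutting down the rest
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        MiniService processors = new MiniService("ControllerEventProcessors");
        MiniService rpc = new MiniService("RpcServer"); // hypothetical sibling service
        processors.fail(new RuntimeException("Session expired"));

        List<Throwable> failures = shutDownAll(Arrays.asList(processors, rpc));
        System.out.println("failures=" + failures.size() + ", rpc=" + rpc.state);
        // prints "failures=1, rpc=TERMINATED"
    }
}
```

Collecting the failures, rather than letting the first one escape, lets the remaining services stop cleanly; the caller can still log the collected causes or rethrow them once shutdown has finished.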
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Thanks @SomeshJoshi19, closing this issue as it cannot be reproduced in the current 0.5 version.

@shiveshr I have performed multiple restarts on Controller with latest pravega version 0.5.0-2291.3ccff63 but did not see the Exception getting reproduced again in the Controller.