Cleanup exception handling "Controller Service Main thread exited exceptionally"

After the controllers restarted, the Controller failed while running its service starter. It threw an IllegalStateException because the service ControllerServiceStarter was expected to be TERMINATED but had FAILED instead, which prevented the controller from restarting its services gracefully.
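
The failure mode described above can be sketched in isolation. The following is a minimal model, not the Pravega or Guava code: `ShutdownSketch`, `FakeService` and `awaitTerminatedQuietly` are hypothetical names, and `FakeService.awaitTerminated()` only imitates the behavior visible in the trace (an IllegalStateException when the service ended in FAILED rather than TERMINATED). It shows one possible way for a shutdown path to record the failure cause instead of letting the exception escape.

```java
import java.util.Optional;

public class ShutdownSketch {
    /** Simplified stand-in for the two terminal service states named in the report. */
    enum State { TERMINATED, FAILED }

    /** Hypothetical minimal service; not the Guava or Pravega implementation. */
    static final class FakeService {
        private final String name;
        private final State finalState;
        private final Throwable failureCause;

        FakeService(String name, State finalState, Throwable failureCause) {
            this.name = name;
            this.finalState = finalState;
            this.failureCause = failureCause;
        }

        /** Imitates the reported behavior: throws if the service ended in FAILED. */
        void awaitTerminated() {
            if (finalState != State.TERMINATED) {
                throw new IllegalStateException("Expected the service " + name
                        + " to be TERMINATED, but the service has " + finalState, failureCause);
            }
        }
    }

    /**
     * Waits for the service to stop. If it had already failed, returns the
     * underlying cause instead of propagating the IllegalStateException,
     * so the caller can log it and continue shutting down other services.
     */
    static Optional<Throwable> awaitTerminatedQuietly(FakeService service) {
        try {
            service.awaitTerminated();
            return Optional.empty();
        } catch (IllegalStateException e) {
            return Optional.of(e.getCause() != null ? e.getCause() : e);
        }
    }

    public static void main(String[] args) {
        FakeService failed = new FakeService("ControllerEventProcessors", State.FAILED,
                new RuntimeException("Session expired"));
        awaitTerminatedQuietly(failed).ifPresent(cause ->
                System.out.println("Service had failed before shutdown: " + cause.getMessage()));
    }
}
```

Whether swallowing the exception is appropriate depends on the caller; the sketch only illustrates that a FAILED service can be treated as already stopped during cleanup.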

While running an I/O workload for more than a day (writer: 1, reader: 1, size: 100 bytes, events/sec: 1, segments: 1000, streams: 2), both controllers restarted in a Pravega Operator deployment with 2 controllers, 3 ZooKeeper nodes, 4 Bookies and 3 Segment Stores.

Environment details: PKS / K8s with a medium cluster:

1 master node @ large.cpu (4 CPU, 4 GB RAM, 16 GB disk)
3 worker nodes @ xlarge.cpu (4 CPU, 16 GB RAM, 32 GB disk)
Tier-1 storage is from a VSAN datastore
Tier-2 storage is carved out on the NFS Client provisioner, using Isilon as the backend

Pravega details:

Zookeeper Operator: pravega/zookeeper-operator:0.2.1
Pravega Operator: pravega/pravega-operator:0.3.2

The following error messages and exceptions were found:

2019-03-18 17:34:27,057 21818849 [Delegate] INFO  i.p.c.e.impl.EventProcessorCell - Event processor STARTUP EventProcessor[commitStreamReaders:0], state=RUNNING
2019-03-18 17:34:27,056 21818848 [ControllerServiceMain] ERROR i.p.c.server.ControllerServiceMain - Controller Service Main thread exited exceptionally
java.lang.IllegalStateException: Expected the service ControllerServiceStarter [FAILED] to be TERMINATED, but the service has FAILED
        at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:366)
        at com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:329)
        at com.google.common.util.concurrent.AbstractIdleService.awaitTerminated(AbstractIdleService.java:177)
        at io.pravega.controller.server.ControllerServiceMain.run(ControllerServiceMain.java:146)
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:66)
        at com.google.common.util.concurrent.Callables$4.run(Callables.java:119)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Expected the service ControllerEventProcessors [FAILED] to be TERMINATED, but the service has FAILED
        at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:366)
        at com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:329)
        at com.google.common.util.concurrent.AbstractIdleService.awaitTerminated(AbstractIdleService.java:177)
        at io.pravega.controller.server.ControllerServiceStarter.shutDown(ControllerServiceStarter.java:335)
        at com.google.common.util.concurrent.AbstractIdleService$DelegateService$2.run(AbstractIdleService.java:79)
        ... 2 common frames omitted
Caused by: java.lang.IllegalStateException: Expected the service EventProcessorGroup[commitStreamReaders] to be TERMINATED, but the service has FAILED
        at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:366)
        at com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:329)
        at com.google.common.util.concurrent.AbstractIdleService.awaitTerminated(AbstractIdleService.java:177)
        at io.pravega.controller.server.eventProcessor.ControllerEventProcessors.stopEventProcessors(ControllerEventProcessors.java:424)
        at io.pravega.controller.server.eventProcessor.ControllerEventProcessors.shutDown(ControllerEventProcessors.java:156)
        ... 3 common frames omitted
Caused by: io.pravega.controller.store.host.HostStoreException: Failed to fetch segment container map from zookeeper
        at io.pravega.controller.store.host.ZKHostStore.getCurrentHostMap(ZKHostStore.java:81)
        at io.pravega.controller.store.host.ZKHostStore.getHostForContainer(ZKHostStore.java:106)
        at io.pravega.controller.store.host.ZKHostStore.getHostForSegment(ZKHostStore.java:125)
        at io.pravega.controller.server.SegmentHelper.getSegmentUri(SegmentHelper.java:70)
        at io.pravega.controller.server.ControllerService.getURI(ControllerService.java:275)
        at io.pravega.controller.server.eventProcessor.LocalController.getEndpointForSegment(LocalController.java:387)
        at io.pravega.client.netty.impl.RawClient.<init>(RawClient.java:81)
        at io.pravega.client.segment.impl.SegmentMetadataClientImpl.getConnection(SegmentMetadataClientImpl.java:94)
        at io.pravega.client.segment.impl.SegmentMetadataClientImpl.getPropertyAsync(SegmentMetadataClientImpl.java:128)
        at io.pravega.client.segment.impl.SegmentMetadataClientImpl.lambda$fetchProperty$6(SegmentMetadataClientImpl.java:175)
        at io.pravega.common.concurrent.Futures.lambda$delayedFuture$21(Futures.java:536)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at io.pravega.common.concurrent.Futures.delayedFuture(Futures.java:536)
        at io.pravega.common.util.Retry$RetryAndThrowBase.lambda$runAsync$7(Retry.java:237)
        at io.pravega.common.concurrent.Futures$Loop.call(Futures.java:712)
        at io.pravega.common.concurrent.Futures$Loop.call(Futures.java:681)
        at io.pravega.common.concurrent.Futures.runOrFail(Futures.java:572)
        at io.pravega.common.concurrent.Futures$Loop.run(Futures.java:725)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 common frames omitted
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /pravega/pravega/cluster/segmentContainerHostMapping
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2019)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316)
        at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35)
        at io.pravega.controller.store.host.ZKHostStore.getCurrentHostMap(ZKHostStore.java:79)
        ... 25 common frames omitted
2019-03-18 17:34:27,057 21818849 [Delegate] WARN  i.p.c.e.impl.EventProcessorCell - Restarting event processor: EventProcessor[commitStreamReaders:0] due to exception: {}
java.lang.IllegalStateException: Reader is closed
        at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
        at io.pravega.client.stream.impl.EventStreamReaderImpl.readNextEvent(EventStreamReaderImpl.java:86)
        at io.pravega.controller.eventProcessor.impl.EventProcessorCell$Delegate.run(EventProcessorCell.java:103)
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:66)
        at com.google.common.util.concurrent.Callables$4.run(Callables.java:119)
        at java.lang.Thread.run(Thread.java:748)
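
The first trace above is a chain of nested "Caused by" entries that ends in the ZooKeeper session expiry. As a rough illustration of reading such a chain programmatically, the sketch below walks `getCause()` to the innermost exception. `RootCauseSketch` and `rootCause` are hypothetical names, and plain JDK exception types stand in for HostStoreException and SessionExpiredException so that the example has no external dependencies.

```java
public class RootCauseSketch {
    /** Follows getCause() until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the ZooKeeper session expiry at the bottom of the trace.
        Exception sessionExpired = new Exception("KeeperErrorCode = Session expired");
        // Stand-in for the host store failure that wraps it.
        RuntimeException hostStore = new RuntimeException(
                "Failed to fetch segment container map from zookeeper", sessionExpired);
        // Outer exception raised while waiting for the service to terminate.
        IllegalStateException outer = new IllegalStateException(
                "Expected the service ControllerServiceStarter [FAILED] to be TERMINATED, "
                        + "but the service has FAILED", hostStore);

        System.out.println("Root cause: " + rootCause(outer).getMessage());
    }
}
```

Logging the root cause alongside the outer exception makes it quicker to see that the shutdown failure originated from the expired session rather than from the service lifecycle itself.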

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

RaulGracia commented, Jul 5, 2019

Thanks @SomeshJoshi19, closing this issue as it cannot be reproduced in the current 0.5 version.

SomeshJoshi19 commented, Jul 5, 2019

@shiveshr I performed multiple Controller restarts with the latest Pravega version, 0.5.0-2291.3ccff63, and the exception was not reproduced in the Controller.
