[BUG] Delayed event hub partition claiming of EventProcessorClient
Describe the bug
We are using Azure Event Hubs. We tested the new version com.azure:azure-messaging-eventhubs-checkpointstore-blob:1.16.0
and observe delayed partition claiming, which in turn delays event processing.
Exception or Stack Trace
com.azure.storage.blob.models.BlobStorageException: Status code 412, \"<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>ConditionNotMet</Code><Message>The condition specified using HTTP conditional header(s) is not met.
RequestId:......
Time:2022-10-21T15:45:32.6413117Z</Message></Error>\"
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(Unknown Source)
at com.azure.core.implementation.http.rest.ResponseExceptionConstructorCache.invoke(ResponseExceptionConstructorCache.java:56)
at com.azure.core.implementation.http.rest.RestProxyBase.instantiateUnexpectedException(RestProxyBase.java:367)
at com.azure.core.implementation.http.rest.AsyncRestProxy.lambda$ensureExpectedStatus$1(AsyncRestProxy.java:115)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:113)
... 77 common frames omitted
Wrapped by: java.lang.IllegalStateException: Error while claiming checkpoints
at com.azure.messaging.eventhubs.PartitionBasedLoadBalancer.lambda$claimOwnership$24(PartitionBasedLoadBalancer.java:478)
at reactor.core.publisher.LambdaMonoSubscriber.doError(LambdaMonoSubscriber.java:155)
at reactor.core.publisher.LambdaMonoSubscriber.onError(LambdaMonoSubscriber.java:150)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:172)
at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onError(MonoCollectList.java:114)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.checkTerminated(FluxFlatMap.java:842)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.drainLoop(FluxFlatMap.java:608)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.drain(FluxFlatMap.java:588)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.innerError(FluxFlatMap.java:863)
at reactor.core.publisher.FluxFlatMap$FlatMapInner.onError(FluxFlatMap.java:990)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.checkTerminated(FluxFlatMap.java:842)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.drainLoop(FluxFlatMap.java:608)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.drain(FluxFlatMap.java:588)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onError(FluxFlatMap.java:451)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:416)
at reactor.core.publisher.DrainUtils.postCompleteDrain(DrainUtils.java:136)
at reactor.core.publisher.DrainUtils.postComplete(DrainUtils.java:187)
at reactor.core.publisher.FluxMapSignal$FluxMapSignalSubscriber.onError(FluxMapSignal.java:192)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onError(FluxMapFuseable.java:142)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:172)
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onError(FluxContextWrite.java:121)
at reactor.core.publisher.FluxDoOnEach$DoOnEachSubscriber.onError(FluxDoOnEach.java:195)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:192)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:259)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:142)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:171)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2194)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe(Operators.java:2068)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:96)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:151)
at reactor.core.publisher.FluxDelaySubscription$DelaySubscriptionMainSubscriber.onNext(FluxDelaySubscription.java:189)
at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onNext(FluxTimeout.java:180)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:151)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.complete(MonoIgnoreThen.java:292)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onNext(MonoIgnoreThen.java:187)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:151)
at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onNext(FluxRetryWhen.java:174)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
at reactor.core.publisher.Operators$MonoInnerProducerBase.complete(Operators.java:2664)
at reactor.core.publisher.MonoSingle$SingleSubscriber.onComplete(MonoSingle.java:180)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onComplete(MonoFlatMapMany.java:260)
at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:144)
at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onComplete(FluxDoFinally.java:128)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onComplete(FluxMapFuseable.java:152)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1817)
at reactor.core.publisher.MonoCollect$CollectSubscriber.onComplete(MonoCollect.java:160)
at reactor.core.publisher.FluxHandle$HandleSubscriber.onComplete(FluxHandle.java:220)
at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onComplete(FluxMap.java:275)
at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:400)
at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:419)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:473)
at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:703)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:93)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:336)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:308)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1247)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1287)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:519)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:458)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:280)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
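For readers unfamiliar with the 412 in the trace above: ConditionNotMet is what Blob Storage returns when a conditional header (for example If-Match with an ETag) no longer matches the stored blob. The following is a minimal, self-contained simulation of that optimistic-concurrency pattern, assuming the checkpoint store guards ownership updates with ETags. It is not the SDK's code; `OwnershipStore`, `claim`, and the numeric ETag are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

public class EtagClaimSketch {

    /** Hypothetical ownership record: current owner plus a version standing in for the blob ETag. */
    static final class Ownership {
        final String ownerId;
        final long etag;

        Ownership(String ownerId, long etag) {
            this.ownerId = ownerId;
            this.etag = etag;
        }
    }

    /** Hypothetical in-memory stand-in for the blob container that holds ownership records. */
    static final class OwnershipStore {
        private final Map<String, Ownership> blobs = new HashMap<>();

        /** Returns the current record, or null if the partition has never been claimed. */
        synchronized Ownership read(String partitionId) {
            return blobs.get(partitionId);
        }

        /**
         * Conditional write. Returns 200 only if expectedEtag matches the stored ETag;
         * a null expectedEtag means "create only if absent". Returns 412 otherwise.
         */
        synchronized int claim(String partitionId, String newOwner, Long expectedEtag) {
            Ownership current = blobs.get(partitionId);
            boolean matches = (current == null)
                    ? expectedEtag == null
                    : expectedEtag != null && expectedEtag == current.etag;
            if (!matches) {
                return 412; // condition not met: the caller acted on a stale view
            }
            long next = (current == null) ? 1 : current.etag + 1;
            blobs.put(partitionId, new Ownership(newOwner, next));
            return 200;
        }
    }

    public static void main(String[] args) {
        OwnershipStore store = new OwnershipStore();
        int created = store.claim("0", "processor-A", null);          // first claim creates the record
        Ownership seenByB = store.read("0");                          // B reads ETag 1
        int renewed = store.claim("0", "processor-A", seenByB.etag);  // A renews, ETag becomes 2
        int stale = store.claim("0", "processor-B", seenByB.etag);    // B still holds ETag 1
        int retried = store.claim("0", "processor-B", store.read("0").etag); // B re-reads, then claims
        System.out.println(created + " " + renewed + " " + stale + " " + retried);
        // prints "200 200 412 200"
    }
}
```

In this model a 412 is recoverable: the loser re-reads the record and retries with the fresh ETag. How and when the real load balancer retries a failed claim is not shown here.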
To Reproduce
Steps to reproduce the behavior:
- Use EventProcessorClient for partition processing (32 partitions)
- Configure the GREEDY load balancing strategy
- Publish an incoming event to every partition
- Start the EventProcessorClient and observe that some partitions start processing events several minutes late (delays of 6 to 16 minutes)
- Once all partitions are claimed and processing, the exceptions stop
Code Snippet
Expected behavior
- EventProcessorClient instances should claim all partitions immediately, without delay (especially with GREEDY load balancing)
- The old version works as expected: com.azure:azure-messaging-eventhubs-checkpointstore-blob:1.12.2
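To illustrate why GREEDY is expected to settle quickly: as I understand the two strategies, BALANCED claims roughly one partition per load balancing cycle, while GREEDY attempts to claim the processor's whole fair share in a single cycle. The sketch below is a simplified model under that assumption, with a single processor and no failed claims; `cyclesToClaimAll` is a hypothetical helper, not an SDK method.

```java
public class ClaimCycleSketch {

    /**
     * Hypothetical model: counts load balancing cycles until one processor owns its fair share.
     * Assumes every claim attempt succeeds, which the report above shows is not always the case.
     */
    static int cyclesToClaimAll(int fairShare, boolean greedy) {
        int owned = 0;
        int cycles = 0;
        while (owned < fairShare) {
            // Greedy takes everything still missing; balanced takes one partition per cycle.
            owned += greedy ? (fairShare - owned) : 1;
            cycles++;
        }
        return cycles;
    }

    public static void main(String[] args) {
        int partitions = 32; // partition count from the reproduction steps
        System.out.println("balanced cycles: " + cyclesToClaimAll(partitions, false)); // 32
        System.out.println("greedy cycles: " + cyclesToClaimAll(partitions, true));    // 1
    }
}
```

Under this model a greedy processor should own all 32 partitions after one cycle, so multi-minute delays suggest that claim attempts are failing and being deferred to later cycles.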
Screenshots
Setup (please complete the following information):
- OS: Docker (eclipse-temurin:11)
- Library/Libraries:
[INFO] +- com.azure:azure-messaging-eventhubs:jar:5.14.0:compile
[INFO] |  +- (com.azure:azure-core:jar:1.33.0:compile - omitted for duplicate)
[INFO] |  \- com.azure:azure-core-amqp:jar:2.7.2:compile
[INFO] |     \- (com.azure:azure-core:jar:1.33.0:compile - omitted for duplicate)
[INFO] +- com.azure:azure-storage-blob:jar:12.20.0:compile
[INFO] |  +- (com.azure:azure-core:jar:1.33.0:compile - omitted for duplicate)
[INFO] |  +- com.azure:azure-core-http-netty:jar:1.12.6:compile
[INFO] |  |  \- (com.azure:azure-core:jar:1.33.0:compile - omitted for duplicate)
[INFO] |  +- com.azure:azure-storage-common:jar:12.19.0:compile
[INFO] |  |  +- (com.azure:azure-core:jar:1.33.0:compile - omitted for duplicate)
[INFO] |  |  \- (com.azure:azure-core-http-netty:jar:1.12.6:compile - omitted for duplicate)
[INFO] |  \- com.azure:azure-storage-internal-avro:jar:12.5.0:compile
[INFO] |     +- (com.azure:azure-core:jar:1.33.0:compile - omitted for duplicate)
[INFO] |     +- (com.azure:azure-core-http-netty:jar:1.12.6:compile - omitted for duplicate)
[INFO] |     \- (com.azure:azure-storage-common:jar:12.19.0:compile - omitted for duplicate)
[INFO] +- com.azure:azure-messaging-eventhubs-checkpointstore-blob:jar:1.16.0:compile
[INFO] |  +- (com.azure:azure-messaging-eventhubs:jar:5.14.0:compile - omitted for duplicate)
[INFO] |  \- (com.azure:azure-storage-blob:jar:12.20.0:compile - omitted for duplicate)
[INFO] +- com.azure:azure-identity:jar:1.6.1:compile
[INFO] |  +- (com.azure:azure-core:jar:1.33.0:compile - omitted for duplicate)
[INFO] |  \- (com.azure:azure-core-http-netty:jar:1.12.6:compile - omitted for duplicate)
[INFO] \- com.azure:azure-core:jar:1.33.0:compile
- Java version: 11
- App Server/Environment: Dropwizard, k8s
- Frameworks:
Information Checklist
- Bug Description Added
- Repro Steps Added
- Setup information Added
Top GitHub Comments
I can confirm that the issue is fixed for us after switching to the latest provided versions.
We have reverted the incorrect changes as a quick fix. Please upgrade to the latest versions (currently azure-messaging-eventhubs:5.15.0 and azure-messaging-eventhubs-checkpointstore-blob:1.16.1) to resolve this issue.