consumer.receive will only process events at interval of ~2 seconds. How to increase rate?
See original GitHub issue
- Package Name: azure-eventhub
- Package Version: ^5.10.0
- Operating System: Windows 11
- Python Version: 3.10.6
Describe the bug
I am trying to diagnose a latency issue when subscribing to an Event Hub using the Python SDK. It only processes received events at an interval of ~2 seconds (which is very suspicious; it looks like some default configuration value).
I am using the consumer.receive method and seem to have followed the samples very closely.
I have created two similar testing utilities, and for the same Event Hub the dotnet SDK works as expected: it sends and receives hundreds of events in a few seconds. The Python tool can publish/send large amounts of events in 1 or 2 seconds, but when receiving it takes 1 or 2 seconds PER EVENT, which makes it unusable for practical use. At this rate it would take so long to process all the events that they would expire before being processed.
I created a video to make it clearer: Video
Given that dotnet works and Python doesn't for the same Event Hub and consumer group, I see two possibilities:
- I have something configured incorrectly, or there is a bug in my test causing the behavior
- There is a bug in the library
Hopefully someone can help identify option 1, since that is much easier to fix.
To Reproduce
Steps to reproduce the behavior:
- Attempt to receive events using the Python azure-eventhub library with the synchronous receive pattern.
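For reference, this is roughly the pattern I am using; it is a minimal sketch paraphrased from the SDK samples rather than my exact code, and the connection string and names are placeholders:

```python
from azure.eventhub import EventHubConsumerClient

# Placeholders -- substitute real values for your namespace.
CONNECTION_STR = "<eventhub-namespace-connection-string>"
EVENTHUB_NAME = "<eventhub-name>"
CONSUMER_GROUP = "$Default"

def on_event(partition_context, event):
    # Called once per received event.
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")

consumer = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR,
    consumer_group=CONSUMER_GROUP,
    eventhub_name=EVENTHUB_NAME,
)

with consumer:
    # Blocks and dispatches events to on_event as they arrive,
    # starting from the beginning of the stream ("-1").
    consumer.receive(on_event=on_event, starting_position="-1")
```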
Expected behavior
Expected to receive ALL events available to the consumer group in a very short amount of time (hundreds of events should be processed within a few seconds).
Screenshots
See the video
Additional context
N/A
Top GitHub Comments
Hi, I wanted to provide some new information from our usage and also ask if there is any status update from the team maintaining this library.
Background:
1. Sync consumer
Originally we were using the synchronous receive style of consumption (the same pattern described in the issue above). This has the ~2 second delay issue because it takes significant time to rotate through each of the 32 partitions to receive events from the 1 partition that has them.
2. Async consumer
We changed to the async version of the library, and this seemed to resolve the issue; roughly what we ended up with is sketched below.
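A minimal sketch of the async consumption style (not our exact code; connection string and names are placeholders):

```python
import asyncio
from azure.eventhub.aio import EventHubConsumerClient

CONNECTION_STR = "<eventhub-namespace-connection-string>"
EVENTHUB_NAME = "<eventhub-name>"
CONSUMER_GROUP = "$Default"

async def on_event(partition_context, event):
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")

async def main():
    consumer = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR,
        consumer_group=CONSUMER_GROUP,
        eventhub_name=EVENTHUB_NAME,
    )
    async with consumer:
        # Partitions are read concurrently on the event loop instead of
        # being polled one at a time, which removed the ~2 s gaps for us.
        await consumer.receive(on_event=on_event, starting_position="-1")

asyncio.run(main())
```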
Producer Issues
1. Basic sync producer
We had started using the basic pattern for publishing events (sketched below).
We noticed a similar issue where the producer would not publish events immediately; it would also take around ~2 seconds between publishes. (We suspect a similar issue with partitions, but are not sure.)
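The basic publishing pattern looked roughly like this (a sketch paraphrased from the SDK samples; connection details are placeholders):

```python
from azure.eventhub import EventData, EventHubProducerClient

CONNECTION_STR = "<eventhub-namespace-connection-string>"
EVENTHUB_NAME = "<eventhub-name>"

producer = EventHubProducerClient.from_connection_string(
    CONNECTION_STR,
    eventhub_name=EVENTHUB_NAME,
)

with producer:
    # Build a batch, add the event(s), and send synchronously.
    batch = producer.create_batch()
    batch.add(EventData("message body"))
    producer.send_batch(batch)
```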
2. Buffered mode, very small max_wait_time
We changed the producer to use buffered mode with a very small max_wait_time, to effectively force the producer to publish events at a very fast rate. This unblocked us, but as we continued to add more features and other computation we saw significant slowdowns in the publishing of events. We tried moving the consumer and producer to different threads, but given that Python only allows a single thread to execute at once, this didn't help. In summary, the CPU time spent consuming would delay or block the other computation that produces events and the switch over to publishing them. This resulted in the observed slow performance.
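Roughly what the buffered producer looked like. This is a sketch: buffered mode requires a newer azure-eventhub than the 5.10 in the original report (around 5.11+), and the callback bodies and wait time here are illustrative rather than our real values:

```python
from azure.eventhub import EventData, EventHubProducerClient

CONNECTION_STR = "<eventhub-namespace-connection-string>"
EVENTHUB_NAME = "<eventhub-name>"

def on_success(events, partition_id):
    pass  # events were flushed to this partition

def on_error(events, partition_id, error):
    print(f"failed to send {len(events)} events to partition {partition_id}: {error}")

producer = EventHubProducerClient.from_connection_string(
    CONNECTION_STR,
    eventhub_name=EVENTHUB_NAME,
    buffered_mode=True,
    max_wait_time=0.05,  # flush the internal buffer almost immediately
    on_success=on_success,
    on_error=on_error,
)

with producer:
    # Events are queued into the client's buffer and published in the
    # background as soon as max_wait_time elapses or the buffer fills.
    for i in range(100):
        producer.send_event(EventData(f"event {i}"))
```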
3. Separate processes
We ended up having to move the EH consumer and producer into their own processes that only transfer events to and from queues used to share the events between processes. This allows the consumer and producer to run at different rates; a rough sketch is below.
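A minimal sketch of the process layout (illustrative only; the queue plumbing and naming are simplified from what we actually run):

```python
import multiprocessing as mp

from azure.eventhub import EventData, EventHubConsumerClient, EventHubProducerClient

# Placeholders -- substitute real values.
CONNECTION_STR = "<eventhub-namespace-connection-string>"
EVENTHUB_NAME = "<eventhub-name>"
CONSUMER_GROUP = "$Default"

def consume(received):
    # Runs in its own process: receives events and hands them to the main process.
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR, consumer_group=CONSUMER_GROUP, eventhub_name=EVENTHUB_NAME)

    def on_event(partition_context, event):
        received.put(event.body_as_str())

    with client:
        client.receive(on_event=on_event, starting_position="-1")

def produce(to_send):
    # Runs in its own process: publishes whatever the main process queues up.
    client = EventHubProducerClient.from_connection_string(
        CONNECTION_STR, eventhub_name=EVENTHUB_NAME)
    with client:
        while True:
            batch = client.create_batch()
            batch.add(EventData(to_send.get()))  # blocks until work arrives
            client.send_batch(batch)

if __name__ == "__main__":
    received, to_send = mp.Queue(), mp.Queue()
    mp.Process(target=consume, args=(received,)).start()
    mp.Process(target=produce, args=(to_send,)).start()

    # The main process does its own computation and only exchanges event
    # bodies with the workers through the queues, so consuming and producing
    # proceed at their own rates, e.g.:
    while True:
        to_send.put(received.get())
```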
Feedback
I think the amount of work we had to do to troubleshoot this library and get it to work correctly for our usage is too high. I think most people trying to work with the library might give up or use an alternative communication stack if they didn’t have other reasons to use EventHub.
I wanted to know if the team maintaining this library can provide any status update on the investigation into this issue. Is there a solution coming? If so, what is the ETA? If there is not a solution (because it is a known, expected limitation of working with Python and Event Hubs with a high number of partitions), can you also confirm that?
Suggestion for Documentation
I think the documentation could be updated to warn users about these potential problems and inform them about the workarounds described in the examples above, so they don't have to discover and solve the same issues we did. This would mean they set up and use the library correctly the first time instead of having the bad experience we did.
Also, there may be people using the library who have an existing performance issue due to conditions similar to ours but are not yet aware of it; the documentation update could at least alert them to the known problems.
Mention: @tupini07 so he also gets updates.
Interested to hear your thoughts.
The dotnet testing tool uses a slightly different testing pattern than the Python one, since it has more advanced features with processes and cancellation tokens.
The dotnet sequence is: start the consumer/subscription first, then publish the events. This means the consumer can use the default/latest starting position and still receive the events.
The dotnet tool is actually available to see here: https://github.com/mattmazzola/dotnet-servicebus-cli/blob/6ed2a8e03155616fe14b3a200b23f9c6498966fb/Commands/LatencyTest.cs#L78-L82 (it's called servicebus-cli because that's how it started, but it works with EventHub). The Python testing tool's sequence is reversed, since the subscribe methods block, which means we can't start processing and then also execute other code without much more advanced threading or something similar.
The 10 parameter in the video was for limiting the size of sent batches, not for limiting receiving. For receiving it's one event at a time, using the EventProcessorClient pattern.