
consumer.receive will only process events at interval of ~2 seconds. How to increase rate?

See original GitHub issue
  • Package Name: azure-eventhub
  • Package Version: "^5.10.0"
  • Operating System: Windows 11
  • Python Version: Python 3.10.6

Describe the bug

I am trying to diagnose a latency issue when subscribing to an Event Hub using the Python SDK. The consumer only processes received events at an interval of ~2 seconds (which is suspiciously close to some default configuration value).

I am using the consumer.receive method and seem to have followed the samples very closely.

I have created two similar testing utilities, and for the same Event Hub the .NET SDK works as expected: it sends and receives hundreds of events in a few seconds. The Python SDK can publish/send large amounts of events in 1 or 2 seconds, but when receiving it takes 1 or 2 seconds PER EVENT, which makes it unusable for practical use. At this rate it would take so long to process all the events that they would expire before being processed.

I created a video to make it clearer:

Video

Given that .NET works and Python doesn't for the same Event Hub and consumer group, I see two possibilities:

  1. I have something configured incorrectly, or a bug in my test is causing the behavior
  2. There is a bug in the library

Hopefully someone can help identify option 1, since that is much easier to fix.

To Reproduce

Steps to reproduce the behavior:

  1. Attempt to receive events using the python azure-eventhub library using the synchronous receive pattern.

Expected behavior

Expected to receive ALL events available to the consumer group in a very short amount of time (hundreds of events should be processed within a few seconds).

Screenshots

See the video

Additional context

N/A

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions
mattmazzola commented, Oct 19, 2022

Hi, I wanted to provide some new information from our usage and also ask if there is any status update from the team maintaining this library.

Background:

1. Sync consumer

Originally we were using this style of consumption:

with self._consumer:
  self._consumer.receive(
    on_event=process_deserialized_event,
    max_wait_time=None,
    starting_position="@latest"
  )

This has the ~2 second delay issue because it takes significant time to rotate through each of the 32 partitions and receive events from the 1 partition that has them.
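As a back-of-the-envelope sketch of this explanation (the numbers below are assumed for illustration, not measured from the SDK): if a consumer visits partitions round-robin and waits up to some timeout on each empty partition, an event sitting in the single busy partition can wait for a full sweep of all the idle ones.

```python
# Hypothetical numbers: 32 partitions (from the issue) and an assumed
# per-partition wait on empty partitions.
num_partitions = 32
poll_timeout = 0.06  # seconds spent waiting on each empty partition (assumed)

# Worst case: the busy partition was just visited, so the consumer must
# time out on every other partition before coming back to it.
worst_case_delay = (num_partitions - 1) * poll_timeout
print(f"worst-case revisit delay: {worst_case_delay:.2f}s")
```

With these assumed values the sweep alone accounts for nearly two seconds per event, which matches the observed ~2 second cadence.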

2. Async consumer

We changed to the async version of the library, and this seemed to resolve the issue:

async with self._consumer:
  await self._consumer.receive(
    on_event=process_raw_event,
    max_wait_time=None,
    starting_position="@latest"
  )
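A plausible intuition for why the async client helped, sketched below with pure asyncio (no Event Hubs dependency, and the partition layout is assumed to mirror the issue's setup): coroutines can wait on many slow or empty partitions concurrently, so one busy partition is not stuck behind 31 idle ones.

```python
import asyncio

async def read_partition(pid: int, has_events: bool) -> list:
    # Stand-in for one partition link; idle partitions just wait.
    if not has_events:
        await asyncio.sleep(0.05)  # waiting costs no CPU and blocks nothing else
        return []
    return [f"p{pid}-event"]

async def main() -> list:
    # 32 partitions, only partition 7 has data (hypothetical layout).
    tasks = [read_partition(pid, pid == 7) for pid in range(32)]
    results = await asyncio.gather(*tasks)  # all partitions awaited concurrently
    return [event for batch in results for event in batch]

events = asyncio.run(main())
print(events)
```

The whole run takes roughly one idle-partition timeout rather than 31 of them, which is consistent with the async consumer feeling much faster in our testing.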

Producer Issues

1. Basic sync Producer

We had started using this basic pattern for publishing events

self._producer = EventHubProducerClient.from_connection_string(publish_subscribe_connection_string)
...
self._producer.send_batch(event_batch)

We noticed a similar issue where the producer would not publish events immediately; it would also take ~2 seconds between publishes. (We suspect a similar partition-related cause but are not sure.)

2. Buffered mode, very small max_wait_time

We changed the producer to use buffered mode with a very small max_wait_time, to seemingly force the producer to publish events at a very fast rate.

self.producer = EventHubProducerClient.from_connection_string(
  publish_subscribe_connection_string,
  buffered_mode=True,
  max_wait_time=0.001,
  on_success=(lambda _, __: None),
  on_error=(lambda _, __, e: _LOGGER.error("Error in producer: " + str(e))),
)

This unblocked us, but as we continued to add more features and other computations we saw significant slowdowns in the publishing of events. We tried moving the consumer and producer to different threads, but since Python only allows a single thread to execute at once, this didn't help. In summary, the CPU time spent consuming events would delay or block the computation that produces events and the publishing of those events. This resulted in the observed slow performance.
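A minimal, self-contained sketch of the limitation described above: on CPython, the GIL serializes CPU-bound Python threads, so two compute-heavy tasks on separate threads take roughly as long as running them back to back.

```python
import threading
import time

def busy_work(n: int, out: list) -> None:
    # Pure-Python CPU-bound loop; it holds the GIL while computing.
    total = 0
    for i in range(n):
        total += i * i
    out.append(total)

N = 1_000_000
results: list = []

start = time.perf_counter()
threads = [threading.Thread(target=busy_work, args=(N, results)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# On CPython the two threads take turns holding the GIL, so `elapsed` is
# roughly the SUM of both tasks' run times, not half of it.
print(f"two CPU-bound threads took {elapsed:.3f}s")
```

This is why threading the consumer and producer did not help in our case, and why we moved to separate processes next.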

3. Separate Processes

We ended up having to move the EH consumer and producer into their own processes that only transfer events to and from queues shared between processes. This allows the consumer and producer to run at different rates.

        self.__outgoing_event_queue: Queue[EventData] = Queue()
        self.__incoming_event_queue: Queue[EventData] = Queue()

        self.consumer_process = MessageConsumer(
            queue=self.__incoming_event_queue,
            publish_subscribe_connection_string=publish_subscribe_connection_string,
            consumer_group=consumer_group,
            event_hub_name=event_hub_name,
        )
        self.consumer_process.start()

        self.producer_process = MessageProducer(
            queue=self.__outgoing_event_queue,
            publish_subscribe_connection_string=publish_subscribe_connection_string,
            event_hub_name=event_hub_name,
        )
        self.producer_process.start()
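The pattern above can be sketched end to end with only the standard library. Note that MessageConsumer and MessageProducer in the snippet are our own classes, not part of azure-eventhub; the dummy worker loops below stand in for them.

```python
import multiprocessing as mp

def consumer_loop(incoming, count):
    # Stand-in for a process that receives events and hands them to the app.
    for i in range(count):
        incoming.put(f"event-{i}")

def producer_loop(outgoing, published):
    # Stand-in for a process that drains app events and publishes them.
    while True:
        event = outgoing.get()
        if event is None:  # sentinel: shut down
            break
        published.put(f"published:{event}")

incoming = mp.Queue()
outgoing = mp.Queue()
published = mp.Queue()

consumer = mp.Process(target=consumer_loop, args=(incoming, 3))
producer = mp.Process(target=producer_loop, args=(outgoing, published))
consumer.start()
producer.start()

# The main process shuttles events between the queues at its own pace,
# decoupled from how fast either worker runs.
for _ in range(3):
    outgoing.put(incoming.get())
outgoing.put(None)  # stop the producer process

consumer.join()
received = [published.get() for _ in range(3)]
producer.join()
print(received)
```

Each side can now block, batch, or run hot without stalling the other; the queues absorb rate differences between consuming and producing.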

Feedback

I think the amount of work we had to do to troubleshoot this library and get it to work correctly for our usage is too high. I think most people trying to work with the library might give up or use an alternative communication stack if they didn’t have other reasons to use EventHub.

I wanted to know if the team maintaining this library can provide any status update on the investigation into this issue. Is there a solution coming? If so, what is the ETA? If there is no solution (i.e. this is a known, expected limitation of using Python with Event Hubs that have a high number of partitions), can you confirm that as well?

Suggestion for Documentation

I think the documentation could be updated to warn users about these potential problems and to describe the workarounds shown in the examples above, so they don't have to discover and solve the same issues we did. They would then set up and use the library correctly the first time instead of having the bad experience we did.

Also, there may be people using the library who have an existing performance issue under conditions similar to ours but are not yet aware of it; a documentation update could at least alert them to the known problems.

Mention: @tupini07 so he also gets updates.

Interested to hear your thoughts.

1 reaction
mattmazzola commented, Sep 1, 2022

In your .NET sample, what is the starting position that you are passing in?

The .NET testing tool uses a slightly different testing pattern than the Python one, since it has more advanced features with processes and cancellation tokens.

The .NET sequence is this:

  1. Establish consumer connection and start processing
  2. Establish producer connection and send events

This means the consumer can use default/latest and it would still receive the events.

The .NET tool is actually available here: https://github.com/mattmazzola/dotnet-servicebus-cli/blob/6ed2a8e03155616fe14b3a200b23f9c6498966fb/Commands/LatencyTest.cs#L78-L82 (It's called servicebus-cli because that's how it started, but it works with EventHub.)

The Python testing tool's sequence is reversed, since the subscribe methods block, which means we can't start processing and then also execute other code without much more advanced threading.

  1. Establish producer connection and send events
  2. Establish consumer with start position set to 10 seconds ago (to allow getting all the events that were just produced)

In your .NET sample, were you receiving one event at a time or receiving a batch of max size 10?

The 10 parameter in the video was for limiting the size of sent batches, not for limiting receiving. For receiving, it's one event at a time using the EventProcessorClient pattern.


Top Results From Across the Web

Chapter 4. Kafka Consumers: Reading Data from Kafka
The parameter we pass, poll(), is a timeout interval and controls how long poll() will block if data is not available in...

Kafka Consumer polling interval
So we can just call poll() in a loop without delays to essentially receive messages as soon as they...

Consuming Tasks — huey 2.4.4 documentation - Read the Docs
Default is 0.1 seconds. For example, when the consumer starts up it will begin polling every 0.1 seconds. If no tasks are found...

Working with Amazon SQS messages
... requires 10 seconds to process a message but you set the visibility timeout to only 2 seconds, a duplicate message is received...

Kafka Consumer Important Settings: Poll & Internal ...
Kafka consumers poll the Kafka broker to receive batches of data. ... heartbeat.interval.ms (default is 3 seconds) The expected time between heartbeats to ...
