Add context manager (with statement) for more Pythonic use of Consumers (and Producers)

This is a feature request. I’m interested in trying to write it myself, but I want feedback on the idea before making a large pull request. Let me know if there’s a better location to submit this than here.


I was watching this Raymond Hettinger talk, in which he demonstrates best practice for wrapping a Java interface for use in Python, and I instantly thought of kafka-python.

Currently code using a KafkaConsumer looks like this (copied from example.py):

...
consumer = KafkaConsumer(bootstrap_servers='localhost:9092',
                         auto_offset_reset='earliest',
                         consumer_timeout_ms=1000)
consumer.subscribe(['my-topic'])
while not self.stop_event.is_set():
    for message in consumer:
        print(message)
        if self.stop_event.is_set():
            break
consumer.close()
...

Note the need for manual consumer.close() and consumer.subscribe() calls, as well as checking self.stop_event.is_set() in two places.

Mirroring the linked talk, a more Pythonic interface would do the close() for us and include the subscribe() in the initial call (while still allowing later subscribe() calls to change the subscription). It would look like this:

# subscribe= and stop_if= are the proposed parameters, not part of the current API
with KafkaConsumer(bootstrap_servers='localhost:9092',
                   auto_offset_reset='earliest',
                   subscribe=['my-topic'],
                   stop_if=self.stop_event.is_set) as consumer:
    for message in consumer:
        print(message)

The with statement here, with tweaks to the iterator, could do a bunch of things for you automatically:

  1. Open the consumer and optionally create an initial subscription
  2. Stop iterating on a condition. To avoid breaking existing implementations, an alternate constructor for the for-loop iterator would accept a bool variable, lambda, or function; when it evaluates to true, the iterator raises StopIteration instead of returning another message. It could be enabled either by a parameter to __init__ or by a line like for message in consumer.stop_if(self.stop_event.is_set): and it eliminates an extra loop and an if statement by keeping all loop control logic in one place. The condition could later be removed with consumer.stop_if = None, restoring the old (current) behavior. itertools.takewhile(predicate, iterable) would probably be a good place to start when building this.
  3. Calls close(), any other teardown logic, and optionally commits reads without needing any additional code.
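
Points 2 and 3 can be prototyped without touching kafka-python at all. The sketch below is only an illustration under stated assumptions: FakeConsumer is a hypothetical stand-in for KafkaConsumer (anything iterable that has a close() method), and stop_if is a hypothetical helper, not an existing kafka-python API. contextlib.closing from the standard library supplies the with-statement behavior by calling close() on exit.

```python
import contextlib
import threading


class FakeConsumer:
    """Hypothetical stand-in for KafkaConsumer: iterable, with close()."""

    def __init__(self, messages):
        self._messages = iter(messages)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._messages)

    def close(self):
        self.closed = True


def stop_if(iterable, predicate):
    """Yield items until predicate() is true, checking before each fetch."""
    iterator = iter(iterable)
    while not predicate():
        try:
            item = next(iterator)
        except StopIteration:
            return  # underlying iterable is exhausted
        yield item


stop_event = threading.Event()
received = []

# closing() calls consumer.close() on exit, even if the loop raises.
with contextlib.closing(FakeConsumer(["a", "b", "c", "d"])) as consumer:
    for message in stop_if(consumer, stop_event.is_set):
        received.append(message)
        if message == "b":
            stop_event.set()  # simulate another thread requesting shutdown
```

A hand-written generator is used here instead of itertools.takewhile because takewhile only tests its predicate after fetching the next item, so the first message fetched after the event is set would be consumed and then discarded.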

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 4
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
jeffwidman commented, Nov 14, 2017

See also #1101

0 reactions
TheAtomicOption commented, Nov 17, 2017

Here’s the wrapper I wrote that does work but forces a consumer_timeout_ms value to do it.

Edit: this wrapper actually works now. I just needed to reset the generator and the next consumer timeout on every loop exception. It still uses a hidden timeout, though.

from kafka import KafkaConsumer


class PythonicKafkaConsumer(KafkaConsumer):
    def __init__(self, *args, **kwargs):
        pkc_configs = {
            'stop_if': None,
            'topics': None,
            'pattern': None,
            }
        # Pull the wrapper-specific options out of kwargs before calling the parent.
        for key in pkc_configs:
            if key in kwargs:
                pkc_configs[key] = kwargs.pop(key)

        # The stop_if loop in __next__ relies on the iterator timing out periodically.
        kwargs.setdefault('consumer_timeout_ms', 1000)

        super(PythonicKafkaConsumer, self).__init__(*args, **kwargs)

        if pkc_configs['topics'] is not None:
            self.subscribe(topics=pkc_configs['topics'])
        if pkc_configs['pattern'] is not None:
            self.subscribe(pattern=pkc_configs['pattern'])
        self.stop_if = pkc_configs['stop_if']

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
    
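    def __next__(self):
        # Relies on private KafkaConsumer internals (_iterator, _message_generator,
        # _set_consumer_timeout), which may differ between kafka-python versions.
        if not self._iterator:
            self._iterator = self._message_generator()
        self._set_consumer_timeout()

        # stop_if is always set in __init__ (possibly to None), so test for None
        if getattr(self, 'stop_if', None) is None:  # old behavior when no event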
            try:
                return next(self._iterator)
            except StopIteration:
                self._iterator = None
                raise StopIteration

        while not self.stop_if.is_set():  # if stop_if event exists, loop until event is set.
            try:
                return next(self._iterator)
            except StopIteration:
                self._iterator = self._message_generator()
                self._set_consumer_timeout()
                continue
        self._iterator = None
        raise StopIteration
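
The restart-on-timeout logic in __next__ above can be tried out in isolation, without a broker. This is a rough sketch under assumptions: RestartingIterator and make_batch are hypothetical names, and each finite batch stands in for a message generator that ends when consumer_timeout_ms expires. It is not kafka-python code.

```python
import threading


class RestartingIterator:
    """Keep pulling from fresh inner iterators until stop_event is set."""

    def __init__(self, make_batch, stop_event):
        self._make_batch = make_batch  # hypothetical: returns a finite iterator
        self._stop_event = stop_event
        self._iterator = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._iterator is None:
            self._iterator = self._make_batch()
        while not self._stop_event.is_set():
            try:
                return next(self._iterator)
            except StopIteration:
                # The inner iterator "timed out": start a fresh one and retry.
                self._iterator = self._make_batch()
        self._iterator = None
        raise StopIteration


stop_event = threading.Event()
batches = iter([[1, 2], [], [3]])  # an empty batch models a timeout with no data


def make_batch():
    try:
        return iter(next(batches))
    except StopIteration:
        stop_event.set()  # nothing left to simulate, so request shutdown
        return iter([])


results = list(RestartingIterator(make_batch, stop_event))
```

The empty batch in the middle shows why the hidden timeout matters: without a periodic StopIteration from the inner iterator, the loop would never get a chance to re-check the event.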