seek_to_beginning is difficult to use when using topic subscription
I have some issues using KafkaConsumer.seek_to_beginning. The only way I have gotten it to work thus far is to call consumer.topics() before calling seek_to_beginning; however, I do not understand why, and I only found this out by trial and error.
I suggest either updating the documentation to reflect this behaviour or changing the implementation of seek_to_beginning to be more intuitive. I'm more than happy to help with the documentation, though that might be of limited use since I don't understand the current behaviour.
from kafka import KafkaConsumer

consumer = KafkaConsumer("some.topic",
                         bootstrap_servers=["kafka.example.com"],
                         group_id='some-group-id')
# consumer.topics()  # uncommenting this call makes seek_to_beginning() work
consumer.seek_to_beginning()
for message in consumer:
    print(message)
Without the consumer.topics() call I get the following exception:
$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    consumer.seek_to_beginning()
  File "./venv/lib/python3.5/site-packages/kafka/consumer/group.py", line 581, in seek_to_beginning
    assert partitions, 'No partitions are currently assigned'
AssertionError: No partitions are currently assigned
I'm using kafka-python 1.0.2.
Issue Analytics
- State:
- Created 8 years ago
- Comments:17 (8 by maintainers)
Top GitHub Comments
Agree, it is a bit strange. The API is modeled after the official Java client, which has the same issue. I am going to wait to see how they handle it before implementing any API changes. In the meantime, you can try a few different approaches:
(1) Set auto_offset_reset='earliest' in your KafkaConsumer configuration. This will cause the consumer to fetch from the beginning of the topic/partition if the consumer group does not have a committed offset. So this works for the first run of a consumer group, but subsequent runs will resume at whatever offset the group last committed.
(2) In addition to (1), also set group_id=None. This is roughly similar to the console-consumer --from-beginning. It will not commit offsets, but it will also not do group coordination, which means you won't be able to run several consumers together and have the partitions automatically divided up and allocated between them.
(3) Manually assign partitions via consumer.assign() instead of subscribing to topics via consumer.subscribe(). If you do this, seek_to_beginning() should work as expected.
There are a few other approaches, but these are the 3 I generally recommend at this point.
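Approach (3) could be sketched roughly as follows. This is a minimal sketch, not the maintainer's code: the helper name `assign_and_rewind` is hypothetical, and it assumes `partitions_for_topic()` already has metadata for the topic (it may return `None` otherwise, which the sketch treats as an error). The fallback `TopicPartition` definition is there only so the snippet can be read and exercised without kafka-python installed.

```python
try:
    from kafka import TopicPartition
except ImportError:
    # Fallback for reading/testing without kafka-python installed;
    # the real TopicPartition is likewise a (topic, partition) namedtuple.
    from collections import namedtuple
    TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def assign_and_rewind(consumer, topic):
    """Hypothetical helper: manually assign every partition of `topic`
    to `consumer`, then seek each assigned partition to its earliest offset."""
    partition_ids = consumer.partitions_for_topic(topic)
    if not partition_ids:
        # Assumption: None/empty means metadata for the topic is not known yet.
        raise RuntimeError("no partition metadata for topic %r" % topic)
    partitions = [TopicPartition(topic, p) for p in sorted(partition_ids)]
    consumer.assign(partitions)       # manual assignment, no subscribe()
    consumer.seek_to_beginning()      # no arguments: all assigned partitions
    return partitions
```

With a real consumer, this would be called on a `KafkaConsumer` that was created without topic arguments. Because the assignment is manual, partitions are not divided up automatically between several consumers.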
Apparently, the only thing to do is to call consumer._client.poll() before calling consumer.seek_to_beginning(). This will eventually send the metadata request and dispatch the partitions.
That function will call client._maybe_refresh_metadata(), client._poll() and client._fire_pending_completed_requests(). There is no other function calling this sequence.
Unfortunately, the only functions calling consumer._client.poll() are consumer.__next__ and consumer.poll (which does not return the response, so it is impossible to check whether it has arrived), so there's no cleaner way to do this currently.
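A variation on this workaround that stays on the public API might look like the sketch below: poll until `consumer.assignment()` is non-empty, and only then seek. The helper name `wait_for_assignment` and its timeout parameters are hypothetical, and the sketch assumes that `consumer.poll()` drives the group join in the installed version. Any records returned by these polls are discarded, which is harmless here only because the consumer is rewound immediately afterwards.

```python
import time


def wait_for_assignment(consumer, timeout_s=10.0, poll_ms=100):
    """Hypothetical helper: call consumer.poll() until the consumer reports
    at least one assigned partition, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while not consumer.assignment():
        if time.monotonic() >= deadline:
            raise TimeoutError("no partitions assigned within %.1fs" % timeout_s)
        # Records fetched here are dropped on purpose (see the note above).
        consumer.poll(timeout_ms=poll_ms)
    return consumer.assignment()
```

Intended use would be `wait_for_assignment(consumer)` followed by `consumer.seek_to_beginning()`, in place of the commented-out `consumer.topics()` call from the original report.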