Producer compression_type ignored for api_version >= 0.11
Is there a point in ignoring the producer compression_type for api_version >= 0.11?
I am trying to use a producer with gzip compression, like so:
producer = kafka.KafkaProducer(
    bootstrap_servers=settings.KAFKA_BOOSTRAP_SERVERS,
    compression_type='gzip',
)
future = producer.send(
    'test',
    value=json.dumps(data).encode('utf-8'),
    key=b'test',
)
future.get(timeout=60)
When the message is larger than 1 MB, I get kafka.errors.MessageSizeTooLargeError.
Digging deeper, kafka.producer.kafka.KafkaProducer has a method for estimating message size.
def _max_usable_produce_magic(self):
    if self.config['api_version'] >= (0, 11):
        return 2
    elif self.config['api_version'] >= (0, 10):
        return 1
    else:
        return 0

def _estimate_size_in_bytes(self, key, value, headers=[]):
    magic = self._max_usable_produce_magic()
    if magic == 2:
        return DefaultRecordBatchBuilder.estimate_size_in_bytes(
            key, value, headers)
    else:
        return LegacyRecordBatchBuilder.estimate_size_in_bytes(
            magic, self.config['compression_type'], key, value)
Our brokers are on API version 1.0.0, so magic 2 is used and compression_type is ignored when estimating the message size.
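For what it's worth, the difference is easy to see by calling the two record builders directly. This is only a sketch: the import paths are assumed from kafka-python 1.4.x internals and may move between releases, and the payload is a made-up example.

import json

# Assumed kafka-python 1.4.x import paths; these internals may change between releases.
from kafka.record.default_records import DefaultRecordBatchBuilder
from kafka.record.legacy_records import LegacyRecordBatchBuilder

key = b'test'
# ~2 MB of very compressible JSON, similar in spirit to the payload above.
value = json.dumps({'payload': 'x' * 2000000}).encode('utf-8')

# magic 2 (api_version >= 0.11): the estimate takes no compression argument at all,
# so it reports roughly the uncompressed size, well above the default
# max_request_size of 1048576 bytes even though gzip would shrink the payload.
print(DefaultRecordBatchBuilder.estimate_size_in_bytes(key, value, []))

# magic 0/1 (api_version < 0.11): the estimate is at least handed the compression type.
print(LegacyRecordBatchBuilder.estimate_size_in_bytes(1, 'gzip', key, value))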
I am using kafka-python==1.4.4.
Can you please explain a bit what is going on?
kafka-python's message size estimation is only exact for non-compressed messages (the estimate is essentially the exact size of the message). A compressed message's size is hard to predict, and the Java client does a lot of strange things to make batches more efficient (such as sending a batch that turns out to be too big, then reformatting and resending it after the failure). That is why I kept the logic that only operates on uncompressed batches. If you need to send 1 MB of data, you need to set the limits as if it were not compressed. Sorry for the inconvenience, but estimating compressed sizes and re-splitting batches on failure does not seem like an easy thing to do, and I don't see much benefit in it for now.
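Concretely, setting the limits as if the data were not compressed looks roughly like this. It is only a sketch: the broker address, the 10 MB figure, and the payload are placeholders, and the broker/topic side needs a matching max.message.bytes (or broker-level message.max.bytes) increase.

import json
import kafka

# Size the producer-side limit for the *uncompressed* payload; max_request_size
# defaults to 1048576 bytes (1 MiB) in kafka-python.
producer = kafka.KafkaProducer(
    bootstrap_servers='localhost:9092',   # placeholder address
    compression_type='gzip',
    max_request_size=10 * 1024 * 1024,    # accept up to ~10 MB before compression
)

# Illustrative payload; the cluster must also allow messages of this size
# via the topic-level max.message.bytes / broker-level message.max.bytes.
data = {'payload': 'x' * 2000000}
future = producer.send(
    'test',
    key=b'test',
    value=json.dumps(data).encode('utf-8'),
)
future.get(timeout=60)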
Thank you for clarifying!