
Producer compression_type ignored for api_version >= 0.11


Is there a reason the producer's compression_type is ignored for API version >= 0.11?

I am trying to use a producer with gzip compression, like so:

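    import json

    import kafka

    # settings.KAFKA_BOOSTRAP_SERVERS and data are defined elsewhere in the application
    producer = kafka.KafkaProducer(
        bootstrap_servers=settings.KAFKA_BOOSTRAP_SERVERS,
        compression_type='gzip',
    )

    future = producer.send(
        'test',
        value=json.dumps(data).encode('utf-8'),
        key=b'test',
    )

    # Blocks until the send completes; raises if the send failed
    future.get(timeout=60)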

When the message is larger than 1 MB, I get kafka.errors.MessageSizeTooLargeError.

Digging deeper, I found that kafka.producer.kafka.KafkaProducer estimates the message size with these methods:

    def _max_usable_produce_magic(self):
        if self.config['api_version'] >= (0, 11):
            return 2
        elif self.config['api_version'] >= (0, 10):
            return 1
        else:
            return 0

    def _estimate_size_in_bytes(self, key, value, headers=[]):
        magic = self._max_usable_produce_magic()
        if magic == 2:
            return DefaultRecordBatchBuilder.estimate_size_in_bytes(
                key, value, headers)
        else:
            return LegacyRecordBatchBuilder.estimate_size_in_bytes(
                magic, self.config['compression_type'], key, value)

Our brokers run API version 1.0.0, which is >= 0.11. That means _max_usable_produce_magic() returns 2, and compression_type is never passed to the size estimate.
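Here is a minimal stdlib-only sketch of the effect. It assumes the client-side check simply compares the uncompressed value length with kafka-python's `max_request_size` setting (default 1048576 bytes). The real estimate also adds record and batch overhead, which this sketch ignores. `exceeds_limit_uncompressed` is a hypothetical name and is not part of kafka-python. The sketch builds a JSON payload that gzip shrinks far below 1 MB but that is still rejected, because its raw size is over the limit.

```python
import gzip
import json

# kafka-python's default for the KafkaProducer max_request_size setting
MAX_REQUEST_SIZE = 1048576


def exceeds_limit_uncompressed(value: bytes, max_request_size: int = MAX_REQUEST_SIZE) -> bool:
    """Hypothetical, simplified model of the magic v2 (API >= 0.11) size check.

    The size is taken from the raw, uncompressed bytes, so compression_type
    never enters the calculation. Record/batch overhead is ignored here.
    """
    return len(value) > max_request_size


# A highly repetitive JSON payload of roughly 2 MB
data = {"rows": [{"id": i % 10, "status": "ok"} for i in range(80_000)]}
value = json.dumps(data).encode("utf-8")

compressed = gzip.compress(value)
print(f"uncompressed: {len(value)} bytes, gzip: {len(compressed)} bytes")
print("rejected before sending:", exceeds_limit_uncompressed(value))
```

The gzip output is far under the limit, yet the check only ever sees `len(value)`.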

I am using kafka-python==1.4.4.

Can you please explain a bit what is going on?


Top GitHub Comments


tvoinarovskyi commented, Mar 3, 2019

When estimating message size, kafka-python only works with uncompressed messages; for those, the estimate is basically the exact size of the message. The size of a compressed message is hard to predict. The Java client does a lot of unusual things to allow more efficient batches, such as sending a batch that turns out to be too big, then splitting it and resending it on failure. Therefore I kept the logic that operates only on uncompressed batches. If you need to send 1 MB of data, you need to set the limits as if it were not compressed. Sorry for the inconvenience, but estimating compressed sizes and resplitting batches on failure does not seem easy to do, and I don't see much benefit in it for now.
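Following that advice ("set limits as if it's not compressed"), here is a sketch of a producer configuration sized for the uncompressed payload. `producer_settings` and its `headroom` allowance are hypothetical. `bootstrap_servers`, `compression_type`, and `max_request_size` are real `KafkaProducer` options. Raising this client-side limit only gets the message past kafka-python's own check. The broker's `message.max.bytes` and the topic's `max.message.bytes` still apply separately to what is actually sent.

```python
# kafka-python's default value for KafkaProducer's max_request_size
DEFAULT_MAX_REQUEST_SIZE = 1048576


def producer_settings(bootstrap_servers, largest_uncompressed_value, headroom=64 * 1024):
    """Hypothetical helper returning keyword arguments for kafka.KafkaProducer.

    max_request_size is sized from the largest *uncompressed* value you expect
    to send, since that is the size kafka-python checks on the client side.
    `headroom` is an arbitrary allowance (an assumption) for the key, headers,
    and record/batch overhead.
    """
    return {
        "bootstrap_servers": bootstrap_servers,
        "compression_type": "gzip",
        # Never go below the library default
        "max_request_size": max(
            DEFAULT_MAX_REQUEST_SIZE, largest_uncompressed_value + headroom
        ),
    }


settings_2mb = producer_settings("localhost:9092", 2 * 1024 * 1024)
# producer = kafka.KafkaProducer(**settings_2mb)  # needs kafka-python and a reachable broker
```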


alexeysofin commented, Mar 3, 2019

Thank you for clarifying!
