Producer compression_type ignored for api_version >= 0.11

Is there a point in ignoring producer compression_type for api version >= 0.11?

I am trying to use a producer with gzip compression, like so:

import json

import kafka

# settings and data come from the surrounding application
producer = kafka.KafkaProducer(bootstrap_servers=settings.KAFKA_BOOSTRAP_SERVERS, compression_type='gzip')

future = producer.send(
    'test',
    value=json.dumps(data).encode('utf-8'),
    key=b'test'
)

future.get(timeout=60)

When the message is larger than 1 MB, I get kafka.errors.MessageSizeTooLargeError.

Digging deeper, kafka.producer.kafka.KafkaProducer has a method for estimating message size.

    def _max_usable_produce_magic(self):
        if self.config['api_version'] >= (0, 11):
            return 2
        elif self.config['api_version'] >= (0, 10):
            return 1
        else:
            return 0

    def _estimate_size_in_bytes(self, key, value, headers=[]):
        magic = self._max_usable_produce_magic()
        if magic == 2:
            return DefaultRecordBatchBuilder.estimate_size_in_bytes(
                key, value, headers)
        else:
            return LegacyRecordBatchBuilder.estimate_size_in_bytes(
                magic, self.config['compression_type'], key, value)

Our api_version is 1.0.0, so magic is 2 and compression_type is ignored when estimating the message size.
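
To make the branch concrete, here is a minimal standalone sketch of the version check quoted above. The function name max_usable_produce_magic and its api_version parameter are stand-ins for illustration (the real method reads self.config); it relies only on Python's tuple comparison.

```python
def max_usable_produce_magic(api_version):
    # Mirrors the branch quoted above; tuples compare element by element,
    # so (1, 0, 0) >= (0, 11) is True because 1 > 0.
    if api_version >= (0, 11):
        return 2
    elif api_version >= (0, 10):
        return 1
    return 0


magic = max_usable_produce_magic((1, 0, 0))
print(magic)  # 2
```

With magic 2, the estimate is computed from key, value and headers only, so compression_type never enters the calculation.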

I am using kafka-python==1.4.4.

Can you please explain a bit what is going on?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

tvoinarovskyi commented, Mar 3, 2019 (1 reaction)

With respect to message size estimation, kafka-python works only with uncompressed messages (so the estimate is always the exact size of the message). The size of a compressed message is hard to predict, and the Java client does a lot of strange things to allow more efficient batches (like trying to send a batch that is too big, then reformatting and resending it on failure). Therefore I left the logic that operates only on uncompressed batches. If you need to send 1 MB of data, you need to set the limits as if it were not compressed. Sorry for the inconvenience, but estimating compressed size and resplitting batches on failure does not seem like an easy thing to do; at least I don’t see much benefit in it for now.
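
A rough illustration of the point above, using only the standard library: the payload compresses to a tiny fraction of its size, yet a check made against the uncompressed bytes still fails. This is a sketch, not kafka-python's actual code. The helper fits_uncompressed is hypothetical and ignores per-record overhead; the 1048576 figure is the documented default of KafkaProducer's max_request_size.

```python
import gzip
import json

# Documented default of KafkaProducer(max_request_size=...) in kafka-python.
DEFAULT_MAX_REQUEST_SIZE = 1048576


def fits_uncompressed(value, key=b"", limit=DEFAULT_MAX_REQUEST_SIZE):
    """Hypothetical pre-check: compare the raw key + value size to the limit.

    Ignores record and batch overhead, so it is only an approximation.
    """
    return len(key) + len(value) <= limit


# A highly compressible payload of roughly 2 MB (made-up sample data).
data = {"payload": "a" * 2_000_000}
value = json.dumps(data).encode("utf-8")
compressed = gzip.compress(value)

print(len(value) > DEFAULT_MAX_REQUEST_SIZE)       # raw size exceeds the limit
print(len(compressed) < DEFAULT_MAX_REQUEST_SIZE)  # gzip output is well under it
print(fits_uncompressed(value, key=b"test"))       # still rejected on raw size
```

Following the advice above, the practical workaround is to pass a larger max_request_size to KafkaProducer, sized for the uncompressed message. The broker and topic size limits (message.max.bytes, max.message.bytes) may also need to allow the batch.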

alexeysofin commented, Mar 3, 2019 (0 reactions)

Thank you for clarifying!
