
[Bug] io.confluent.connect.avro.AvroConverter does not work as a key/value converter in KafkaConnectors


Describe the bug

This has got to be either a known bug, or I’m doing something stupid. I’m trying to use io.confluent.connect.avro.AvroConverter for key and value (de)serialization. I’ve tried it with a few different Kafka Connect connectors, but to reproduce it with minimal complexity I used the S3 connector, and I can make it fail every time.

I’ve tried downloading this and putting it in my plugins directory, but it still doesn’t seem to work: https://www.confluent.io/hub/confluentinc/kafka-connect-avro-converter

I started with a fairly straightforward configuration that worked: I copied the “kafka-connect-s3” directory from the Confluent Platform 5.5 distribution (is there possibly a compatibility issue here?), and also copied “kafka-connect-storage-common”, since there’s a dependency on it.

Everything generally works pretty well until I try to use the AvroConverter. Looking in “kafka-connect-storage-common”, there is this jar: kafka-connect-avro-converter-5.5.0.jar, which should be all I need… But all I get is this:

    Tasks:
      Id:     0
      State:  FAILED
      Trace:  java.lang.NoClassDefFoundError: io/confluent/connect/avro/AvroConverterConfig
              at io.confluent.connect.avro.AvroConverter.configure(AvroConverter.java:64)
              at org.apache.kafka.connect.runtime.isolation.Plugins.newConverter(Plugins.java:266)
              at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:417)
              at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:873)
              at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:111)
              at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:888)
              at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:884)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)

In messing around with this configuration on other connectors, I’ve sometimes been able to get a java.lang.NoClassDefFoundError for AbstractConfig as well… and when I add those Confluent common jars, it goes back to the NoClassDefFoundError for AvroConverterConfig.
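
One way to chase these NoClassDefFoundErrors is to check which plugin jar (if any) actually contains the class the worker can’t load. Here is a minimal sketch, assuming unzip is available inside the Connect container and the Strimzi plugin path /opt/kafka/plugins (both are assumptions, not confirmed by the issue):

# Hypothetical check: which plugin jar contains the missing class?
# Assumes 'unzip' exists in the container and Strimzi's default
# plugin path /opt/kafka/plugins.
CLASS='io/confluent/connect/avro/AvroConverterConfig.class'
for jar in $(find /opt/kafka/plugins -name '*.jar'); do
  if unzip -l "$jar" 2>/dev/null | grep -q "$CLASS"; then
    echo "$CLASS found in $jar"
  fi
done

If the class is found in a jar that lives in a different plugin directory than the AvroConverter itself, the isolated plugin classloader will not see it, which is consistent with the error above.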

Something must be going on that I’m not seeing here.

Thanks!

To Reproduce

I use the following KafkaConnector configuration:

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: kafka-connector-s3-avro
  labels:
    strimzi.io/cluster: kafkaconnect-cluster
spec:
  class: io.confluent.connect.s3.S3SinkConnector
  tasksMax: 1
  config:
    format.class: io.confluent.connect.s3.format.json.JsonFormat
    s3.compression.type: gzip
    partitioner.class: io.confluent.connect.storage.partitioner.HourlyPartitioner
    topics: avrokafkamessagestopic
    s3.region: us-east-2
    s3.bucket.name: avrokafkamessages
    flush.size: 1
    storage.class: io.confluent.connect.s3.storage.S3Storage
    locale: en-US
    timezone: UTC
    schemas.enable: false
    key.converter: io.confluent.connect.avro.AvroConverter
    key.converter.schema.registry.url: http://schema-registry-release-cp-schema-registry:8081
    value.converter: io.confluent.connect.avro.AvroConverter
    value.converter.schema.registry.url: http://schema-registry-release-cp-schema-registry:8081
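
For reference, the Tasks/State/Trace output shown earlier is surfaced through the connector’s status. Here is a minimal sketch of how to pull it up, assuming kubectl access to the cluster; the worker pod name is an assumption based on Strimzi’s <cluster>-connect-<n> naming:

# Hedged sketch: inspect the KafkaConnector status Strimzi reports
# (this is where the Tasks/State/Trace output above comes from).
kubectl get kafkaconnector kafka-connector-s3-avro -o yaml

# Or ask the Connect REST API directly from inside a worker pod
# (pod name is an assumption; 8083 is the default Connect REST port):
kubectl exec -it kafkaconnect-cluster-connect-0 -- \
  curl -s http://localhost:8083/connectors/kafka-connector-s3-avro/status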

I know that what I’m doing isn’t supported out of the box, so I’ve followed the many tutorials on how to create your own Kafka Connect image. Here’s my Dockerfile:

FROM strimzi/kafka-connect:0.11.4-kafka-2.1.0
USER root:root
COPY ./connect-plugins/ /opt/kafka/plugins/
USER 1001

I add the following directories, copied from the Confluent Platform 5.5 distribution’s share/java directory, to my ./connect-plugins directory:

  • kafka-connect-s3
  • kafka-connect-storage-common
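
A rough sketch of how that plugins directory gets assembled, assuming the Confluent Platform 5.5 tarball is unpacked at ./confluent-5.5.0 (that path is an assumption):

# Sketch only: copy the two plugin directories out of an unpacked
# Confluent Platform 5.5 download (./confluent-5.5.0 is an assumed path).
mkdir -p ./connect-plugins
cp -r ./confluent-5.5.0/share/java/kafka-connect-s3 ./connect-plugins/
cp -r ./confluent-5.5.0/share/java/kafka-connect-storage-common ./connect-plugins/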

And my KafkaConnect configuration:

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: kafkaconnect-cluster
  annotations:
    # use-connector-resources configures this KafkaConnect
    # to use KafkaConnector resources to avoid
    # needing to call the Connect REST API directly
    strimzi.io/use-connector-resources: "true"
spec:
  version: 2.4.0
  replicas: 1
  bootstrapServers: kafka-cluster-kafka-external-bootstrap:9094
  image: ecr-repo/kafkaconnectors:tagname
  config:
    group.id: kafkaconnect-cluster
    offset.storage.topic: kafkaconnect-cluster-offsets
    offset.storage.replication.factor: 1
    config.storage.topic: kafkaconnect-cluster-configs
    config.storage.replication.factor: 1
    status.storage.topic: kafkaconnect-cluster-status
    status.storage.replication.factor: 1
  externalConfiguration:
    env:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: aws-creds
            key: awsAccessKey  
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: aws-creds
            key: awsSecretAccessKey
  metrics:
    lowercaseOutputName: true
    lowercaseOutputLabelNames: true
    rules:
    - pattern : "kafka.connect<type=connect-worker-metrics>([^:]+):"
      name: "kafka_connect_connect_worker_metrics_$1"
    - pattern : "kafka.connect<type=connect-metrics, client-id=([^:]+)><>([^:]+)"
      name: "kafka_connect_connect_metrics_$1_$2"
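
The aws-creds secret referenced by externalConfiguration has to exist before the Connect pods start. Here is a minimal sketch with placeholder values, whose key names match the secretKeyRef entries above:

# Hedged sketch: create the secret the externalConfiguration refers to.
# Replace the placeholder values with real credentials.
kubectl create secret generic aws-creds \
  --from-literal=awsAccessKey=... \
  --from-literal=awsSecretAccessKey=...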

Expected behavior

I’d like to be able to use the AvroConverter. I can’t be the only one; it must be something I’m doing wrong.

Environment (please complete the following information):

  • Strimzi version: 0.17.0
  • Installation method: Strimzi operator
  • Kubernetes cluster: Kubernetes 1.16
  • Infrastructure: Amazon EKS

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 15 (4 by maintainers)

Top GitHub Comments

13 reactions · cameronbraid commented, Jun 3, 2020

@timkalanai thanks mate, that’s greatly appreciated. With your help I was able to get it working. Thanks again!

Here’s the Dockerfile I use:

# Pull the Confluent plugin directories out of the cp-kafka-connect image
FROM confluentinc/cp-kafka-connect:5.5.0 as cp
FROM strimzi/kafka:0.18.0-kafka-2.5.0
USER root:root
COPY --from=cp /usr/share/java/kafka-connect-storage-common /opt/kafka/plugins/kafka-connect-storage-common
COPY --from=cp /usr/share/java/confluent-common /opt/kafka/plugins/confluent-common
COPY --from=cp /usr/share/java/kafka-connect-s3 /opt/kafka/plugins/kafka-connect-s3
COPY --from=cp /usr/share/java/kafka-connect-jdbc /opt/kafka/plugins/kafka-connect-jdbc
# Symlink the shared dependency directories into each connector's plugin
# directory, so the isolated plugin classloader can see them
RUN cd /opt/kafka/plugins && \
    for plugin in kafka-connect-s3 kafka-connect-jdbc; do \
      cd $plugin; \
      ln -s ../confluent-common; \
      ln -s ../kafka-connect-storage-common; \
      cd ..; \
    done
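
To wire this image into the KafkaConnect resource shown earlier, it needs to be built and pushed to the registry named in spec.image. Here is a minimal sketch, reusing the placeholder name from that manifest:

# Sketch: build and push the custom Connect image, then reference it
# in the KafkaConnect spec.image field (names taken from the manifest above).
docker build -t ecr-repo/kafkaconnectors:tagname .
docker push ecr-repo/kafkaconnectors:tagname
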
4 reactions · timkalanai commented, May 18, 2020

I think I figured it out, and it’s weird…

I think my problem was with the way that Kafka Connect scans for “Connectors” versus “Converters”. There’s a lot of classloading magic in the Plugins class mentioned in the stack trace above. I’m probably not going to do the explanation justice because I don’t quite get it myself.

But as it scans through the folder structure (you can see it loading connectors and converters in the logs as Kafka Connect starts up), it looks for Converters and Connectors in parallel. Because Connect tries to isolate connectors within each directory in the plugins folder, the dependencies for each have to be inside that same directory (which is what we thought).

Confluent has laid out their folder structure a little differently. There’s a confluent-common folder with common libs (I needed common-config.jar and common-utils.jar, but there are others in there too). There’s also a “kafka-connect-storage-common” folder that has the AvroConverter.

Confluent puts a symlink in the connector directories to make sure that they have access to the right converters. I think when Connect is traversing each folder, it does a deep traversal, and a symlink looks like a directory.

I had a number of issues, but I finally solved it by keeping the symlink and putting the confluent-common jars into the kafka-connect-storage-common folder. That way, when the AvroConverter is detected in the kafka-connect-storage-common folder, it loads in the same classloader as the common jars in that folder.

And for any connector I need the AvroConverter for, I add a symlink to the “kafka-connect-storage-common” directory (see the sketch below).
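
Here is a hedged sketch of the layout described above, assuming the Strimzi plugin path /opt/kafka/plugins and the directory names used elsewhere in this thread:

# Sketch of the working layout (paths assumed from earlier in the thread).
cd /opt/kafka/plugins

# Put the Confluent common jars next to the AvroConverter jar so they
# land in the same plugin classloader.
cp confluent-common/common-*.jar kafka-connect-storage-common/

# Give each connector that needs the AvroConverter a symlink to the
# directory that contains it.
ln -s ../kafka-connect-storage-common kafka-connect-s3/kafka-connect-storage-common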

I know it’s convoluted. Maybe I’m just too tired and not seeing straight, but everything seems to be working now. Hope this helps some person in the future.
