to_confluent_avro low performance + warnings: Schema Registry client is already configured.


I’m consuming messages from a Kafka topic in Confluent Cloud using Spark Structured Streaming and ABRiS 3.1.0. The messages come through fine, but the log is full of the following warning:

WARN 2019-11-28 13:32:48,137 14137 za.co.absa.abris.avro.read.confluent.SchemaManager [Executor task launch worker for task 1]

The throughput of this pipeline is also very low: about 10 records/s on a single worker with 4 cores and 14 GB of RAM.

Am I doing something wrong? Is there a way to avoid this warning, and does it affect performance?

Here’s the code that reads from the "in" topic, decodes and re-encodes the messages, and writes them to the "out" topic:

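  // Imports assumed for ABRiS 3.x and Spark SQL; they were not part of the original snippet
  import org.apache.spark.sql.functions.{col, struct}
  import za.co.absa.abris.avro.functions.{from_confluent_avro, to_confluent_avro}
  import za.co.absa.abris.avro.read.confluent.SchemaManager

  val schemaRegistryConfigIn = Map(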
    SchemaManager.PARAM_SCHEMA_REGISTRY_URL          -> "...",
    SchemaManager.PARAM_SCHEMA_REGISTRY_TOPIC        -> "in",
    SchemaManager.PARAM_VALUE_SCHEMA_NAMING_STRATEGY -> SchemaManager.SchemaStorageNamingStrategies.TOPIC_NAME,
    SchemaManager.PARAM_VALUE_SCHEMA_ID              -> "latest", // set to "latest" if you want the latest schema version to be used
    "basic.auth.credentials.source"                  -> "USER_INFO",
    "schema.registry.basic.auth.user.info"           -> "..."
  )
  val schemaRegistryConfigOut = schemaRegistryConfigIn + (SchemaManager.PARAM_SCHEMA_REGISTRY_TOPIC -> "out")

  val inputDF = spark.readStream
    .format("kafka")
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.request.timeout.ms", "20000")
    .option("kafka.bootstrap.servers", broker)
    .option("kafka.retry.backoff.ms", "500")
    .option(
      "kafka.sasl.jaas.config",
      "..."
    )
    .option("kafka.security.protocol", "SASL_SSL")
    .option("subscribe", inputTopic)
    .option("startingOffsets", startingOffsetsValue)
    .load()

  val outputDF = inputDF
    .select(from_confluent_avro(col("value"), schemaRegistryConfigIn) as "parsed_message")
    .select("parsed_message.*")
    .select(
      to_confluent_avro(struct("firstname", "lastname", "country"), schemaRegistryConfigOut) as "value"
    )

  val query = outputDF.writeStream
    .format("kafka")
    .option("checkpointLocation", pathCheckpoint)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.request.timeout.ms", "20000")
    .option("kafka.bootstrap.servers", broker)
    .option("kafka.retry.backoff.ms", "500")
    .option(
      "kafka.sasl.jaas.config",
      "..."
    )
    .option("kafka.security.protocol", "SASL_SSL")
    .option("topic", outputTopic)
    .start()

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10

Top GitHub Comments

2 reactions
cerveada commented, Dec 2, 2019

Hello @agolovenko, thanks for the PR.

We investigated the issue further, and the core of the problem seems to be in CatalystDataToAvro. To answer your question: yes, the warning shouldn’t be emitted if everything works as it should. I’m already working on a fix for both of these issues. It should be finished tomorrow.

0 reactions
agolovenko commented, Dec 4, 2019

Hi @cerveada! I just checked the fix and it works great. The performance is back to the expected level now. Thanks a lot!
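
The thread does not spell out what was wrong inside CatalystDataToAvro. As a general illustration only, one pattern that can produce both a repeated "already configured" style warning and low throughput is setting up an expensive client once per record instead of once per task. The sketch below shows that difference in plain Scala. Every name in it (ConfigureOnceSketch, RegistryClient, CachedEncoder, the "http://registry" URL) is hypothetical and is not taken from ABRiS.

```scala
// Illustrative only: none of these names come from ABRiS or Confluent.
object ConfigureOnceSketch {

  // Stand-in for a client that is expensive to set up (hypothetical).
  final class RegistryClient(val url: String)

  // Counts how many times a client was actually built.
  var clientsBuilt = 0

  private def buildClient(url: String): RegistryClient = {
    clientsBuilt += 1
    new RegistryClient(url)
  }

  // Anti-pattern: a new client is configured for every record.
  def encodePerRecord(records: Seq[String], url: String): Seq[String] =
    records.map { r =>
      val client = buildClient(url)
      s"${client.url}:$r"
    }

  // Alternative: build the client lazily, once, and reuse it for all records.
  final class CachedEncoder(url: String) {
    private lazy val client: RegistryClient = buildClient(url)

    def encode(records: Seq[String]): Seq[String] =
      records.map(r => s"${client.url}:$r")
  }

  def main(args: Array[String]): Unit = {
    val records = Seq("a", "b", "c")

    clientsBuilt = 0
    encodePerRecord(records, "http://registry")
    println(s"per-record builds: $clientsBuilt") // prints "per-record builds: 3"

    clientsBuilt = 0
    new CachedEncoder("http://registry").encode(records)
    println(s"cached builds: $clientsBuilt") // prints "cached builds: 1"
  }
}
```

With three records the per-record variant builds three clients and the cached variant builds one. At streaming volumes that gap is the kind of overhead that could plausibly account for a large slowdown, though whether it matches the actual ABRiS fix is not confirmed by the thread.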
