to_confluent_avro low performance + warnings: Schema Registry client is already configured.
I'm consuming messages from a Kafka topic in Confluent Cloud using Structured Streaming and ABRiS 3.1.0. The messages arrive fine, but the log is full of the following warning:
WARN 2019-11-28 13:32:48,137 14137 za.co.absa.abris.avro.read.confluent.SchemaManager [Executor task launch worker for task 1]
The performance of this pipeline is also quite low: ~10 records/s on a single worker with 4 cores and 14 GB of RAM.
Am I doing something wrong? Is there a way to avoid this warning, and does it affect performance?
Here’s the code that reads from the "in" topic, decodes and re-encodes the messages, and writes to the "out" topic:
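import org.apache.spark.sql.functions.{col, struct}
import za.co.absa.abris.avro.functions.{from_confluent_avro, to_confluent_avro}
import za.co.absa.abris.avro.read.confluent.SchemaManager

// Schema Registry settings used to decode messages from the "in" topic
val schemaRegistryConfigIn = Map(
  SchemaManager.PARAM_SCHEMA_REGISTRY_URL -> "...",
  SchemaManager.PARAM_SCHEMA_REGISTRY_TOPIC -> "in",
  SchemaManager.PARAM_VALUE_SCHEMA_NAMING_STRATEGY -> SchemaManager.SchemaStorageNamingStrategies.TOPIC_NAME,
  SchemaManager.PARAM_VALUE_SCHEMA_ID -> "latest", // set to "latest" if you want the latest schema version to be used
  "basic.auth.credentials.source" -> "USER_INFO",
  "schema.registry.basic.auth.user.info" -> "..."
)

// Same settings, with the topic switched to "out" for encoding
val schemaRegistryConfigOut = schemaRegistryConfigIn + (SchemaManager.PARAM_SCHEMA_REGISTRY_TOPIC -> "out")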

val inputDF = spark.readStream
  .format("kafka")
  .option("kafka.ssl.endpoint.identification.algorithm", "https")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.request.timeout.ms", "20000")
  .option("kafka.bootstrap.servers", broker)
  .option("kafka.retry.backoff.ms", "500")
  .option(
    "kafka.sasl.jaas.config",
    "..."
  )
  .option("kafka.security.protocol", "SASL_SSL")
  .option("subscribe", inputTopic)
  .option("startingOffsets", startingOffsetsValue)
  .load()

val outputDF = inputDF
  .select(from_confluent_avro(col("value"), schemaRegistryConfigIn) as "parsed_message")
  .select("parsed_message.*")
  .select(
    to_confluent_avro(struct("firstname", "lastname", "country"), schemaRegistryConfigOut) as "value"
  )

val query = outputDF.writeStream
  .format("kafka")
  .option("checkpointLocation", pathCheckpoint)
  .option("kafka.ssl.endpoint.identification.algorithm", "https")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.request.timeout.ms", "20000")
  .option("kafka.bootstrap.servers", broker)
  .option("kafka.retry.backoff.ms", "500")
  .option(
    "kafka.sasl.jaas.config",
    "..."
  )
  .option("kafka.security.protocol", "SASL_SSL")
  .option("topic", outputTopic)
  .start()
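
For readers unsure how the "out" configuration differs from the "in" one: on an immutable Scala Map, `+` returns a new map in which an existing key is replaced and every other entry is carried over. The sketch below shows only that behaviour. The key names and the URL are placeholders invented for illustration; they are not the actual values of the SchemaManager constants.

```scala
object ConfigOverrideSketch {
  // Placeholder key, standing in for SchemaManager.PARAM_SCHEMA_REGISTRY_TOPIC.
  val topicKey = "example.topic"

  // Placeholder settings for the input side.
  val configIn: Map[String, String] = Map(
    "example.url" -> "http://localhost:8081",
    topicKey -> "in"
  )

  // `+` builds a new Map: the topic entry is replaced, the rest is copied.
  val configOut: Map[String, String] = configIn + (topicKey -> "out")

  def main(args: Array[String]): Unit = {
    println(configIn(topicKey))       // in
    println(configOut(topicKey))      // out
    println(configOut("example.url")) // http://localhost:8081
  }
}
```

The original map is left unchanged, so the reading side keeps pointing at the "in" topic while the writing side uses "out".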
Top GitHub Comments
Hello @agolovenko, thanks for the PR.
We investigated the issue further, and the core of the problem seems to be in CatalystDataToAvro. To answer your question: yes, the warning shouldn’t appear if everything works as it should. I’m already working on a fix for both of these issues. It should be finished tomorrow.

Hi @cerveada! I just checked the fix and it works great. The performance is back to the expected level now. Thanks a lot!
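
The thread does not show the fix itself. As a rough illustration of the kind of problem that a repeated "already configured" warning combined with low throughput can point to, the sketch below contrasts configuring a shared client once per record with configuring it once and reusing it. This is an assumption about the general pattern, not a description of the ABRiS code: FakeRegistryClient, PerRecordEncoder and ConfigureOnceEncoder are hypothetical names made up for this example.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a shared client that is costly to configure.
object FakeRegistryClient {
  val configureCalls = new AtomicInteger(0)
  def configure(settings: Map[String, String]): Unit = {
    configureCalls.incrementAndGet() // a real client would do network I/O here
  }
}

// Anti-pattern: the client is configured again for every record.
class PerRecordEncoder(settings: Map[String, String]) {
  def encode(record: String): Array[Byte] = {
    FakeRegistryClient.configure(settings)
    record.getBytes(StandardCharsets.UTF_8)
  }
}

// Configuration is deferred to the first record and then reused.
class ConfigureOnceEncoder(settings: Map[String, String]) {
  private lazy val charset: Charset = {
    FakeRegistryClient.configure(settings) // runs only on first access
    StandardCharsets.UTF_8
  }
  def encode(record: String): Array[Byte] = record.getBytes(charset)
}

object ConfigureOnceSketch {
  // Returns (configure calls per-record, configure calls configure-once).
  def run(): (Int, Int) = {
    val settings = Map("example.url" -> "http://localhost:8081")
    val records = (1 to 1000).map(i => s"record-$i")

    FakeRegistryClient.configureCalls.set(0)
    val perRecord = new PerRecordEncoder(settings)
    records.foreach(r => perRecord.encode(r))
    val perRecordCalls = FakeRegistryClient.configureCalls.get()

    FakeRegistryClient.configureCalls.set(0)
    val once = new ConfigureOnceEncoder(settings)
    records.foreach(r => once.encode(r))
    val onceCalls = FakeRegistryClient.configureCalls.get()

    (perRecordCalls, onceCalls)
  }

  def main(args: Array[String]): Unit = {
    val (perRecordCalls, onceCalls) = run()
    println(s"per-record: $perRecordCalls, configure-once: $onceCalls")
  }
}
```

With 1000 records, the first encoder calls configure 1000 times and the second calls it once. If each call logs a warning and does real work, the difference would show up both in the log volume and in records per second.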