Jobs fail during schema registration while writing to Kafka from a batch DataFrame
Hi,
We have built an ETL tool (https://github.com/homeaway/datapull) for moving data across many data platforms, and we are trying to add Kafka as a supported platform.
We are trying to use this library to write data to Kafka from a batch DataFrame, but we couldn't get it to work because of the following error:
```
Exception in thread "main" java.lang.NoSuchFieldError: FACTORY
at org.apache.avro.Schemas.toString(Schemas.java:36)
at org.apache.avro.Schemas.toString(Schemas.java:30)
at io.confluent.kafka.schemaregistry.avro.AvroSchema.canonicalString(AvroSchema.java:140)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:206)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:268)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:244)
at io.confluent.kafka.schemaregistry.client.SchemaRegistryClient.register(SchemaRegistryClient.java:42)
at za.co.absa.abris.avro.read.confluent.SchemaManager.register(SchemaManager.scala:77)
at za.co.absa.abris.avro.read.confluent.SchemaManager.$anonfun$getIfExistsOrElseRegisterSchema$1(SchemaManager.scala:124)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at za.co.absa.abris.avro.read.confluent.SchemaManager.getIfExistsOrElseRegisterSchema(SchemaManager.scala:124)
at za.co.absa.abris.config.ToSchemaRegisteringConfigFragment.usingSchemaRegistry(Config.scala:135)
at za.co.absa.abris.config.ToSchemaRegisteringConfigFragment.usingSchemaRegistry(Config.scala:131)
at org.example.App$.main(App.scala:37)
at org.example.App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
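For context, here is a minimal sketch of the write path that reaches this code, based on the ABRiS 4.x configuration API visible in the stack trace (`ToSchemaRegisteringConfigFragment.usingSchemaRegistry`). The topic name, registry URL, broker address, and schema are placeholders, not values from the issue:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}
import za.co.absa.abris.avro.functions.to_avro
import za.co.absa.abris.config.AbrisConfig

object App {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("simplespark").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // usingSchemaRegistry registers the schema eagerly; per the stack trace,
    // this is the call that reaches CachedSchemaRegistryClient.register and
    // dies with NoSuchFieldError: FACTORY.
    val abrisConfig = AbrisConfig
      .toConfluentAvro
      .provideAndRegisterSchema(
        """{"type":"record","name":"Person","fields":[
          |{"name":"id","type":"int"},{"name":"name","type":"string"}]}""".stripMargin)
      .usingTopicNameStrategy("my-topic")
      .usingSchemaRegistry("http://localhost:8081")

    df.select(to_avro(struct(col("id"), col("name")), abrisConfig) as "value")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "my-topic")
      .save()
  }
}
```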
Before implementing this as part of our tool, we wrote a bare-minimum project for testing it: https://github.com/markovarghese/simplespark.
Any immediate help would be greatly appreciated, as we are trying to onboard a few users who are on hold for this functionality. Please feel free to contact us with any questions.
Thanks for looking into this.
Regards,
Srini
Issue Analytics
- State:
- Created 3 years ago
- Comments: 12
Top GitHub Comments
@markovarghese version 4.0.1, which should fix this issue, has been released.
Would you mind testing whether it works as expected?
Thanks for the help. I will close this issue; feel free to open a new one if needed.
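For anyone hitting the same `NoSuchFieldError: FACTORY`, it usually indicates an Avro version conflict on the classpath: the Avro that Spark bundles (1.8.x in older distributions) is older than the one the Confluent schema-registry client compiles against. Besides upgrading to ABRiS 4.0.1, explicitly pinning a newer Avro in the build can help. A hedged build.sbt sketch; the versions are illustrative assumptions, not taken from the issue:

```scala
// build.sbt fragment -- versions are illustrative; check the ABRiS
// compatibility matrix for your Spark and Confluent versions.
libraryDependencies ++= Seq(
  "za.co.absa" %% "abris" % "4.0.1",
  // Force an Avro new enough for the Confluent client's
  // org.apache.avro.Schemas helper; Spark's bundled 1.8.x lacks
  // the FACTORY field it references.
  "org.apache.avro" % "avro" % "1.10.2"
)
```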