
Similar to 'from_json(Column)' SparkSQL UDF, implement 'from_confluent_avro(Column)'

See original GitHub issue

From the earlier issues, I saw that both keys and values should be deserializable as Avro (I think that was the initial design, or maybe only the values are extracted? I didn’t dig too deep there).

Then there was discussion about keys that are not Avro while the values are, which led to funky classes like ConfluentKafkaAvroWriterWithPlainKey.

As per the discussion in #6, I asked: what if I have a “plain value” and an Avro key? What if my keys or values are some other datatype, not just strings, while the other field is Avro?

Rather than creating one-off methods for each combination of supported Avro datatypes, I feel a better way to implement encoders/decoders would be individual UDF-style functions, like the existing from_json(Column, Map[String, String]) and to_json(Column, Map[String, String]) functions in Spark, where for Avro support the options map would include at least the Schema Registry URL.
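
To make the shape of the proposal concrete, here is a minimal Scala sketch of the suggested signature, mirroring from_json; the function name and the option key are the proposal itself, not an existing API:

import org.apache.spark.sql.Column

// Proposed, not yet implemented: decode a Confluent-Avro-encoded binary column,
// with Avro-specific settings passed the same way from_json takes an options map.
def from_confluent_avro(data: Column, options: Map[String, String]): Column = ???

// The options map would carry at least the Schema Registry URL:
val registryOptions = Map("schema.registry.url" -> "http://localhost:8081")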

From a usability perspective, I would expect this to work if I were doing “word count” on a topic with an Avro key:

df.select(from_confluent_avro(col("key")), col("value").cast("int"))
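
In full, that word count might look like the sketch below. It assumes the from_confluent_avro(Column, Map[String, String]) shape proposed above; the import path matches what ABRiS 3.x eventually shipped (see the comments below), but the topic name, registry URL, and option keys are illustrative, and the library may require additional options (naming strategy, schema id) per its README.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
// Shipped in ABRiS 3.x (see comments below); import path assumed from its README.
import za.co.absa.abris.avro.functions.from_confluent_avro

val spark = SparkSession.builder().appName("avro-key-word-count").getOrCreate()
import spark.implicits._

// Assumed minimal config; the library may require more keys than just the URL.
val registryConfig = Map("schema.registry.url" -> "http://localhost:8081")

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "words") // illustrative topic name
  .load()

// Avro-decode the key; Kafka delivers the value as binary, so cast via string.
val counts = stream
  .select(from_confluent_avro($"key", registryConfig) as "word",
          $"value".cast("string").cast("int") as "count")
  .groupBy("word")
  .agg(sum("count") as "total")

counts.writeStream.outputMode("complete").format("console").start().awaitTermination()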

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
OneCricketeer commented, Sep 16, 2019

Happy to see #48 😁

1 reaction
cerveada commented, Sep 17, 2019

This is implemented in version 3.0.0. It should be available at the Maven repository in a few hours. 🙂
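
For anyone picking this up, once the release syncs it should be resolvable with coordinates along these lines; the groupId and artifactId here are taken from the ABRiS project and are worth double-checking on Maven Central:

// build.sbt; %% appends the Scala binary version suffix
libraryDependencies += "za.co.absa" %% "abris" % "3.0.0"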

Read more comments on GitHub >

Top Results From Across the Web

Integrating Spark Structured Streaming with the Confluent Schema Registry
I applied org.apache.spark.sql.avro.SchemaConverters to convert the Avro schema format to a Spark StructType, so that you could use it in a from_json column ... (see the sketch after these results)

Pyspark 2.4.0, read avro from kafka with read stream - Python ...
This program reads Avro messages from the Kafka topic "avro_topics", decodes them, and finally streams them to the console. The spark-avro external module can provide this ...

Integrating Spark Structured Streaming with the Confluent Schema Registry
... into strings (Avro fields become JSON) import org.apache.spark.sql.functions.

Integrating Spark Structured Streaming with the ...
I'm using a Kafka source in Spark Structured Streaming to receive Confluent-encoded Avro records. I intend to use ...

Integrating Spark Structured Streaming with the Confluent ...
7 answers; import io.confluent.kafka.serializers.KafkaAvroDeserializer; import org.apache.spark.sql.DataFrame; import org.apache.spark.sql.avro.
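
The first result above describes a common workaround that predates from_confluent_avro: convert the writer's Avro schema into a Spark StructType, then parse JSON-rendered records with from_json. A minimal sketch of that conversion using Spark 2.4's built-in SchemaConverters; the Avro schema string here is illustrative, and in practice it would come from the Schema Registry:

import org.apache.avro.Schema
import org.apache.spark.sql.Column
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.functions.from_json

// Illustrative Avro schema; real code would fetch it from the registry.
val avroSchemaJson =
  """{"type":"record","name":"Word","fields":[{"name":"word","type":"string"}]}"""
val avroSchema = new Schema.Parser().parse(avroSchemaJson)

// Convert the Avro schema into a Spark SQL type that from_json understands.
val sparkType = SchemaConverters.toSqlType(avroSchema).dataType

// Parse a string column holding the JSON rendering of each record.
def parseJsonColumn(jsonCol: Column): Column = from_json(jsonCol, sparkType)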
