Similar to 'from_json(Column)' SparkSQL UDF, implement 'from_confluent_avro(Column)'
From the earlier issues, I saw that both keys and values should be able to be deserialized as Avro (I think this was the initial design, or perhaps only the values are extracted? I didn't dig too deep there).
Then there was discussion about cases where the keys are not Avro but the values are, which led to funky classes like ConfluentKafkaAvroWriterWithPlainKey.
As I mentioned in the discussion in #6: what if I have a "plain" value and an Avro key? What if my keys or values are some other datatype, not just strings, while the other field is Avro?
Rather than creating one-off methods for each combination of supported Avro datatypes, I feel a better way to implement encoders/decoders would be to provide individual UDF-style functions, like the existing from_json(Column, Map[String, String]) and to_json(Column, Map[String, String]) functions in Spark, where for Avro support the option map would include at least the Schema Registry URL. A rough sketch of what I have in mind is below.
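As a minimal sketch of the proposed shape (not a confirmed API): the function names, option keys, and signatures below are illustrative assumptions that mirror Spark's from_json/to_json style, with the Schema Registry URL carried in the option map.

```scala
import org.apache.spark.sql.Column

// Hypothetical: deserialize a Confluent-Avro-encoded binary column into a struct column.
def from_confluent_avro(data: Column, options: Map[String, String]): Column = ???

// Hypothetical: serialize a struct column into Confluent-Avro-encoded binary.
def to_confluent_avro(data: Column, options: Map[String, String]): Column = ???

// Example option map; "schema.registry.url" is the standard Confluent config key,
// any other keys would depend on the implementation.
val registryOptions: Map[String, String] = Map(
  "schema.registry.url" -> "http://localhost:8081"
)
```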
From a usability perspective, I would expect something like this to work if I were doing a "word count" on a topic with an Avro key:
df.select(from_confluent_avro(col("key")), col("value").cast("int"))
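Expanding that one-liner into a fuller sketch, here is how such a word count might look, assuming the hypothetical from_confluent_avro from the proposal above is in scope; the topic name, bootstrap servers, and registry URL are placeholders, and the Avro key is taken to decode to the word itself while the value is a plain integer count.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("avro-key-word-count").getOrCreate()

val registryOptions = Map("schema.registry.url" -> "http://localhost:8081")

// Read the raw key/value pairs from Kafka.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "words")
  .load()

// Decode only the Avro key; the value is just cast to an int.
val counts = df
  .select(
    from_confluent_avro(col("key"), registryOptions).as("word"),
    col("value").cast("int").as("n"))
  .groupBy("word")
  .agg(sum("n").as("total"))
```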
Top GitHub Comments
Happy to see #48 😁
This is implemented in version 3.0.0. It should be available in the Maven repository in a few hours. 🙂