
Similar to 'from_json(Column)' SparkSQL UDF, implement 'from_confluent_avro(Column)'

See original GitHub issue

From the earlier issues, I saw that both keys and values should be deserializable as Avro (I think that was the initial design, or maybe only the values are extracted? I didn’t dig too deep there).

Then there was discussion about keys that are not Avro while the values are, which led to funky classes like ConfluentKafkaAvroWriterWithPlainKey.

As per the discussion in #6, I asked: what if I have a “plain value” and an Avro key? What if my keys or values are some other datatype, not just strings, while the other field is Avro?

Rather than creating one-off methods for each combination of supported Avro datatypes, I feel a better way to implement encoders/decoders would be individual UDF-style functions, like the existing from_json(Column, Map[String, String]) and to_json(Column, Map[String, String]) functions in Spark, where for Avro support the options map would include at least the Schema Registry URL.
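
To make the shape of the proposal concrete, here is a minimal Scala sketch of the suggested signature, mirroring from_json; the function name and the option key are the proposal itself, not an existing API:

import org.apache.spark.sql.Column

// Proposed, not yet implemented: decode a Confluent-Avro-encoded binary column,
// with Avro-specific settings passed the same way from_json takes an options map.
def from_confluent_avro(data: Column, options: Map[String, String]): Column = ???

// The options map would carry at least the Schema Registry URL:
val registryOptions = Map("schema.registry.url" -> "http://localhost:8081")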

From a usability perspective, I would expect this to work if I were doing “word count” on a topic with an Avro key:

df.select(from_confluent_avro(col("key")), col("value").cast("int"))
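
In full, that word count might look like the sketch below. It assumes the from_confluent_avro(Column, Map[String, String]) shape proposed above; the import path matches what ABRiS 3.x eventually shipped (see the comments below), but the topic name, registry URL, and option keys are illustrative, and the library may require additional options (naming strategy, schema id) per its README.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
// Shipped in ABRiS 3.x (see comments below); import path assumed from its README.
import za.co.absa.abris.avro.functions.from_confluent_avro

val spark = SparkSession.builder().appName("avro-key-word-count").getOrCreate()
import spark.implicits._

// Assumed minimal config; the library may require more keys than just the URL.
val registryConfig = Map("schema.registry.url" -> "http://localhost:8081")

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "words") // illustrative topic name
  .load()

// Avro-decode the key; Kafka delivers the value as binary, so cast via string.
val counts = stream
  .select(from_confluent_avro($"key", registryConfig) as "word",
          $"value".cast("string").cast("int") as "count")
  .groupBy("word")
  .agg(sum("count") as "total")

counts.writeStream.outputMode("complete").format("console").start().awaitTermination()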

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
OneCricketeer commented, Sep 16, 2019

Happy to see #48 😁

1 reaction
cerveada commented, Sep 17, 2019

This is implemented in version 3.0.0. It should be available at the Maven repository in a few hours. 🙂
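
For anyone picking this up, once the release syncs it should be resolvable with coordinates along these lines; the groupId and artifactId here are taken from the ABRiS project and are worth double-checking on Maven Central:

// build.sbt; %% appends the Scala binary version suffix
libraryDependencies += "za.co.absa" %% "abris" % "3.0.0"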

Read more comments on GitHub >

Top Results From Across the Web

Integrating Spark Structured Streaming with the Confluent Schema Registry
I applied org.apache.spark.sql.avro.SchemaConverters to convert the Avro schema format to a Spark StructType, so that you could use it in a from_json column ... (see the sketch after these results)

Pyspark 2.4.0, read avro from kafka with read stream - Python ...
This program reads Avro messages from the Kafka topic "avro_topics", decodes them, and finally streams them to the console. The spark-avro external module can provide this ...

Integrating Spark Structured Streaming with the Confluent Schema Registry
... into strings (Avro fields become JSON) import org.apache.spark.sql.functions.

Integrating Spark Structured Streaming with the ...
I'm using a Kafka source in Spark Structured Streaming to receive Confluent-encoded Avro records. I intend to use ...

Integrating Spark Structured Streaming with the Confluent ...
7 answers; import io.confluent.kafka.serializers.KafkaAvroDeserializer; import org.apache.spark.sql.DataFrame; import org.apache.spark.sql.avro.
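
The first result above describes a common workaround that predates from_confluent_avro: convert the writer's Avro schema into a Spark StructType, then parse JSON-rendered records with from_json. A minimal sketch of that conversion using Spark 2.4's built-in SchemaConverters; the Avro schema string here is illustrative, and in practice it would come from the Schema Registry:

import org.apache.avro.Schema
import org.apache.spark.sql.Column
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.functions.from_json

// Illustrative Avro schema; real code would fetch it from the registry.
val avroSchemaJson =
  """{"type":"record","name":"Word","fields":[{"name":"word","type":"string"}]}"""
val avroSchema = new Schema.Parser().parse(avroSchemaJson)

// Convert the Avro schema into a Spark SQL type that from_json understands.
val sparkType = SchemaConverters.toSqlType(avroSchema).dataType

// Parse a string column holding the JSON rendering of each record.
def parseJsonColumn(jsonCol: Column): Column = from_json(jsonCol, sparkType)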
