Caching Schema from Schema Registry using ABRiS 3.2.1
See original GitHub issue

I am using the ABRiS 3.2.1 library with our PySpark application code to extract schemas and register schemas in Schema Registry for Avro messages. We use Confluent Kafka with Spark 2.4.7 and Spark Structured Streaming. There is a lot of network I/O from our PySpark application to the Schema Registry server on every microbatch. Reading through the post https://github.com/AbsaOSS/ABRiS/issues/105, the issue is said to have been resolved in ABRiS 3.2.0; however, we see the same behaviour in the current 3.2.1 version. Can you please advise whether you have seen this issue, or whether I am missing a piece of configuration in my code to cache the schema? Your suggestions/advice would be highly appreciated.
Below is a snippet of the code using the from_confluent_avro method:
```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column

get_df = df.select(ReadWriteFromAvro.from_avro('value', raw_topic).alias(df_alias))

def from_avro(col, topic):
    jvm_gateway = SparkContext._active_spark_context._gateway.jvm
    abris_avro = jvm_gateway.za.co.absa.abris.avro
    # Fetch the TOPIC_NAME naming strategy from the Scala companion object.
    naming_strategy = getattr(
        getattr(
            abris_avro.read.confluent.SchemaManager,
            'SchemaStorageNamingStrategies$',
        ),
        'MODULE$',
    ).TOPIC_NAME()

    schema_registry_config_dict = {
        'schema.registry.url': SCHEMA_REGISTRY_URL,
        'basic.auth.credentials.source': 'USER_INFO',
        'basic.auth.user.info':
            KAFKA_SCHEMA_REGISTRY_API_KEY + ':' + KAFKA_SCHEMA_REGISTRY_API_SECRET,
        'schema.registry.topic': topic,
        f'{col}.schema.id': 'latest',
        f'{col}.schema.naming.strategy': naming_strategy,
    }

    # Fold the Python dict into an immutable scala.collection.immutable.Map.
    conf_map = getattr(
        getattr(
            jvm_gateway.scala.collection.immutable.Map,
            'EmptyMap$',
        ),
        'MODULE$',
    )
    for k, v in schema_registry_config_dict.items():
        conf_map = getattr(conf_map, '$plus')(
            jvm_gateway.scala.Tuple2(k, v),
        )

    return Column(abris_avro.functions.from_confluent_avro(_to_java_column(col), conf_map))
```
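For context, here is a minimal sketch of how a helper like this is typically wired into a Structured Streaming read; the bootstrap-server setting and variable names are assumptions, not part of the original report. Because the deserialization expression is evaluated for every microbatch, this is where the repeated Schema Registry round-trips show up.

```python
# Sketch under assumed names: KAFKA_BOOTSTRAP_SERVERS and raw_topic are placeholders.
raw_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS)
    .option("subscribe", raw_topic)
    .load()
)

# The Avro decoding is applied to the Kafka 'value' column of each microbatch.
decoded_df = raw_df.select(ReadWriteFromAvro.from_avro('value', raw_topic).alias('payload'))
```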
Issue Analytics
- Created: 2 years ago
- Comments: 5
Top GitHub Comments
Yes… that is right. Version 4.2.0 does the trick. Thank you for such wonderful work… truly appreciate you guys.
So if I understand it right, version 4.2.0 solves the problem for you?
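For anyone landing here with the same symptom: per the exchange above, upgrading to ABRiS 4.2.0 resolved the repeated registry lookups. Below is a sketch of the equivalent PySpark wiring against the 4.x AbrisConfig API, adapted from the patterns shown in the ABRiS Python documentation; the helper names and the reuse of the registry settings from the snippet above are illustrative, so verify the exact method names against the version you deploy.

```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column

def from_avro(col, config):
    # Deserialize a Confluent Avro column using a prebuilt AbrisConfig.
    jvm_gateway = SparkContext._active_spark_context._gateway.jvm
    abris_avro = jvm_gateway.za.co.absa.abris.avro
    return Column(abris_avro.functions.from_avro(_to_java_column(col), config))

def from_avro_abris_config(config_map, topic, is_key=False):
    # Build the config once: download the latest reader schema and use the
    # topic-name subject naming strategy.
    jvm_gateway = SparkContext._active_spark_context._gateway.jvm
    scala_map = jvm_gateway.PythonUtils.toScalaMap(config_map)
    return (
        jvm_gateway.za.co.absa.abris.config.AbrisConfig
        .fromConfluentAvro()
        .downloadReaderSchemaByLatestVersion()
        .andTopicNameStrategy(topic, is_key)
        .usingSchemaRegistry(scala_map)
    )

# Registry settings mirror the 3.x snippet above (values are placeholders).
abris_config = from_avro_abris_config(
    {
        'schema.registry.url': SCHEMA_REGISTRY_URL,
        'basic.auth.credentials.source': 'USER_INFO',
        'basic.auth.user.info':
            KAFKA_SCHEMA_REGISTRY_API_KEY + ':' + KAFKA_SCHEMA_REGISTRY_API_SECRET,
    },
    raw_topic,
)
decoded_df = df.select(from_avro('value', abris_config).alias(df_alias))
```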