Caching Schema from SchemaRegistry using ABRIS 3.2.1


I am using the ABRiS 3.2.1 library in our PySpark application to fetch schemas from, and register schemas in, Schema Registry for Avro messages. We use Confluent Kafka with Spark 2.4.7 (Spark Structured Streaming). There is a lot of network I/O from our PySpark application to the Schema Registry server on every micro-batch. According to https://github.com/AbsaOSS/ABRiS/issues/105, this was resolved in ABRiS 3.2.0, but we see the same behaviour in 3.2.1. Could you please advise whether you have seen this issue, or whether I am missing some configuration needed to cache the schema? Any suggestion or advice would be much appreciated.

Below is the snippet of code around the from_confluent_avro method that I am using:

    from pyspark import SparkContext
    from pyspark.sql.column import Column, _to_java_column

    # Call site: decode the Kafka 'value' column.
    get_df = df.select(ReadWriteFromAvro.from_avro('value', raw_topic).alias(df_alias))

    # Defined on the ReadWriteFromAvro helper used above.
    def from_avro(col, topic):
        jvm_gateway = SparkContext._active_spark_context._gateway.jvm
        abris_avro = jvm_gateway.za.co.absa.abris.avro
        # Resolve the Scala object SchemaStorageNamingStrategies and read TOPIC_NAME.
        naming_strategy = getattr(
            getattr(
                abris_avro.read.confluent.SchemaManager,
                'SchemaStorageNamingStrategies$',
            ),
            'MODULE$',
        ).TOPIC_NAME()

        schema_registry_config_dict = {
            'schema.registry.url': SCHEMA_REGISTRY_URL,
            'basic.auth.credentials.source': 'USER_INFO',
            'basic.auth.user.info':
                KAFKA_SCHEMA_REGISTRY_API_KEY + ':' + KAFKA_SCHEMA_REGISTRY_API_SECRET,
            'schema.registry.topic': topic,
            f'{col}.schema.id': 'latest',
            f'{col}.schema.naming.strategy': naming_strategy,
        }

        # Build an immutable Scala Map from the Python dict, one entry at a time.
        conf_map = getattr(
            getattr(
                jvm_gateway.scala.collection.immutable.Map,
                'EmptyMap$',
            ), 'MODULE$',
        )
        for k, v in schema_registry_config_dict.items():
            conf_map = getattr(conf_map, '$plus')(
                jvm_gateway.scala.Tuple2(k, v),
            )

        return Column(abris_avro.functions.from_confluent_avro(_to_java_column(col), conf_map))
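
To make the expected behaviour concrete, here is a minimal, library-independent sketch of what "caching the schema" means: the registry is contacted once per subject, and later micro-batches reuse the stored result. This is not ABRiS's implementation. `CachingSchemaClient`, `fake_registry_fetch`, and the subject name `raw-topic-value` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class CachingSchemaClient:
    """Memoizes schema lookups so the registry is contacted once per subject."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # hypothetical function doing the network call
        self._cache: Dict[str, str] = {}
        self.remote_calls = 0            # counts real lookups, for demonstration

    def latest_schema(self, subject: str) -> str:
        if subject not in self._cache:
            self.remote_calls += 1
            self._cache[subject] = self._fetch(subject)
        return self._cache[subject]


def fake_registry_fetch(subject: str) -> str:
    # Stand-in for an HTTP request to Schema Registry (assumption, no network used).
    return '{"type": "record", "name": "%s", "fields": []}' % subject.replace("-", "_")


client = CachingSchemaClient(fake_registry_fetch)
for _batch in range(5):                  # five simulated micro-batches
    schema = client.latest_schema("raw-topic-value")

print(client.remote_calls)               # a single remote lookup despite five batches
```

One trade-off of caching a "latest" lookup this way is that a newly registered schema version is not picked up until the cache is cleared or the application restarts.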

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
vphutane commented, Apr 26, 2021

Yes, that is right. Version 4.2.0 does the trick. Thank you for such wonderful work; we truly appreciate you guys.

0 reactions
cerveada commented, Apr 26, 2021

So, if I understand correctly, version 4.2.0 solves the problem for you?
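
For readers moving to 4.2.0: to the best of my understanding of the ABRiS 4.x documentation, the map-based `from_confluent_avro` call is replaced by an `AbrisConfig` builder that is passed to `from_avro`. The following is a sketch under that assumption, not a confirmed excerpt from this thread. The helper names `build_abris_config` and `from_avro`, and passing the py4j `jvm` view in as a parameter, are choices made here for illustration.

```python
def build_abris_config(jvm, registry_conf, topic, is_key=False):
    """Build an ABRiS 4.x reader config via the JVM gateway.

    jvm           -- py4j JVM view, e.g. SparkContext._active_spark_context._gateway.jvm
    registry_conf -- dict of Schema Registry client settings
                     ('schema.registry.url', 'basic.auth.user.info', ...)
    topic         -- Kafka topic used with the topic-name strategy
    is_key        -- True for the key schema, False for the value schema
    """
    # Spark's PythonUtils converts the Python dict to a Scala Map.
    scala_map = jvm.PythonUtils.toScalaMap(registry_conf)
    return (
        jvm.za.co.absa.abris.config.AbrisConfig
        .fromConfluentAvro()
        .downloadReaderSchemaByLatestVersion()
        .andTopicNameStrategy(topic, is_key)
        .usingSchemaRegistry(scala_map)
    )


def from_avro(jvm, java_col, abris_config):
    """Return the JVM Column produced by ABRiS; wrap it in pyspark's Column."""
    return jvm.za.co.absa.abris.avro.functions.from_avro(java_col, abris_config)
```

In a Spark job, `java_col` would come from `_to_java_column('value')` and the result would be wrapped with `Column(...)`, as in the original snippet. Building the config once outside the streaming query and reusing it keeps the setup cost out of the per-batch path.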
