Caching Schema from Schema Registry using ABRiS 3.2.1
See original GitHub issue

I am using the ABRiS 3.2.1 library with our PySpark application code to extract schemas and register schemas in Schema Registry for Avro messages. We use Confluent Kafka with Spark 2.4.7 and Spark Structured Streaming. There is a lot of network I/O from our PySpark application to the Schema Registry server on every microbatch. Reading through the post https://github.com/AbsaOSS/ABRiS/issues/105, the issue is said to have been resolved in ABRiS 3.2.0; however, we see the same behaviour in the current 3.2.1 version. Can you please advise whether you have seen this issue, or whether I am missing a piece of configuration in my code to cache the schema? Your suggestions/advice would be highly appreciated.
Below is a snippet of the code using the from_confluent_avro method:
```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column

get_df = df.select(ReadWriteFromAvro.from_avro('value', raw_topic).alias(df_alias))

def from_avro(col, topic):
    jvm_gateway = SparkContext._active_spark_context._gateway.jvm
    abris_avro = jvm_gateway.za.co.absa.abris.avro
    # Fetch the TOPIC_NAME naming strategy from the Scala companion object.
    naming_strategy = getattr(
        getattr(
            abris_avro.read.confluent.SchemaManager,
            'SchemaStorageNamingStrategies$',
        ),
        'MODULE$',
    ).TOPIC_NAME()

    schema_registry_config_dict = {
        'schema.registry.url': SCHEMA_REGISTRY_URL,
        'basic.auth.credentials.source': 'USER_INFO',
        'basic.auth.user.info':
            KAFKA_SCHEMA_REGISTRY_API_KEY + ':' + KAFKA_SCHEMA_REGISTRY_API_SECRET,
        'schema.registry.topic': topic,
        f'{col}.schema.id': 'latest',
        f'{col}.schema.naming.strategy': naming_strategy,
    }

    # Fold the Python dict into an immutable scala.collection.immutable.Map.
    conf_map = getattr(
        getattr(
            jvm_gateway.scala.collection.immutable.Map,
            'EmptyMap$',
        ),
        'MODULE$',
    )
    for k, v in schema_registry_config_dict.items():
        conf_map = getattr(conf_map, '$plus')(
            jvm_gateway.scala.Tuple2(k, v),
        )

    return Column(abris_avro.functions.from_confluent_avro(_to_java_column(col), conf_map))
```
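For context, here is a minimal sketch of how a helper like this is typically wired into a Structured Streaming read; the bootstrap-server setting and variable names are assumptions, not part of the original report. Because the deserialization expression is evaluated for every microbatch, this is where the repeated Schema Registry round-trips show up.

```python
# Sketch under assumed names: KAFKA_BOOTSTRAP_SERVERS and raw_topic are placeholders.
raw_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS)
    .option("subscribe", raw_topic)
    .load()
)

# The Avro decoding is applied to the Kafka 'value' column of each microbatch.
decoded_df = raw_df.select(ReadWriteFromAvro.from_avro('value', raw_topic).alias('payload'))
```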
Issue Analytics
- Created: 2 years ago
- Comments: 5
Top GitHub Comments
Yes… that is right. Version 4.2.0 does the trick. Thank you for such wonderful work… truly appreciate you guys.
So if I understand it right, version 4.2.0 solves the problem for you?
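For anyone landing here with the same symptom: per the exchange above, upgrading to ABRiS 4.2.0 resolved the repeated registry lookups. Below is a sketch of the equivalent PySpark wiring against the 4.x AbrisConfig API, adapted from the patterns shown in the ABRiS Python documentation; the helper names and the reuse of the registry settings from the snippet above are illustrative, so verify the exact method names against the version you deploy.

```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column

def from_avro(col, config):
    # Deserialize a Confluent Avro column using a prebuilt AbrisConfig.
    jvm_gateway = SparkContext._active_spark_context._gateway.jvm
    abris_avro = jvm_gateway.za.co.absa.abris.avro
    return Column(abris_avro.functions.from_avro(_to_java_column(col), config))

def from_avro_abris_config(config_map, topic, is_key=False):
    # Build the config once: download the latest reader schema and use the
    # topic-name subject naming strategy.
    jvm_gateway = SparkContext._active_spark_context._gateway.jvm
    scala_map = jvm_gateway.PythonUtils.toScalaMap(config_map)
    return (
        jvm_gateway.za.co.absa.abris.config.AbrisConfig
        .fromConfluentAvro()
        .downloadReaderSchemaByLatestVersion()
        .andTopicNameStrategy(topic, is_key)
        .usingSchemaRegistry(scala_map)
    )

# Registry settings mirror the 3.x snippet above (values are placeholders).
abris_config = from_avro_abris_config(
    {
        'schema.registry.url': SCHEMA_REGISTRY_URL,
        'basic.auth.credentials.source': 'USER_INFO',
        'basic.auth.user.info':
            KAFKA_SCHEMA_REGISTRY_API_KEY + ':' + KAFKA_SCHEMA_REGISTRY_API_SECRET,
    },
    raw_topic,
)
decoded_df = df.select(from_avro('value', abris_config).alias(df_alias))
```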