Allow SchemaRegistryClient to be picklable

The problem

I’m using python-schema-registry-client to deserialise Avro messages received from a Kafka topic into an Apache Spark cluster. The Spark jobs are written with PySpark, which relies on objects being picklable for transport from the Spark driver to Spark executors.

However, SchemaRegistryClient does not survive a pickle round trip intact: the unpickled object is missing some attributes that belonged to the original client (see test output, below).

(Aside from this Spark use case, I can imagine there are other advantages to making SchemaRegistryClient picklable.)

The desired solution

SchemaRegistryClient is picklable. When unpickled, the deserialised client should behave the same as the original object, and retain all its data members.

Additional notes

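The cause is likely that SchemaRegistryClient subclasses requests.Session, which does not serialise well: it appears to pickle only a fixed list of its own attributes (its __attrs__), so attributes added by a subclass are dropped.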
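
A minimal sketch of the suspected mechanism, using only the standard library. FakeSession and FakeClient are hypothetical stand-ins, not the real requests.Session or SchemaRegistryClient; the sketch assumes the parent class builds its pickle state from a fixed __attrs__ list, which is what the test output further down suggests.

```python
import pickle


class FakeSession:
    """Stand-in for a session class that pickles only the names in __attrs__."""

    __attrs__ = ["headers", "verify"]

    def __init__(self):
        self.headers = {"Accept": "application/json"}
        self.verify = True

    def __getstate__(self):
        # Only attributes named in __attrs__ make it into the pickle.
        return {name: getattr(self, name, None) for name in self.__attrs__}

    def __setstate__(self, state):
        for name, value in state.items():
            setattr(self, name, value)


class FakeClient(FakeSession):
    """Hypothetical subclass that adds its own attributes in __init__."""

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.id_to_schema = {}


client = FakeClient("http://127.0.0.1:8081")
unpickled_client = pickle.loads(pickle.dumps(client))

# Attributes present on the original but lost in the round trip.
lost = sorted(set(vars(client)) - set(vars(unpickled_client)))
print(lost)  # ['id_to_schema', 'url']
```

The subclass attributes vanish because unpickling does not call __init__ again; it restores only what __getstate__ returned.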

I’ve added a demonstrative regression test in this branch. The assertion

assert set(dir(client)) == set(dir(unpickled_client))

results in

E       AssertionError: assert {'__attrs__',...__doc__', ...} == {'__attrs__', ...__doc__', ...}
E         Extra items in the left set:
E         'extra_headers'
E         'subject_to_schema_ids'
E         'subject_to_schema_versions'
E         'url_manager'
E         'url'
E         'id_to_schema'
E         Extra items in the right set:
E         'prefetch'
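
One way the mismatch above could be closed, sketched with stand-in classes rather than the real library code. This assumes the parent's pickle state is driven by an __attrs__ list; PicklableClient and FakeSession are hypothetical names, and this is not necessarily the fix the maintainers adopted.

```python
import pickle


class FakeSession:
    """Stand-in for a session class that pickles only the names in __attrs__."""

    __attrs__ = ["headers", "verify"]

    def __init__(self):
        self.headers = {"Accept": "application/json"}
        self.verify = True

    def __getstate__(self):
        return {name: getattr(self, name, None) for name in self.__attrs__}

    def __setstate__(self, state):
        for name, value in state.items():
            setattr(self, name, value)


class PicklableClient(FakeSession):
    # Extend the parent's list so the subclass attributes are pickled too.
    __attrs__ = FakeSession.__attrs__ + ["url", "id_to_schema"]

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.id_to_schema = {}


client = PicklableClient("http://127.0.0.1:8081")
client.id_to_schema[1] = "schema-placeholder"
unpickled_client = pickle.loads(pickle.dumps(client))

# The same check as the regression test, now against the stand-in.
assert set(dir(client)) == set(dir(unpickled_client))
assert unpickled_client.id_to_schema == {1: "schema-placeholder"}
```

The drawback of this approach is that every new attribute has to be added to the list by hand, so a test like the one above is worth keeping as a guard.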

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

mattjw commented, May 11, 2020 (1 reaction)

@marcosschroh I haven’t had a chance to verify this (and possibly won’t for a while), but wanted to say thanks for following this up! 🙇

mattjw commented, Jul 28, 2019 (1 reaction)

“Hi @mattjw

I think the second option is faster and easier to implement. But if you create an instance and the first request is made, meaning that a Session object was created, and then you want to pickle the SchemaRegistryClient, you will have the same problem, correct?”

Yes, that’s true. It means picklability would depend on the state of the object, so it is not a complete fix.

“I have released a new project, https://github.com/marcosschroh/async-python-schema-registry-client, which is the same as this one but async. The async client has something similar to what you described, except the session is created when init is called.”

Interesting! Thanks for open sourcing that. I will take a look!
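
The thread does not show how the "second option" would be implemented. Assuming it means holding a lazily created session instead of subclassing one, the state-dependence concern raised above could be avoided by dropping the live session when pickling. The sketch below uses hypothetical stand-ins (LazyClient, FakeSession) and only the standard library.

```python
import pickle
import threading


class FakeSession:
    """Stand-in for an HTTP session; the lock makes it unpicklable."""

    def __init__(self):
        self._lock = threading.Lock()


class LazyClient:
    """Hypothetical client that holds a session instead of subclassing one."""

    def __init__(self, url):
        self.url = url
        self._session = None  # created on first use

    @property
    def session(self):
        if self._session is None:
            self._session = FakeSession()
        return self._session

    def __getstate__(self):
        # Copy the instance dict and drop the live session, so pickling
        # works whether or not a request has already been made.
        state = self.__dict__.copy()
        state["_session"] = None
        return state


client = LazyClient("http://127.0.0.1:8081")
client.session  # simulate the first request creating the session

unpickled_client = pickle.loads(pickle.dumps(client))
print(unpickled_client.url)
```

The unpickled client starts without a session and creates a fresh one on first use, so connection state is not carried across, which is usually what is wanted when shipping an object to another process.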
