Allow SchemaRegistryClient to be picklable
The problem
I’m using python-schema-registry-client to deserialise Avro messages received from a Kafka topic into an Apache Spark cluster. The Spark jobs are written with PySpark, which relies on objects being picklable for transport from the Spark driver to Spark executors.
However, when a SchemaRegistryClient is pickled and then unpickled, the deserialised object is missing some attributes that belonged to the original client (see the test output below).
(Aside from this Spark use case, I can imagine there are other advantages to making SchemaRegistryClient picklable.)
The desired solution
SchemaRegistryClient is picklable. When unpickled, the deserialised client should behave the same as the original object, and retain all its data members.
Additional notes
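The likely cause is that SchemaRegistryClient subclasses requests.Session. requests.Session defines its own __getstate__, which pickles only the attributes named in its __attrs__ list, so attributes added by a subclass are dropped on the way through pickle.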
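To illustrate the suspected cause without needing a registry or even requests installed, here is a minimal sketch. `SessionLike` is a hypothetical stand-in that mimics how requests.Session restricts its pickled state to `__attrs__`. `LossyClient` and `PicklableClient` are likewise invented names for illustration, not classes from python-schema-registry-client, and the fix shown is one possible approach rather than what the library does.

```python
import pickle


class SessionLike:
    """Hypothetical stand-in for requests.Session: only names listed in
    __attrs__ end up in the pickled state."""

    __attrs__ = ["headers", "verify"]

    def __init__(self):
        self.headers = {}
        self.verify = True

    def __getstate__(self):
        return {attr: getattr(self, attr, None) for attr in self.__attrs__}

    def __setstate__(self, state):
        for attr, value in state.items():
            setattr(self, attr, value)


class LossyClient(SessionLike):
    """Adds attributes but inherits the base pickling hooks unchanged."""

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.id_to_schema = {}


class PicklableClient(LossyClient):
    """Same client, but its own attributes are added to the pickled state."""

    def __getstate__(self):
        state = super().__getstate__()
        # Include every instance attribute, not only those in __attrs__.
        state.update(self.__dict__)
        return state


lossy = pickle.loads(pickle.dumps(LossyClient("http://127.0.0.1:8081")))
fixed = pickle.loads(pickle.dumps(PicklableClient("http://127.0.0.1:8081")))

print(hasattr(lossy, "url"))  # the subclass attribute did not survive
print(fixed.url)              # the subclass attribute was restored
```

An approach like this only works as long as every attribute value is itself picklable, so it does not guarantee picklability for every possible client state.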
I’ve added a demonstrative regression test in this branch. The assertion
assert set(dir(client)) == set(dir(unpickled_client))
results in
E AssertionError: assert {'__attrs__',...__doc__', ...} == {'__attrs__', ...__doc__', ...}
E Extra items in the left set:
E 'extra_headers'
E 'subject_to_schema_ids'
E 'subject_to_schema_versions'
E 'url_manager'
E 'url'
E 'id_to_schema'
E Extra items in the right set:
E 'prefetch'
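For anyone who wants to run the same kind of check on another object, the comparison above can be wrapped in a small helper. This is a sketch only: `roundtrip_attr_diff` and `Example` are hypothetical names, not part of the regression test in the branch, and `Example` merely imitates a class that drops an attribute when pickled.

```python
import pickle


def roundtrip_attr_diff(obj):
    """Pickle and unpickle obj, then return (lost, gained) attribute names."""
    clone = pickle.loads(pickle.dumps(obj))
    before, after = set(dir(obj)), set(dir(clone))
    return before - after, after - before


class Example:
    """Hypothetical class whose pickled state omits one attribute."""

    def __init__(self):
        self.kept = 1
        self.dropped = 2

    def __getstate__(self):
        # Only 'kept' is written to the pickle, so 'dropped' is lost.
        return {"kept": self.kept}


lost, gained = roundtrip_attr_diff(Example())
print(sorted(lost), sorted(gained))
```

The two returned sets correspond to the "Extra items in the left set" and "Extra items in the right set" sections of the pytest output.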
Issue Analytics
- Created 4 years ago
- Comments:9 (9 by maintainers)

@marcosschroh I haven’t had a chance to verify this (and possibly won’t for a while), but wanted to say thanks for following this up! 🙇
Yes that’s true. It means picklability would depend on the state of the object. So not a complete fix.
Interesting! Thanks for open sourcing that. I will take a look!