Allow SchemaRegistryClient to be picklable
The problem
I’m using python-schema-registry-client to deserialise Avro messages received from a Kafka topic into an Apache Spark cluster. The Spark jobs are written with PySpark, which relies on objects being picklable for transport from the Spark driver to Spark executors.
However, when a SchemaRegistryClient is pickled and then unpickled, the resulting object is missing some attributes that belonged to the original client (see test output, below).
(Aside from this Spark use case, I can imagine there are other advantages to making SchemaRegistryClient picklable.)
The desired solution
SchemaRegistryClient is picklable. The unpickled client should behave the same as the original object and retain all its data members.
Additional notes
The cause is likely that SchemaRegistryClient is using (well, subclassing) requests.Session, which does not serialise well?
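A likely mechanism, sketched here with standard-library code only: as far as I know, requests.Session defines `__getstate__` and `__setstate__` so that only the attributes named in its `__attrs__` list are pickled. The classes `WhitelistSession` and `Client` below are hypothetical stand-ins, not the real requests or python-schema-registry-client code. They show how a subclass loses the attributes it adds, and how a whitelisted name that was never set comes back as a new attribute.

```python
import pickle


class WhitelistSession:
    """Hypothetical stand-in for a base class that pickles only whitelisted attributes."""

    # 'prefetch' is whitelisted but never assigned in __init__ (an assumption for the demo).
    __attrs__ = ["headers", "prefetch"]

    def __init__(self):
        self.headers = {}

    def __getstate__(self):
        # Only whitelisted names are saved; names missing on the instance are saved as None.
        return {attr: getattr(self, attr, None) for attr in self.__attrs__}

    def __setstate__(self, state):
        for attr, value in state.items():
            setattr(self, attr, value)


class Client(WhitelistSession):
    """Hypothetical stand-in for a client that subclasses the session."""

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.id_to_schema = {}


client = Client("http://127.0.0.1:8081")
unpickled_client = pickle.loads(pickle.dumps(client))

# Attributes added by the subclass are not in the whitelist, so they are dropped.
lost = set(dir(client)) - set(dir(unpickled_client))
# Whitelisted names that were never set on the original appear after unpickling.
gained = set(dir(unpickled_client)) - set(dir(client))

print(sorted(lost))    # ['id_to_schema', 'url']
print(sorted(gained))  # ['prefetch']
```

If the real classes behave like these stand-ins, that would account for both the "extra items in the left set" and the lone `prefetch` in the right set.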
I’ve added a demonstrative regression test in this branch. The assertion
assert set(dir(client)) == set(dir(unpickled_client))
results in
E AssertionError: assert {'__attrs__',...__doc__', ...} == {'__attrs__', ...__doc__', ...}
E Extra items in the left set:
E 'extra_headers'
E 'subject_to_schema_ids'
E 'subject_to_schema_versions'
E 'url_manager'
E 'url'
E 'id_to_schema'
E Extra items in the right set:
E 'prefetch'
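One possible direction for a fix, again as a sketch with hypothetical stand-in classes (`WhitelistSession`, `PicklableClient`) rather than the library's actual code: if the base class pickles only the names listed in `__attrs__`, the subclass can extend that list with its own attributes. This assumes the base class looks like the stand-in below, and it only works when every added attribute is itself picklable.

```python
import pickle


class WhitelistSession:
    """Hypothetical stand-in for a base class that pickles only whitelisted attributes."""

    __attrs__ = ["headers"]

    def __init__(self):
        self.headers = {}

    def __getstate__(self):
        # Save only the names listed in __attrs__ (looked up on the instance's class).
        return {attr: getattr(self, attr, None) for attr in self.__attrs__}

    def __setstate__(self, state):
        for attr, value in state.items():
            setattr(self, attr, value)


class PicklableClient(WhitelistSession):
    """Hypothetical client that adds its own attributes to the whitelist."""

    __attrs__ = WhitelistSession.__attrs__ + ["url", "id_to_schema"]

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.id_to_schema = {}


client = PicklableClient("http://127.0.0.1:8081")
client.id_to_schema[1] = "schema-placeholder"  # placeholder value for the demo
unpickled_client = pickle.loads(pickle.dumps(client))

# The same kind of check as the regression test, plus a comparison of the values.
assert set(dir(client)) == set(dir(unpickled_client))
assert unpickled_client.url == client.url
assert unpickled_client.id_to_schema == client.id_to_schema
```

Overriding `__getstate__` and `__setstate__` in the subclass would be an alternative; extending the whitelist is simply the smaller change when the base class already uses one.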
Top GitHub Comments
@marcosschroh I haven’t had a chance to verify this (and possibly won’t for a while), but wanted to say thanks for following this up! 🙇
Yes that’s true. It means picklability would depend on the state of the object. So not a complete fix.
Interesting! Thanks for open sourcing that. I will take a look!