ENH: Support out-of-band pickling (protocol 5)
Is your feature request related to a problem?
It would be nice if Pandas objects supported pickle’s protocol 5 for out-of-band serialization. This would allow the underlying data to be captured in PickleBuffers (specialized memoryviews). For libraries using pickle’s protocol 5 to transmit data over the wire, this would allow for zero-copy data transmission.
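To make the idea concrete, here is a minimal sketch of out-of-band pickling using only the standard library. The bytearray named payload is an assumption standing in for the data behind a Pandas object; nothing here is Pandas code.

```python
import pickle

# Hypothetical stand-in for the large data block behind a DataFrame.
payload = bytearray(b"x" * 1_000_000)

# Out-of-band: with protocol 5, buffer_callback receives each PickleBuffer,
# so the pickle stream itself only holds a small placeholder.
buffers = []
meta = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

# In-band, for comparison: without buffer_callback the bytes are copied
# into the pickle stream.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# The consumer hands the same buffers back when loading.
restored = pickle.loads(meta, buffers=buffers)

print(len(meta), len(in_band), len(buffers))
```

A transport layer could then send meta and each entry of buffers separately, without first copying the data into one contiguous byte string.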
Describe the solution you’d like
Pandas objects implement __reduce_ex__ and, if the protocol argument is 5 or greater, they construct PickleBuffers out of any data arguments.
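A rough sketch of what this could look like follows. The Column class and its _rebuild helper are hypothetical names used only for illustration; they are not Pandas internals, and a real implementation would need to handle blocks, dtypes and indexes.

```python
import pickle


class Column:
    """Hypothetical container; a stand-in, not actual Pandas code."""

    def __init__(self, name, values):
        self.name = name
        self.values = values  # any buffer-supporting object, e.g. a bytearray

    @classmethod
    def _rebuild(cls, name, values):
        # `values` is whatever the consumer supplied via `buffers=`
        # (out-of-band), or a bytes/bytearray copy (in-band).
        return cls(name, values)

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Wrap the data so pickle may hand it to buffer_callback.
            return self._rebuild, (self.name, pickle.PickleBuffer(self.values))
        # PickleBuffer cannot be pickled with protocols <= 4, so copy instead.
        return self._rebuild, (self.name, bytes(self.values))


col = Column("a", bytearray(b"\x01\x02\x03\x04"))

# Protocol 5 with a callback: the data travels out-of-band.
bufs = []
data = pickle.dumps(col, protocol=5, buffer_callback=bufs.append)
out = pickle.loads(data, buffers=bufs)

# Older protocol: existing in-band behavior is preserved.
legacy = pickle.loads(pickle.dumps(col, protocol=4))
```

Because the protocol check lives inside __reduce_ex__, callers that never request protocol 5 keep getting an ordinary in-band pickle.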
API breaking implications
NA, as it should be possible to fall back to the existing behavior for older pickle protocols. Users have to actively opt in at a higher-level API (through pickle) to see any effect.
Describe alternatives you’ve considered
NA
Additional context
This would be useful in libraries that support distributed dataframes 😉
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Thanks, looks interesting.
At a glance, it looks like we’re successfully using pickle5 protocol when pickling underlying ndarrays.
So the primary work to do here are
One other observation: if a column is represented by many small NumPy arrays, the same will be true of the pickled form. During unpickling, would Pandas keep the small NumPy arrays or consolidate them into a single one?