ENH: Support out-of-band pickling (protocol 5)

Is your feature request related to a problem?

It would be nice if Pandas objects supported pickle’s protocol 5 for out-of-band serialization. This would allow the underlying data to be captured in PickleBuffers (a specialized memoryview). For libraries that use pickle’s protocol 5 to transmit data over the wire, this would allow zero-copy data transmission.
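As a rough illustration of what "out-of-band" means here, the sketch below uses only the standard library (Python 3.8+). The `payload` bytearray is a hypothetical stand-in for a column's underlying data, not anything pandas produces.

```python
import pickle

# A large writable buffer standing in for a column's underlying data.
payload = bytearray(b"\x00" * 1_000_000)

# In-band: the bytes are copied into the pickle stream itself.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# Out-of-band: the stream holds only a placeholder; the buffer is handed
# to buffer_callback so a transport layer can ship it separately.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

# The receiver must pass the same buffers, in the same order, to loads().
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band), len(out_of_band), len(buffers))
```

The out-of-band stream stays tiny regardless of the payload size, which is what makes zero-copy transmission possible for a library that sends the buffers itself.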

Describe the solution you’d like

Pandas objects would implement __reduce_ex__; when the protocol argument is 5 or greater, they would construct PickleBuffers out of any data arguments.
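A minimal sketch of that pattern, modeled on the example in the `pickle` documentation, might look like the following. `ZeroCopyColumn` and `_reconstruct` are hypothetical names used for illustration only; this is not how pandas is implemented.

```python
import pickle


class ZeroCopyColumn:
    """Hypothetical stand-in for a pandas object; not pandas code."""

    def __init__(self, data):
        self.data = data  # any object exposing a contiguous buffer

    @classmethod
    def _reconstruct(cls, buf):
        # Wrap whatever buffer-like object the unpickler hands back.
        return cls(buf)

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Protocol 5+: expose the data as a PickleBuffer so that it
            # can travel out-of-band when a buffer_callback is supplied.
            return type(self)._reconstruct, (pickle.PickleBuffer(self.data),)
        # PickleBuffer is not allowed with protocols <= 4, so fall back
        # to copying the data in-band as bytes.
        return type(self)._reconstruct, (bytes(self.data),)


col = ZeroCopyColumn(bytearray(b"abc"))
buffers = []
stream = pickle.dumps(col, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(stream, buffers=buffers)
print(bytes(restored.data), len(buffers))
```

Without a `buffer_callback`, the same `__reduce_ex__` still works: pickle serializes the PickleBuffer in-band, so callers who do not opt in see no change in behavior.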

API breaking implications

NA, as it should be possible to fall back to the existing behavior for older pickle protocols. Users have to actively opt in at a higher-level API (through pickle) to see any effect.

Describe alternatives you’ve considered

NA

Additional context

This would be useful in libraries that support distributed dataframes 😉

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (6 by maintainers)

TomAugspurger commented, May 19, 2020

Thanks, looks interesting.

At a glance, it looks like we’re successfully using pickle protocol 5 when pickling the underlying ndarrays.

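import pickle
import pickletools

import numpy as np
import pandas as pd

a = np.arange(4)
b = pd.Series(a)

# Disassemble the protocol-5 pickle of the bare ndarray.
pickletools.dis(pickletools.optimize(pickle.dumps(a, protocol=5)))

# Disassemble the pickle of the Series wrapping it, for comparison.
pickletools.dis(pickletools.optimize(pickle.dumps(b, protocol=5)))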

So the primary work to do here is:

  1. Ensure that that’s actually correct, including for DataFrame?
  2. Check Series / DataFrame for large objects that could also support out-of-band pickling?
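One way to approach both checks without reading disassembly is to pass a `buffer_callback` and count what it collects. The sketch below is only a starting point: `count_out_of_band_buffers` is a hypothetical helper, and the number of buffers reported will depend on the pandas and NumPy versions and on how the data is laid out internally.

```python
import pickle

import numpy as np
import pandas as pd


def count_out_of_band_buffers(obj):
    """Pickle obj with protocol 5; return (round-tripped object, buffer count)."""
    buffers = []
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return pickle.loads(payload, buffers=buffers), len(buffers)


series = pd.Series(np.arange(4))
frame = pd.DataFrame({"a": np.arange(4), "b": np.linspace(0.0, 1.0, 4)})

for name, obj in [("Series", series), ("DataFrame", frame)]:
    restored, n_buffers = count_out_of_band_buffers(obj)
    # equals() confirms the round trip; n_buffers shows how many
    # buffers were handed to the callback instead of the stream.
    print(name, "round-trips:", restored.equals(obj), "| buffers:", n_buffers)
```

A count of zero for an object that holds large arrays would point at a place where out-of-band support is still missing.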
jakirkham commented, May 28, 2020

One other observation: if a column is represented by many small NumPy arrays, the same will be true of its pickled form. During unpickling, would Pandas keep the small NumPy arrays, or would it consolidate them into a single one?
