
Pickle is significantly slower than a memory copy

See original GitHub issue

My machine copies memory at 5GB/s

In [1]: b = b'0' * 1000000000

In [2]: %time len(b[1:])
CPU times: user 139 ms, sys: 63.3 ms, total: 202 ms
Wall time: 202 ms
Out[2]: 999999999
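(As a side note on the benchmark itself: `b[1:]` is a reasonable proxy for a memory copy because slicing a `bytes` object materializes a new one, whereas slicing a `memoryview` does not copy at all. A minimal sketch contrasting the two, scaled down to 100 MB:)

```python
import time

b = b'0' * 100_000_000  # 100 MB of bytes

t0 = time.perf_counter()
_ = b[1:]  # bytes slicing allocates and copies ~100 MB
copy_t = time.perf_counter() - t0

t0 = time.perf_counter()
_ = memoryview(b)[1:]  # memoryview slicing is a view: no copy
view_t = time.perf_counter() - t0

print(f"copy: {copy_t:.4f}s  view: {view_t:.6f}s")
```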

But NumPy arrays only serialize at 2.5 GB/s

In [4]: import numpy as np

In [5]: x = np.random.randint(0, 255, dtype='u1', size=1000000000)  # 1GB

In [6]: import pickle

In [7]: %time len(pickle.dumps(x, protocol=-1))
CPU times: user 309 ms, sys: 96.2 ms, total: 405 ms
Wall time: 404 ms
Out[7]: 1000000161

Why the extra time?

Versions

Python 3.4, Linux, NumPy 1.11.0

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 3
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

mattip commented on May 7, 2019 (6 reactions)

Support for protocol 5 has been merged. Closing. If the copy benchmark analysis leads to another issue, please open a new one.
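(Pickle protocol 5 addresses exactly the extra-copy problem from the original question: it lets `__reduce_ex__` hand the array's buffer to the caller out-of-band instead of copying it into the pickle stream. A minimal sketch, assuming Python 3.8+ and a NumPy version with protocol-5 support; sizes and variable names are illustrative:)

```python
import pickle
import numpy as np

x = np.random.randint(0, 255, dtype='u1', size=10_000_000)  # ~10 MB

# In-band: the array's bytes are copied into the pickle stream.
in_band = pickle.dumps(x, protocol=5)

# Out-of-band: pickle collects PickleBuffer objects via the callback
# instead of copying the data; the returned "header" is tiny.
buffers = []
header = pickle.dumps(x, protocol=5, buffer_callback=buffers.append)

# Reconstruction consumes the same buffers.
y = pickle.loads(header, buffers=buffers)
assert (y == x).all()
print(len(in_band), len(header), len(buffers))
```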

shoyer commented on Apr 14, 2016 (3 reactions)

The answer is that ndarray.__reduce__ uses tostring() internally (making a copy), and then pickle.dumps makes an additional copy of any data it receives from __reduce__ (writing it into an io.BytesIO, of course).

Compare:

In [22]: %time _ = x.copy()
CPU times: user 286 ms, sys: 324 ms, total: 609 ms
Wall time: 609 ms

In [23]: %time _ = x.__reduce__()
CPU times: user 296 ms, sys: 320 ms, total: 616 ms
Wall time: 615 ms

In [24]: %time _ = pickle.dumps(x, protocol=-1)
CPU times: user 606 ms, sys: 682 ms, total: 1.29 s
Wall time: 1.29 s

It might be possible to do the pickling without an additional copy, but as far as I can tell based on the current design of pickle, that would require converting numpy arrays into bytes or another builtin Python type supported by pickle without a copy (you can’t pickle memory views). Unfortunately, as @teoliphant explains, converting numpy arrays into strings without a copy isn’t possible.

So I guess you could either try to get first class support for memoryview objects into pickle (maybe not a bad idea) or roll your own serialization format.
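(On the "roll your own serialization format" option: NumPy's own `.npy` format already does this, streaming the array's buffer to the file without going through pickle, and `mmap_mode` makes reading it back nearly copy-free. A minimal sketch; the temp-file setup is illustrative:)

```python
import os
import tempfile
import numpy as np

x = np.random.randint(0, 255, dtype='u1', size=10_000_000)  # ~10 MB

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'x.npy')

    # np.save writes a small header plus the raw buffer: no pickle
    # layer, so no extra in-memory copy of the data.
    np.save(path, x)

    # mmap_mode maps the file into memory instead of reading it, so
    # "loading" costs almost nothing until pages are actually touched.
    y = np.load(path, mmap_mode='r')
    assert (y == x).all()
```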
