Pickle is significantly slower than a memory copy
My machine copies memory at 5 GB/s:
In [1]: b = b'0' * 1000000000
In [2]: %time len(b[1:])
CPU times: user 139 ms, sys: 63.3 ms, total: 202 ms
Wall time: 202 ms
Out[2]: 999999999
But NumPy arrays only serialize at 2.5 GB/s:
In [4]: import numpy as np
In [5]: x = np.random.randint(0, 255, dtype='u1', size=1000000000) # 1GB
In [6]: import pickle
In [7]: %time len(pickle.dumps(x, protocol=-1))
CPU times: user 309 ms, sys: 96.2 ms, total: 405 ms
Wall time: 404 ms
Out[7]: 1000000161
Why the extra time?
Versions
Python 3.4, Linux, NumPy 1.11.0
Issue Analytics
- State: closed
- Created 7 years ago
- Reactions: 3
- Comments: 9 (9 by maintainers)
Top GitHub Comments
Support for protocol 5 has been merged. Closing. If the copy benchmark analysis leads to another issue, please open a new one.
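The protocol 5 support mentioned here refers to PEP 574's out-of-band buffers, which let NumPy hand pickle a view of the array's memory instead of a serialized copy. A minimal sketch, assuming Python 3.8+ and a NumPy recent enough to implement protocol 5 pickling (the array name and sizes are illustrative, not from the issue):

```python
import pickle
import numpy as np

x = np.random.randint(0, 255, dtype='u1', size=1_000_000)

# With protocol 5, a buffer_callback receives the raw array buffer
# out-of-band, so the pickle stream itself holds only metadata.
buffers = []
payload = pickle.dumps(x, protocol=5, buffer_callback=buffers.append)

print(len(payload))                         # small: metadata only
print(sum(len(b.raw()) for b in buffers))   # ~1_000_000 bytes of array data

# Round-trip: the same buffers must be passed back to loads.
y = pickle.loads(payload, buffers=buffers)
assert np.array_equal(x, y)
```

Because the consumer receives `PickleBuffer` objects (zero-copy views), a transport layer can send them however it likes, which is what avoids the extra copy discussed below.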
The answer is that `ndarray.__reduce__` uses `tostring()` internally (making a copy), and then `pickle.dumps` makes an additional copy of any data it receives from `__reduce__` (writing it into an `io.BytesIO`, of course).

It might be possible to do the pickling without an additional copy, but as far as I can tell, based on the current design of pickle, that would require converting numpy arrays into `bytes` or another builtin Python type supported by pickle without a copy (you can't pickle memory views). Unfortunately, as @teoliphant explains, converting numpy arrays into strings without a copy isn't possible.

So I guess you could either try to get first-class support for memoryview objects into pickle (maybe not a bad idea) or roll your own serialization format.