slow memory retrieval (significantly slower than simple pickle)
See original GitHub issue

Hi,
I’m a little confused about why reading and writing to a (file-based) “memory” takes such an enormous amount of time compared to bare pickling/unpickling.
In my case, func() is a tiny memoized function that takes a short string argument and returns a (short) dict with (long) lists of ~complex objects. For some reason, retrieving the function result from the cache takes significantly more time than just unpickling the file. The resulting file is approximately 70 MB.
I observe the same thing for any other function.
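For reference, a minimal sketch of the kind of setup being described (the cache directory, the Item class, and the list size are illustrative placeholders, not the actual code from this issue):

import joblib

memory = joblib.Memory("joblib_cache", verbose=0)  # hypothetical cache location

class Item:
    # stand-in for the "~complex objects" mentioned above
    def __init__(self, i):
        self.i = i
        self.text = "item-%d" % i

@memory.cache
def func(some_str):
    # a short dict whose values are long lists of objects,
    # roughly the shape of data described above
    return {some_str: [Item(i) for i in range(1_000_000)]}

func("a")  # first call computes the result and writes the cache file
func("a")  # second call replays it from the Memory cache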
%prun func(some_str)
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1 12.436 12.436 52.011 52.011 pickle.py:1014(load)
41531482 7.665 0.000 11.931 0.000 pickle.py:226(read)
1922386 5.547 0.000 7.339 0.000 pickle.py:1504(load_build)
41531483 4.266 0.000 4.266 0.000 {method 'read' of '_io.BufferedReader' objects}
6490284 3.753 0.000 6.666 0.000 pickle.py:1439(load_long_binput)
2645763 2.666 0.000 4.764 0.000 pickle.py:1192(load_binunicode)
30070039 2.403 0.000 2.403 0.000 {built-in method builtins.isinstance}
4140172 1.870 0.000 3.225 0.000 pickle.py:1415(load_binget)
1922386 1.369 0.000 2.049 0.000 pickle.py:1316(load_newobj)
9196954 1.359 0.000 1.359 0.000 {built-in method _struct.unpack}
1922386 1.114 0.000 8.724 0.000 numpy_pickle.py:319(load_build)
10857316 0.962 0.000 0.962 0.000 {method 'pop' of 'list' objects}
14536246 0.873 0.000 0.873 0.000 {method 'append' of 'list' objects}
1922386 0.873 0.000 1.218 0.000 pickle.py:1472(load_setitem)
1922393 0.816 0.000 0.816 0.000 {built-in method builtins.getattr}
676815 0.765 0.000 1.384 0.000 pickle.py:1458(load_appends)
1922387 0.730 0.000 0.832 0.000 pickle.py:1257(load_empty_dictionary)
1 0.715 0.715 53.099 53.099 <string>:1(<module>)
1245385 0.559 0.000 0.848 0.000 pickle.py:1451(load_append)
...
%prun len(pickle.load(open("..file..", 'rb')))
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1 4.587 4.587 4.587 4.587 {built-in method _pickle.load}
1 0.553 0.553 5.140 5.140 <string>:1(<module>)
1 0.000 0.000 5.140 5.140 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method io.open}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
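Reading the two profiles side by side: the Memory retrieval spends its ~52 s in pure-Python frames (pickle.py:1014(load), with tens of millions of read calls, plus joblib's numpy_pickle.py:319(load_build)), whereas bare pickle dispatches to the C-accelerated {built-in method _pickle.load} and finishes in ~5 s. A quick way to reproduce the comparison on the same cache file (the path below is a placeholder for the file Memory actually wrote):

import pickle
import time

import joblib

cache_file = "joblib_cache/.../output.pkl"  # placeholder: point at the real cached file

t0 = time.perf_counter()
with open(cache_file, "rb") as f:
    pickle.load(f)               # C-accelerated _pickle.load
print("pickle.load:", time.perf_counter() - t0)

t0 = time.perf_counter()
joblib.load(cache_file)          # joblib's numpy-aware unpickler
print("joblib.load:", time.perf_counter() - t0)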
Issue Analytics
- Created 7 years ago
- Comments: 36 (29 by maintainers)

I think I got the following patch to memory.py to work:
Of course, it's a huge hack that just bypasses everything. I wonder if it breaks anything.
Actually, thinking about it, maybe the cleanest thing to do is to add a use_joblib_pickling argument (for lack of a better name) to Memory, which should be True by default.
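For illustration, a minimal sketch of the kind of bypass being discussed, done outside joblib rather than as a patch to memory.py (everything here, including the decorator name, is hypothetical, not the actual patch from this thread): memoize to disk with stdlib pickle so that loads go through the C-accelerated _pickle.load.

import functools
import hashlib
import os
import pickle

def plain_pickle_cache(cache_dir):
    # Hypothetical stand-in for what a use_joblib_pickling=False mode might do:
    # cache results with plain pickle instead of joblib's numpy-aware pickler.
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = hashlib.sha1(
                pickle.dumps((func.__name__, args, kwargs))
            ).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)  # fast C _pickle.load
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
            return result

        return wrapper

    return decorator

The trade-off is the one hinted at above: plain pickle is faster for object-heavy payloads like this one, but it gives up joblib's special handling of large numpy arrays (e.g. memory-mapped loading via mmap_mode).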