`numpy.load` slows `runcell`
See original GitHub issue.
Despite the last command printing 0.0, runcell(3, ...) took several seconds. Strangely, an array with shape = (2, ...) and dtype='float64', which is the same size in memory, doesn't yield this effect; I haven't tested with 'float32'. Also, IPython commands aren't slowed.
Using Spyder 4.2.1, as conda doesn't have 4.2.2 yet; Windows 10 x64, Python 3.7.9, numpy 1.19.2.
Code:

import numpy as np
X = np.random.randn(16, 16, 240, 24000).astype('float16')
np.save('arr.npy', X)

Restart the kernel, then run:

import numpy as np
from time import time
out = np.load('arr.npy')

#%%
t0 = time()

#%%
t1 = time()

#%%
print(t1 - t0)
Issue Analytics
- Created: 3 years ago
- Comments: 21 (19 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Maybe we should add a message saying: “Computing min/max for the current variables took more than 2 seconds. Do you want to disable automatic variable explorer refreshing?”
I can confirm that this is due to min/max in the Variable Explorer. X takes 30 seconds to compute on my box and X.max() takes about 13 seconds. It seems that the Variable Explorer is recomputing the min/max of every variable in the explorer when it gets the call to refresh after runcell or runfile. If the Variable Explorer is busy when runcell or runfile is called, they get stuck waiting for it to finish.

@OverLordGoldDragon @sawtw thanks for helping get to the bottom of this! Here is a workaround: open the Variable Explorer, right-click on the table, and uncheck "show arrays min/max". Wait a while for the explorer to refresh, and then it should be speedy again.
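The min/max diagnosis also fits the reporter's observation that an equally sized float64 array doesn't trigger the slowdown: NumPy has no native float16 arithmetic, so reductions such as min/max over a float16 array are typically much slower than over a float64 array occupying the same number of bytes (which also holds four times fewer elements). The following timing sketch is not part of the original report; the shapes are scaled down from the issue's array so it runs quickly, but are chosen so both arrays use the same amount of memory:

import numpy as np
from time import time

# Same memory footprint: float16 uses 2 bytes per element, float64 uses 8,
# so the float64 array gets a quarter as many elements.
x16 = np.random.randn(4, 16, 240, 24000).astype('float16')  # ~0.74 GB
x64 = np.random.randn(1, 16, 240, 24000).astype('float64')  # ~0.74 GB

for name, arr in [('float16', x16), ('float64', x64)]:
    t0 = time()
    arr.min(); arr.max()
    print(f"{name}: {arr.nbytes / 1e9:.2f} GB, min/max took {time() - t0:.2f} s")

The float16 reduction typically comes out slower by an order of magnitude or more, consistent with the observation that the same-size float64 array does not produce the effect.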
Debugging

Issue 1: Variable Explorer updates every variable regardless of change

With one array in the Variable Explorer and min/max enabled, it takes about 26 seconds to refresh the Variable Explorer, regardless of whether the array has changed. Just running
t0 = time()
either directly in IPython or through runcell takes ~26 s for the Variable Explorer to update. I don't know if there is much to be done here. Perhaps we check the array size and don't give min/max past a certain size.
Issue 2: runcell and runfile seem to wait for comms from the Variable Explorer

The issue for this bug is that runcell gets stuck in comms if the Variable Explorer is already updating. If you use runcell and the Variable Explorer has finished updating, runcell works just fine. It only gets stuck when you run one cell right after another: the first cell you run returns normally, while the next cell waits until the Variable Explorer update from the previous cell is finished. Does the Variable Explorer need to respond to runcell, or is it good enough to just assume it will update?
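To make that trade-off concrete, here is a small illustration using plain Python threads. It is not Spyder's comms machinery; run_cell, refresh_variable_explorer, and the 2-second delay are all invented for the sketch, which only contrasts waiting on the previous refresh with assuming the explorer will update on its own:

import threading
import time

pending_refresh = None  # thread doing the previous cell's explorer refresh

def refresh_variable_explorer():
    """Stand-in for the expensive min/max recomputation (~26 s in the report)."""
    time.sleep(2)  # shortened for the example

def run_cell(code, wait_for_explorer):
    global pending_refresh
    if wait_for_explorer and pending_refresh is not None:
        # Models the reported behaviour: the new cell blocks until the
        # Variable Explorer has finished refreshing for the previous cell.
        pending_refresh.join()
    exec(code)  # the user's code itself runs quickly
    pending_refresh = threading.Thread(target=refresh_variable_explorer, daemon=True)
    pending_refresh.start()

for wait in (True, False):
    pending_refresh = None
    t0 = time.time()
    run_cell("x = 1", wait)   # first cell returns quickly either way
    run_cell("y = 2", wait)   # second cell blocks only when wait=True
    print(f"wait_for_explorer={wait}: two cells took {time.time() - t0:.1f} s")

With waiting enabled, the second cell pays for the whole refresh; with fire-and-forget, both cells return immediately, which is essentially the question the comment raises.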