`numpy.load` slows `runcell`
See original GitHub issue.
Despite the last command printing 0.0, runcell(3, ...) took several seconds. Strangely, an array with shape = (2, ...) and dtype='float64', which is the same size in memory, doesn't yield this effect; I haven't tested with 'float32'. Also, IPython commands aren't slowed.
Using Spyder 4.2.1, as conda doesn't have 4.2.2 yet; Windows 10 x64, Python 3.7.9, numpy 1.19.2.
Code:

import numpy as np
X = np.random.randn(16, 16, 240, 24000).astype('float16')
np.save('arr.npy', X)

Restart the kernel, then run:

import numpy as np
from time import time
out = np.load('arr.npy')

#%%
t0 = time()

#%%
t1 = time()

#%%
print(t1 - t0)
Issue Analytics
- Created: 3 years ago
- Comments: 21 (19 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Maybe we should add a message saying: “Computing min/max for the current variables took more than 2 seconds. Do you want to disable automatic variable explorer refreshing?”
I can confirm that this is due to min/max in the Variable Explorer. X takes 30 seconds to compute on my box and X.max() takes about 13 seconds. It seems that the Variable Explorer is recomputing the min/max of every variable in the explorer when it gets the call to refresh after runcell or runfile. If the Variable Explorer is busy when runcell or runfile is called, they get stuck waiting for it to finish.

@OverLordGoldDragon @sawtw thanks for helping get to the bottom of this! Here is a workaround: open the Variable Explorer, right-click on the table, and uncheck "show arrays min/max". Wait a while for the explorer to refresh, and then it should be speedy again.
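The min/max diagnosis also fits the reporter's observation that an equally sized float64 array doesn't trigger the slowdown: NumPy has no native float16 arithmetic, so reductions such as min/max over a float16 array are typically much slower than over a float64 array occupying the same number of bytes (which also holds four times fewer elements). The following timing sketch is not part of the original report; the shapes are scaled down from the issue's array so it runs quickly, but are chosen so both arrays use the same amount of memory:

import numpy as np
from time import time

# Same memory footprint: float16 uses 2 bytes per element, float64 uses 8,
# so the float64 array gets a quarter as many elements.
x16 = np.random.randn(4, 16, 240, 24000).astype('float16')  # ~0.74 GB
x64 = np.random.randn(1, 16, 240, 24000).astype('float64')  # ~0.74 GB

for name, arr in [('float16', x16), ('float64', x64)]:
    t0 = time()
    arr.min(); arr.max()
    print(f"{name}: {arr.nbytes / 1e9:.2f} GB, min/max took {time() - t0:.2f} s")

The float16 reduction typically comes out slower by an order of magnitude or more, consistent with the observation that the same-size float64 array does not produce the effect.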
Debugging

Issue 1: Variable Explorer updates every variable regardless of change

With one array in the Variable Explorer and min/max enabled, it takes about 26 seconds to refresh the Variable Explorer, regardless of whether the array has changed. Just running
t0 = time()
either directly in IPython or through runcell takes ~26 s for the Variable Explorer to update. I don't know if there is much to be done here. Perhaps we check the array size and don't give min/max past a certain size.
Issue 2: runcell and runfile seem to wait for comms from the Variable Explorer

The issue for this bug is that runcell gets stuck in comms if the Variable Explorer is already updating. If you use runcell and the Variable Explorer has finished updating, runcell works just fine. It only gets stuck when you run one cell right after another: the first cell you run returns normally, while the next cell waits until the Variable Explorer update from the previous cell is finished. Does the Variable Explorer need to respond to runcell, or is it good enough to just assume it will update?
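To make that trade-off concrete, here is a small illustration using plain Python threads. It is not Spyder's comms machinery; run_cell, refresh_variable_explorer, and the 2-second delay are all invented for the sketch, which only contrasts waiting on the previous refresh with assuming the explorer will update on its own:

import threading
import time

pending_refresh = None  # thread doing the previous cell's explorer refresh

def refresh_variable_explorer():
    """Stand-in for the expensive min/max recomputation (~26 s in the report)."""
    time.sleep(2)  # shortened for the example

def run_cell(code, wait_for_explorer):
    global pending_refresh
    if wait_for_explorer and pending_refresh is not None:
        # Models the reported behaviour: the new cell blocks until the
        # Variable Explorer has finished refreshing for the previous cell.
        pending_refresh.join()
    exec(code)  # the user's code itself runs quickly
    pending_refresh = threading.Thread(target=refresh_variable_explorer, daemon=True)
    pending_refresh.start()

for wait in (True, False):
    pending_refresh = None
    t0 = time.time()
    run_cell("x = 1", wait)   # first cell returns quickly either way
    run_cell("y = 2", wait)   # second cell blocks only when wait=True
    print(f"wait_for_explorer={wait}: two cells took {time.time() - t0:.1f} s")

With waiting enabled, the second cell pays for the whole refresh; with fire-and-forget, both cells return immediately, which is essentially the question the comment raises.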