Computing a numpy array from a dask array takes a very long time.
I am using the MPII Human Pose Dataset, which contains around 25k images at 1280x720 resolution, stored as a dask array. I'm converting it to a numpy array for further processing, and I've tried two approaches:
- The first approach takes around 3 minutes per image. Below is the code I'm using:
import hub
ds = hub.load("username/MPII_Human_Pose_Dataset")
ds['image'][1].compute()
- The second approach takes around 1 minute per image:
import hub
import numpy as np
import pandas as pd  # needed for DataFrame.from_dict below

ds = hub.load("username/MPII_Human_Pose_Dataset")
ds = pd.DataFrame.from_dict(ds)
ds['image'][1].compute()
Alternatively, you can use np.asarray, though it takes around 7 seconds longer than the approach above:
np.asarray(ds['image'][1])
Is there any way to reduce this processing time? I was the one who uploaded the dataset to the hub, so could you check whether the issue is on my side (in my #168 PR) or yours? (I'm using Colab.)
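The slowdown described above comes down to chunking, as the maintainer comment at the bottom confirms. A minimal sketch with plain dask (using `da.zeros` as a hypothetical stand-in for the real image data, since the hub-specific API is not shown here) illustrates why per-image reads are expensive when chunks are large:

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for the dataset: 25k images of 720x1280x3 uint8.
# With 1000 images per chunk, every chunk holds ~2.7 GB, so reading a
# single image forces an entire huge chunk to be materialized.
big_chunks = da.zeros((25000, 720, 1280, 3), dtype=np.uint8,
                      chunks=(1000, 720, 1280, 3))
print(big_chunks.numblocks[0])   # 25 chunks along the image axis

# With 5 images per chunk, a single-image read only touches ~14 MB.
small_chunks = da.zeros((25000, 720, 1280, 3), dtype=np.uint8,
                        chunks=(5, 720, 1280, 3))
one = small_chunks[1].compute()  # materializes just one small chunk
print(one.shape)                 # (720, 1280, 3)
```

The arithmetic is the whole story: 1000 images x 720 x 1280 x 3 bytes is roughly 2.7 GB per chunk, which has to be fetched and decompressed to return one image.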
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Yup, as we discussed, a chunk size of 1000 for high-resolution images was causing huge chunks to be created, which caused the slowdown. A chunk size of 5 should fix this.
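For an already-created dask array, the equivalent of the fix above can be sketched with `rechunk` (the array shape here is illustrative, and hub's own chunk-size setting may use a different API):

```python
import numpy as np
import dask.array as da

# Illustrative array: 100 images stored as one giant chunk.
arr = da.zeros((100, 720, 1280, 3), dtype=np.uint8,
               chunks=(100, 720, 1280, 3))

# Rechunk to 5 images per chunk so per-image reads stay small.
arr = arr.rechunk((5, 720, 1280, 3))
print(arr.chunks[0][:3])  # (5, 5, 5)
```

Note that rechunking an existing on-disk dataset still has to read the original large chunks once; setting the small chunk size at upload time, as done here, avoids that cost entirely.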
Thanks for addressing the issue, everything is working fine now.