Computing numpy array from dask array taking very long time.


I am using the MPII Human Pose Dataset, which has around 25k images at a resolution of 1280x720, stored as a dask array. I’m converting it to a numpy array for further processing. I’ve tried two approaches:

  1. Processing a single image takes around 3 min. Below is the code I’m using.

    import hub

    ds = hub.load("username/MPII_Human_Pose_Dataset")
    # Materialize the image at index 1
    ds['image'][1].compute()

  2. The approach below takes around 1 min for a single image.

    import hub
    import numpy as np
    import pandas as pd

    ds = hub.load("username/MPII_Human_Pose_Dataset")
    ds = pd.DataFrame.from_dict(ds)
    ds['image'][1].compute()

     Alternatively, you can use np.asarray, which takes around 7 secs longer than the approach above.

    np.asarray(ds['image'][1])

Is there any way to reduce this processing time? I only uploaded that dataset to the hub, so can you check whether the issue is on my side (in my #168 PR) or on your side? (I’m using Colab.)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
AbhinavTuli commented, Nov 3, 2020

Yup, as we discussed, chunk size of 1000 for high-resolution images was causing chunks of huge sizes to be created, causing the slowdown. Chunk size of 5 should fix this.
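
To get a feel for why the chunk size matters, here is a rough back-of-the-envelope sketch. It assumes the images are stored as uncompressed uint8 RGB (3 bytes per pixel) and that one chunk holds a whole number of images. Neither detail is stated in the thread, and `chunk_bytes` is just an illustrative helper, not part of hub or dask.

```python
# Assumptions (not stated in the thread): images are uint8 RGB,
# i.e. 3 bytes per pixel, and a chunk holds `chunk_size` whole images.
WIDTH, HEIGHT, CHANNELS = 1280, 720, 3
BYTES_PER_VALUE = 1  # uint8


def chunk_bytes(chunk_size: int) -> int:
    """Uncompressed size in bytes of a chunk holding `chunk_size` images."""
    return chunk_size * WIDTH * HEIGHT * CHANNELS * BYTES_PER_VALUE


# Compare the original chunk size with the suggested one.
for size in (1000, 5):
    print(f"chunk size {size:>4}: {chunk_bytes(size) / 1e6:,.1f} MB")
```

Under these assumptions a chunk of 1000 images is roughly 2.8 GB, while a chunk of 5 images is about 14 MB. If reading a single image means fetching the whole chunk it lives in, that difference would be consistent with the slowdown described above.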

0 reactions
sanchitvj commented, Nov 5, 2020

Thanks for addressing the issue, everything is working fine now.


