
Read One Large Numpy Array

See original GitHub issue

Hey there, how would you recommend reading one large numpy array that does not fit into memory? I was thinking of just running:

arr = np.load(x, mmap_mode='r')
# pseudocode below: x_chunks / y_chunks are lists of slice intervals
new_arr = da.concatenate([da.concatenate([arr[chunk_x, chunk_y] for chunk_x in x_chunks]) for chunk_y in y_chunks])

The chunk parts are just intervals that I’ve omitted and replaced with pseudocode to illustrate my point. Could this be done? Is there a feature to directly import a numpy array? I’m not sure how the numpy stacking helps, especially if I want to use one file?
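For reference, a runnable version of the sketch above, with a tiny `.npy` file standing in for the large on-disk array (the file name, chunk intervals, and chunk sizes are all illustrative, and this assumes dask is installed):

```python
import os
import tempfile

import numpy as np
import dask.array as da

# Small .npy file standing in for the array that doesn't fit in memory.
path = os.path.join(tempfile.mkdtemp(), "big.npy")
np.save(path, np.arange(16.0).reshape(4, 4))

# mmap_mode='r' keeps the data on disk; slices are read lazily.
arr = np.load(path, mmap_mode="r")

# The "x_chunks"/"y_chunks" intervals from the question, as slices.
x_chunks = [slice(0, 2), slice(2, 4)]
y_chunks = [slice(0, 2), slice(2, 4)]

# Stitch the memmap slices back together as one lazy dask array:
# inner concatenate stacks row-blocks (axis 0), outer joins column-blocks.
new_arr = da.concatenate(
    [da.concatenate([da.from_array(arr[cx, cy], chunks=(2, 2)) for cx in x_chunks])
     for cy in y_chunks],
    axis=1,
)

print(new_arr.shape)             # (4, 4)
print(new_arr.sum().compute())   # 120.0
```

Nothing is loaded until `.compute()` is called, so each task only touches the slice it needs.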

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction
TomAugspurger commented, May 30, 2019

Can you give a full example?

1 reaction
mrocklin commented, May 23, 2019

It sounds like you might just want da.from_array

On Thu, May 23, 2019 at 12:03 PM Tom Augspurger notifications@github.com wrote:

Can you rechunk the memmapped array with da.rechunk? I’m not sure what performance / memory usage will be like in that case.

dask_arr = da.from_array(arr).rechunk(…)

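Spelled out as a runnable sketch (array contents and chunk sizes are illustrative, and this assumes dask is installed), the `da.from_array` + `rechunk` suggestion looks like:

```python
import os
import tempfile

import numpy as np
import dask.array as da

# Small stand-in for an on-disk array too large for memory.
path = os.path.join(tempfile.mkdtemp(), "big.npy")
np.save(path, np.ones((100, 100)))

# mmap_mode='r' keeps the data on disk.
arr = np.load(path, mmap_mode="r")

# Wrap the memmap lazily, then change the block layout.
dask_arr = da.from_array(arr, chunks=(50, 50)).rechunk((25, 100))

print(dask_arr.chunks)           # ((25, 25, 25, 25), (100,))
print(dask_arr.sum().compute())  # 10000.0
```

`from_array` never copies the memmap up front; `rechunk` just rewrites the task graph, so memory use at compute time is bounded by the chunk size rather than the full array.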


Top Results From Across the Web

Efficient way to partially read large numpy file?
use numpy.load as normal, but be sure to specify the mmap_mode keyword so that the array is kept on disk, and only necessary...
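A minimal numpy-only sketch of that approach (the file name and chunk size are illustrative): only one chunk of the memmapped array is resident in memory at a time.

```python
import os
import tempfile

import numpy as np

# Create a small .npy file to stand in for the large array on disk.
path = os.path.join(tempfile.mkdtemp(), "big.npy")
np.save(path, np.arange(1000, dtype=np.float64).reshape(100, 10))

# mmap_mode='r' keeps the data on disk; slices are read lazily.
arr = np.load(path, mmap_mode="r")

# Process in row-chunks so only one chunk is in memory at a time.
chunk = 25
total = 0.0
for start in range(0, arr.shape[0], chunk):
    total += arr[start:start + chunk].sum()

print(total)  # 499500.0, same as np.arange(1000).sum()
```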
Reading and writing files — NumPy v1.25.dev0 Manual
Write or read large arrays​​ Arrays too large to fit in memory can be treated like ordinary in-memory arrays using memory mapping. Memory...
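A short sketch of the memory-mapped write/read pattern the manual describes (path, dtype, and shape are illustrative): the disk-backed array behaves like an ordinary ndarray.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.dat")

# Create a disk-backed array; writes go to the file, not RAM.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000,))
mm[:] = 1.0
mm.flush()

# Reopen read-only; raw memmaps need the dtype and shape restated.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=(1000,))
print(ro.sum())  # 1000.0
```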
4.8. Processing large NumPy arrays with memory mapping
Memory mapping lets you work with huge arrays almost as if they were regular arrays. Python code that accepts a NumPy array as...
Sharing big NumPy arrays across python processes - Luis Sena
We'll see how to use NumPy with different multiprocessing options and benchmark each one of them, using ~1.5 GB array with random values....
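One of the options that article benchmarks is the standard library's `multiprocessing.shared_memory` (Python 3.8+). A sketch, with a second handle in the same process standing in for another process attaching by name (sizes and values are illustrative):

```python
import numpy as np
from multiprocessing import shared_memory

# Allocate a shared block sized for 10 float64 values (8 bytes each).
shm = shared_memory.SharedMemory(create=True, size=10 * 8)
a = np.ndarray((10,), dtype=np.float64, buffer=shm.buf)
a[:] = np.arange(10)

# Another process would attach by name: SharedMemory(name=shm.name).
other = shared_memory.SharedMemory(name=shm.name)
b = np.ndarray((10,), dtype=np.float64, buffer=other.buf)
total = float(b.sum())
print(total)  # 45.0

# Drop the array views before closing, then free the block.
del a, b
other.close()
shm.close()
shm.unlink()
```

No data is copied between the two handles; both ndarrays view the same shared buffer.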
