Saving larger-than-memory output to HDF5
Hi,

I am trying to process a dataset larger than memory in the following way, where `data` is an HDF5 dataset:
```python
import dask.array as da
import dask.multiprocessing

def filter(block):
    ...

daskdata = da.from_array(data, chunks=(300, 400, 1000))
output = daskdata.map_blocks(filter).compute(get=dask.multiprocessing.get)
```
My problem is that `output` will be larger than memory. How can I avoid pulling `output` entirely into memory? Note that `filter` returns a NumPy array.

Thanks
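One way to avoid materializing the result, sketched here with hypothetical file and dataset names (`filter_block` stands in for the user's `filter`), is to drop `.compute()` and stream blocks straight into a new HDF5 dataset with `da.to_hdf5`:

```python
import dask.array as da
import h5py

def filter_block(block):
    # Stand-in for the user's filter; must return a NumPy array.
    return block

with h5py.File('input.h5', 'r') as f:    # hypothetical input file
    data = f['/data']                    # hypothetical dataset path
    daskdata = da.from_array(data, chunks=(300, 400, 1000))
    output = daskdata.map_blocks(filter_block)

    # Write block by block into a new HDF5 file instead of calling
    # .compute(), so the full result never has to fit in memory.
    da.to_hdf5('output.h5', '/filtered', output)
```

`da.to_hdf5` creates the target dataset and stores the array chunk by chunk, so only a bounded number of blocks are in memory at any time.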
Issue Analytics
- Created: 6 years ago
- Comments: 14 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Might try using `h5pickle` to work around this.

Edit: Please be very careful when writing to HDF5 files. They are not designed to be written to from multiple processes. You would want to ensure they are written to from only one process at a time.
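A minimal sketch of that suggestion, assuming `h5pickle` is installed; the file names and `filter_block` are hypothetical:

```python
import dask.array as da
import h5pickle  # wraps h5py so File/Dataset handles can be pickled
import h5py

def filter_block(block):
    # Stand-in for the user's filter; returns a NumPy array.
    return block

# Plain h5py handles fail to pickle under the multiprocessing scheduler;
# h5pickle.File mirrors the h5py.File API but survives serialization.
f = h5pickle.File('input.h5', 'r')
daskdata = da.from_array(f['/data'], chunks=(300, 400, 1000))
output = daskdata.map_blocks(filter_block)

# Per the caution above, keep all HDF5 writes in this one process: the
# default (threaded) scheduler plus da.store's default lock serializes them.
with h5py.File('output.h5', 'w') as out:
    dset = out.create_dataset('filtered', shape=output.shape, dtype=output.dtype)
    da.store(output, dset)
f.close()
```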
Yes. Consider using the `dask.set_options(get=...)` context manager.

In the future, it would be good to see usage questions like this on Stack Overflow under the #dask tag.
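For reference, a sketch of that pattern as it looked in dask releases of this era; the array here is a stand-in, and newer dask versions replace `dask.set_options(get=...)` with `dask.config.set(scheduler='processes')`:

```python
import dask
import dask.array as da
import dask.multiprocessing

x = da.random.random((3000, 4000), chunks=(300, 400))  # stand-in data

# Every compute inside the block uses the multiprocessing scheduler,
# with no need to pass get= to each .compute() call.
with dask.set_options(get=dask.multiprocessing.get):
    total = x.sum().compute()
```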