
Writing (distributed) Dask array fails

Description

Writing a Dask array to FITS fails, at least when using dask.distributed. The error does not occur if I don't create a dask.distributed Client; that is, the example given in 'What's New in 4.1' works for me.
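
For contrast, a minimal sketch of the working, non-distributed case (assuming the same example as in 'What's New in 4.1'; the file name here is illustrative):

import dask.array as da
from astropy.io import fits

# No dask.distributed Client is created, so the write runs on the local
# scheduler and completes without error.
array = da.random.random((1000, 1000))
hdu = fits.PrimaryHDU(data=array)
hdu.writeto('test_dask_local.fits', overwrite=True)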

Expected behavior

As of version 4.1, writing Dask arrays is supported, so this should write the file successfully.

Actual behavior

The following error occurred:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
     48         buffers.clear()
---> 49         result = pickle.dumps(x, **dump_kwargs)
     50         if len(result) < 1000:

TypeError: can't pickle _thread.lock objects

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-44-23f936e3e991> in <module>
      3 from astropy.io import fits
      4 hdu = fits.PrimaryHDU(data=array)
----> 5 hdu.writeto('test_dask.fits', overwrite=True)

~/miniconda3/lib/python3.7/site-packages/astropy/utils/decorators.py in wrapper(*args, **kwargs)
    533                     warnings.warn(message, warning_type, stacklevel=2)
    534 
--> 535             return function(*args, **kwargs)
    536 
    537         return wrapper

~/miniconda3/lib/python3.7/site-packages/astropy/io/fits/hdu/base.py in writeto(self, name, output_verify, overwrite, checksum)
    370         hdulist = HDUList([self])
    371         hdulist.writeto(name, output_verify, overwrite=overwrite,
--> 372                         checksum=checksum)
    373 
    374     @classmethod

~/miniconda3/lib/python3.7/site-packages/astropy/utils/decorators.py in wrapper(*args, **kwargs)
    533                     warnings.warn(message, warning_type, stacklevel=2)
    534 
--> 535             return function(*args, **kwargs)
    536 
    537         return wrapper

~/miniconda3/lib/python3.7/site-packages/astropy/io/fits/hdu/hdulist.py in writeto(self, fileobj, output_verify, overwrite, checksum)
    941             for hdu in self:
    942                 hdu._prewriteto(checksum=checksum)
--> 943                 hdu._writeto(hdulist._file)
    944                 hdu._postwriteto()
    945         hdulist.close(output_verify=output_verify, closed=closed)

~/miniconda3/lib/python3.7/site-packages/astropy/io/fits/hdu/base.py in _writeto(self, fileobj, inplace, copy)
    675 
    676         with _free_space_check(self, dirname):
--> 677             self._writeto_internal(fileobj, inplace, copy)
    678 
    679     def _writeto_internal(self, fileobj, inplace, copy):

~/miniconda3/lib/python3.7/site-packages/astropy/io/fits/hdu/base.py in _writeto_internal(self, fileobj, inplace, copy)
    681         if not inplace or self._new:
    682             header_offset, _ = self._writeheader(fileobj)
--> 683             data_offset, data_size = self._writedata(fileobj)
    684 
    685             # Set the various data location attributes on newly-written HDUs

~/miniconda3/lib/python3.7/site-packages/astropy/io/fits/hdu/base.py in _writedata(self, fileobj)
    613         if self._data_loaded or self._data_needs_rescale:
    614             if self.data is not None:
--> 615                 size += self._writedata_internal(fileobj)
    616             # pad the FITS data block
    617             if size > 0:

~/miniconda3/lib/python3.7/site-packages/astropy/io/fits/hdu/image.py in _writedata_internal(self, fileobj)
    621             return size
    622         elif isinstance(self.data, DaskArray):
--> 623             return self._writeinternal_dask(fileobj)
    624         else:
    625             # Based on the system type, determine the byteorders that

~/miniconda3/lib/python3.7/site-packages/astropy/io/fits/hdu/image.py in _writeinternal_dask(self, fileobj)
    707                                 buffer=outmmap)
    708 
--> 709             output.store(outarr, lock=True, compute=True)
    710         finally:
    711             if should_close:

~/miniconda3/lib/python3.7/site-packages/dask/array/core.py in store(self, target, **kwargs)
   1387     @wraps(store)
   1388     def store(self, target, **kwargs):
-> 1389         r = store([self], [target], **kwargs)
   1390 
   1391         if kwargs.get("return_stored", False):

~/miniconda3/lib/python3.7/site-packages/dask/array/core.py in store(sources, targets, lock, regions, compute, return_stored, **kwargs)
    943 
    944         if compute:
--> 945             result.compute(**kwargs)
    946             return None
    947         else:

~/miniconda3/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    164         dask.base.compute
    165         """
--> 166         (result,) = compute(self, traverse=False, **kwargs)
    167         return result
    168 

~/miniconda3/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    442         postcomputes.append(x.__dask_postcompute__())
    443 
--> 444     results = schedule(dsk, keys, **kwargs)
    445     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    446 

~/miniconda3/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2712             retries=retries,
   2713             user_priority=priority,
-> 2714             actors=actors,
   2715         )
   2716         packed = pack_data(keys, futures)

~/miniconda3/lib/python3.7/site-packages/distributed/client.py in _graph_to_futures(self, dsk, keys, restrictions, loose_restrictions, priority, user_priority, resources, retries, fifo_timeout, actors)
   2639                 {
   2640                     "op": "update-graph",
-> 2641                     "tasks": valmap(dumps_task, dsk),
   2642                     "dependencies": dependencies,
   2643                     "keys": list(map(tokey, keys)),

~/miniconda3/lib/python3.7/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()

~/miniconda3/lib/python3.7/site-packages/cytoolz/dicttoolz.pyx in cytoolz.dicttoolz.valmap()

~/miniconda3/lib/python3.7/site-packages/distributed/worker.py in dumps_task(task)
   3356             return d
   3357         elif not any(map(_maybe_complex, task[1:])):
-> 3358             return {"function": dumps_function(task[0]), "args": warn_dumps(task[1:])}
   3359     return to_serialize(task)
   3360 

~/miniconda3/lib/python3.7/site-packages/distributed/worker.py in warn_dumps(obj, dumps, limit)
   3365 def warn_dumps(obj, dumps=pickle.dumps, limit=1e6):
   3366     """ Dump an object to bytes, warn if those bytes are large """
-> 3367     b = dumps(obj, protocol=4)
   3368     if not _warn_dumps_warned[0] and len(b) > limit:
   3369         _warn_dumps_warned[0] = True

~/miniconda3/lib/python3.7/site-packages/distributed/protocol/pickle.py in dumps(x, buffer_callback, protocol)
     58         try:
     59             buffers.clear()
---> 60             result = cloudpickle.dumps(x, **dump_kwargs)
     61         except Exception as e:
     62             logger.info("Failed to serialize %s. Exception: %s", x, e)

~/miniconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol)
    100         with io.BytesIO() as file:
    101             cp = CloudPickler(file, protocol=protocol)
--> 102             cp.dump(obj)
    103             return file.getvalue()
    104 

~/miniconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    561     def dump(self, obj):
    562         try:
--> 563             return Pickler.dump(self, obj)
    564         except RuntimeError as e:
    565             if "recursion" in e.args[0]:

~/miniconda3/lib/python3.7/pickle.py in dump(self, obj)
    435         if self.proto >= 4:
    436             self.framer.start_framing()
--> 437         self.save(obj)
    438         self.write(STOP)
    439         self.framer.end_framing()

~/miniconda3/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

~/miniconda3/lib/python3.7/pickle.py in save_tuple(self, obj)
    787         write(MARK)
    788         for element in obj:
--> 789             save(element)
    790 
    791         if id(obj) in memo:

~/miniconda3/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    522             reduce = getattr(obj, "__reduce_ex__", None)
    523             if reduce is not None:
--> 524                 rv = reduce(self.proto)
    525             else:
    526                 reduce = getattr(obj, "__reduce__", None)

TypeError: can't pickle _thread.lock objects
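
The traceback shows the failure originating in astropy's _writeinternal_dask, which calls output.store(outarr, lock=True, compute=True); when a dask.distributed Client is active, the resulting task graph, which apparently ends up carrying a raw thread lock (plausibly from the lock=True store around the memory-mapped output), has to be pickled before it can be shipped to the scheduler, and thread locks are not picklable. A minimal sketch of that underlying serialization limit, independent of astropy and Dask:

import pickle
import threading

try:
    # Same failure mode as in the traceback above: a raw thread lock
    # cannot be serialized with pickle.
    pickle.dumps(threading.Lock())
except TypeError as exc:
    print(exc)  # e.g. "can't pickle _thread.lock objects" on Python 3.7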

Steps to Reproduce

Following the 'What's New in 4.1' example:

from dask.distributed import Client
client = Client()
import dask.array as da
array = da.random.random((1000, 1000))
from astropy.io import fits
hdu = fits.PrimaryHDU(data=array)
hdu.writeto('test_dask.fits', overwrite=True)
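
Two possible workaround sketches (not from the original report, and not verified against this exact environment): force a local scheduler just for the write so the lock-bearing graph never leaves the process, or materialize the array into NumPy first. The file names below are illustrative.

import dask
import dask.array as da
from astropy.io import fits
from dask.distributed import Client

client = Client()
array = da.random.random((1000, 1000))
hdu = fits.PrimaryHDU(data=array)

# Option 1: temporarily fall back to the local threaded scheduler for the
# write, so nothing needs to be pickled and sent to the distributed scheduler.
with dask.config.set(scheduler='threads'):
    hdu.writeto('test_dask_threads.fits', overwrite=True)

# Option 2: compute the array into a plain NumPy array first (this gives up
# the chunked, lazy write that the Dask support is meant to provide).
fits.PrimaryHDU(data=array.compute()).writeto('test_dask_numpy.fits', overwrite=True)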

System Details

Darwin-19.6.0-x86_64-i386-64bit
Python 3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:33:30) 
[Clang 9.0.1 ]
Numpy 1.19.2
astropy 4.2
Scipy 1.3.1
Matplotlib 3.3.1
Dask 2.17.2

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 15 (13 by maintainers)

Top GitHub Comments

1 reaction
joseph-long commented, Feb 2, 2021

@pllim have a look at the conversation upstream, in particular https://github.com/dask/dask/pull/1881#issuecomment-287196379 , and see if you think that this is something that can be handled upstream.

As it is, astropy.io.fits doesn’t support dask when used with dask.distributed, which IMO is the main reason to use dask at all. I’ve been maintaining my own wrapper classes to work around other serialization-unfriendly features of fits.HDUList and friends.

1 reaction
joseph-long commented, Feb 2, 2021

No, never mind, it seems to be a Dask bug: https://github.com/dask/distributed/issues/780
