xarray serialization error when using observed custom distribution

Describe the bug

Unable to save traces to file, which is essential for running code on a cluster.

To Reproduce

Follow the case study Fitting TESS data (https://gallery.exoplanet.codes/tutorials/tess/), except using lc = lc_file.remove_nans().remove_outliers().normalize() instead of lc = lc_file.remove_nans().normalize().remove_outliers(), as the tutorial's original order of transformations raised an unrelated error in my case. After sampling, try to save the trace the way one commonly saves arviz.InferenceData objects: trace.to_netcdf('results'). This raises the ValueError shown in the traceback below.
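
For reference, the modified preprocessing step looks roughly like this (a sketch only; the TIC number is a hypothetical stand-in for the tutorial's actual target):

import lightkurve as lk

# Hypothetical target ID; substitute the tutorial's actual TESS target.
lc_file = lk.search_lightcurve("TIC 12345678", mission="TESS").download()
# Remove outliers before normalizing, rather than the tutorial's order.
lc = lc_file.remove_nans().remove_outliers().normalize()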

ValueError                                Traceback (most recent call last)
<ipython-input-23-c0e5828e59ee> in <module>
----> 1 trace.to_netcdf('results')

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/arviz/data/inference_data.py in to_netcdf(self, filename, compress, groups)
    390                 if compress:
    391                     kwargs["encoding"] = {var_name: {"zlib": True} for var_name in data.variables}
--> 392                 data.to_netcdf(filename, mode=mode, group=group, **kwargs)
    393                 data.close()
    394                 mode = "a"

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1900         from ..backends.api import to_netcdf
   1901
-> 1902         return to_netcdf(
   1903             self,
   1904             path,

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1070         # TODO: allow this work (setting up the file for writing array data)
   1071         # to be parallelized with dask
-> 1072         dump_to_store(
   1073             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074         )

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1117         variables, attrs = encoder(variables, attrs)
   1118
-> 1119     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1120
   1121

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    259             writer = ArrayWriter()
    260
--> 261         variables, attributes = self.encode(variables, attributes)
    262
    263         self.set_attributes(attributes)

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/common.py in encode(self, variables, attributes)
    348         # All NetCDF files get CF encoded by default, without this attempting
    349         # to write times, for example, would fail.
--> 350         variables, attributes = cf_encoder(variables, attributes)
    351         variables = {k: self.encode_variable(v) for k, v in variables.items()}
    352         attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in cf_encoder(variables, attributes)
    853     _update_bounds_encoding(variables)
    854
--> 855     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    856
    857     # Remove attrs from bounds variables (issue #2921)

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in <dictcomp>(.0)
    853     _update_bounds_encoding(variables)
    854
--> 855     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    856
    857     # Remove attrs from bounds variables (issue #2921)

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in encode_cf_variable(var, needs_copy, name)
    273     var = maybe_default_fill_value(var)
    274     var = maybe_encode_bools(var)
--> 275     var = ensure_dtype_not_object(var, name=name)
    276
    277     for attr_name in CF_RELATED_DATA:

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in ensure_dtype_not_object(var, name)
    231             data[missing] = fill_value
    232         else:
--> 233             data = _copy_with_dtype(data, dtype=_infer_dtype(data, name))
    234
    235         assert data.dtype.kind != "O" or data.dtype.metadata

/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in _infer_dtype(array, name)
    165         return dtype
    166
--> 167     raise ValueError(
    168         "unable to infer dtype on variable {!r}; xarray "
    169         "cannot serialize arbitrary Python objects".format(name)

ValueError: unable to infer dtype on variable 'ecc_prior'; xarray cannot serialize arbitrary Python objects
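
For context, the final error is easy to trigger in isolation: netCDF has no representation for object-dtype arrays, so xarray refuses to write any variable whose dtype it cannot infer. A minimal sketch, independent of exoplanet:

import numpy as np
import xarray as xr

# netCDF cannot store arbitrary Python objects, so an object-dtype variable
# fails with the same "unable to infer dtype" ValueError as above.
ds = xr.Dataset({"ecc_prior": ("draw", np.array([object()], dtype=object))})
ds.to_netcdf("fails.nc")

This suggests that the observed_data entry for 'ecc_prior' is being stored as a raw Python object rather than as a numeric array.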

Expected behavior

I expect the trace to be saved, as usual. I can individually save

trace.posterior.to_netcdf('posterior')
trace.log_likelihood.to_netcdf('log_likelihood')
trace.sample_stats.to_netcdf('sample_stats')

However, the same error is raised when trying to save trace.observed_data.to_netcdf('observed_data').
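
Until the underlying cause is fixed, one possible workaround (an untested sketch, assuming ecc_prior is the only offending variable) is to drop it from observed_data before writing:

# Drop the object-dtype variable; the remaining observed_data entries are
# plain numeric arrays and should serialize normally.
trace.observed_data.drop_vars("ecc_prior").to_netcdf("observed_data")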

My setup

  • Version of exoplanet: 0.5.2
  • Operating system: reproduced once on Linux and once on macOS 10.15
  • Python version: 3.9.9
  • Installation method: pip install -U "exoplanet[extras]"
  • Version of arviz: 0.11.4
  • Version of pymc3: 3.11.4

Has anyone encountered this problem before, and does anyone know how to solve it or have a suggestion for a workaround? Thank you very much in advance.

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

dfm commented on Apr 12, 2022 (3 reactions)

I have seen this before, and I don’t totally understand why this happens. In the short term, I’d recommend using the groups argument to to_netcdf (care of @avivajpeyi):

trace.to_netcdf(
    filename=...,
    groups=["posterior", "log_likelihood", "sample_stats"],
)
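
The resulting file can then be reloaded as usual (a quick sketch with a hypothetical filename; the observed_data group will simply be absent):

import arviz as az

# Only the groups that were written are loaded back.
trace = az.from_netcdf("results.nc")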

Here’s a simpler snippet that fails with the same issue:

import pymc3 as pm
import exoplanet as xo

with pm.Model() as model:
    ecc = pm.Uniform("ecc")
    xo.eccentricity.kipping13("ecc_prior", fixed=True, observed=ecc)
    trace = pm.sample(return_inferencedata=True)
    trace.to_netcdf("test")

This could be used to debug and find a longer-term solution.
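
One place to start (a sketch, assuming the snippet above has been run): check the dtype of the offending variable directly.

# If the object-dtype hypothesis is right, this prints dtype('O'), i.e. an
# arbitrary Python object that netCDF cannot encode.
print(trace.observed_data["ecc_prior"].dtype)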

ericagol commented on Apr 13, 2022 (2 reactions)

If you’re interested in a Julia version, there has been some work on implementing at least some parts of this over at https://github.com/JuliaAstro/Transits.jl

See also https://github.com/rodluger/Limbdark.jl
