xarray serialization error when using observed custom distribution
Describe the bug
Unable to save sampling traces to file, which is essential for running this code on a cluster.
To Reproduce
Follow the case study Fitting TESS data (https://gallery.exoplanet.codes/tutorials/tess/), except using
lc = lc_file.remove_nans().remove_outliers().normalize()
instead of
lc = lc_file.remove_nans().normalize().remove_outliers()
since the original order of transformations raised an unrelated error in my case.
After sampling, try to save the trace the way one commonly saves arviz.InferenceData objects:
trace.to_netcdf('results')
This raises the following error:
ValueError Traceback (most recent call last)
<ipython-input-23-c0e5828e59ee> in <module>
----> 1 trace.to_netcdf('results')
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/arviz/data/inference_data.py in to_netcdf(self, filename, compress, groups)
390 if compress:
391 kwargs["encoding"] = {var_name: {"zlib": True} for var_name in data.variables}
--> 392 data.to_netcdf(filename, mode=mode, group=group, **kwargs)
393 data.close()
394 mode = "a"
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1900 from ..backends.api import to_netcdf
1901
-> 1902 return to_netcdf(
1903 self,
1904 path,
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1070 # TODO: allow this work (setting up the file for writing array data)
1071 # to be parallelized with dask
-> 1072 dump_to_store(
1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1074 )
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1117 variables, attrs = encoder(variables, attrs)
1118
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
1120
1121
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
259 writer = ArrayWriter()
260
--> 261 variables, attributes = self.encode(variables, attributes)
262
263 self.set_attributes(attributes)
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/backends/common.py in encode(self, variables, attributes)
348 # All NetCDF files get CF encoded by default, without this attempting
349 # to write times, for example, would fail.
--> 350 variables, attributes = cf_encoder(variables, attributes)
351 variables = {k: self.encode_variable(v) for k, v in variables.items()}
352 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in cf_encoder(variables, attributes)
853 _update_bounds_encoding(variables)
854
--> 855 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
856
857 # Remove attrs from bounds variables (issue #2921)
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in <dictcomp>(.0)
853 _update_bounds_encoding(variables)
854
--> 855 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
856
857 # Remove attrs from bounds variables (issue #2921)
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in encode_cf_variable(var, needs_copy, name)
273 var = maybe_default_fill_value(var)
274 var = maybe_encode_bools(var)
--> 275 var = ensure_dtype_not_object(var, name=name)
276
277 for attr_name in CF_RELATED_DATA:
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in ensure_dtype_not_object(var, name)
231 data[missing] = fill_value
232 else:
--> 233 data = _copy_with_dtype(data, dtype=_infer_dtype(data, name))
234
235 assert data.dtype.kind != "O" or data.dtype.metadata
/cluster/apps/nss/gcc-8.2.0/python/3.9.9/x86_64/lib64/python3.9/site-packages/xarray/conventions.py in _infer_dtype(array, name)
165 return dtype
166
--> 167 raise ValueError(
168 "unable to infer dtype on variable {!r}; xarray "
169 "cannot serialize arbitrary Python objects".format(name)
ValueError: unable to infer dtype on variable 'ecc_prior'; xarray cannot serialize arbitrary Python objects
Expected behavior
I expect the trace to be saved as usual. I can individually save
trace.posterior.to_netcdf('posterior')
trace.log_likelihood.to_netcdf('log_likelihood')
trace.sample_stats.to_netcdf('sample_stats')
However, the same error is raised when trying to save
trace.observed_data.to_netcdf('observed_data')
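The root cause appears to be that observed_data contains an arbitrary Python object that xarray's CF encoder cannot map to a netCDF dtype. A minimal sketch using xarray alone (variable name chosen to match the traceback; this is an illustration, not the actual model object) reproduces the same error:

```python
import numpy as np
import xarray as xr

# A variable holding an arbitrary Python object, mimicking the custom
# 'ecc_prior' distribution stored in observed_data. xarray's CF encoder
# cannot infer a netCDF dtype for object arrays, so writing fails.
ds = xr.Dataset({"ecc_prior": ("obs", np.array([object()], dtype=object))})

try:
    ds.to_netcdf("failing_example.nc")
except ValueError as err:
    print(err)  # the same "unable to infer dtype" ValueError as above
```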
My setup
- Version of exoplanet: 0.5.2
- Operating system: I reproduced this error once on Linux and once on macOS 10.15
- Python version: 3.9.9
- Installation method: pip install -U "exoplanet[extras]"
- Version of arviz: 0.11.4
- Version of pymc3: 3.11.4
Has anyone encountered this problem before, does anyone know how to solve it, or have a suggestion for a workaround? Thank you very much in advance.
Issue Analytics
- Created: a year ago
- Comments: 11 (5 by maintainers)
I have seen this before, and I don't totally understand why it happens. In the short term, I'd recommend using the groups argument to to_netcdf
(care of @avivajpeyi). Here's a simpler snippet that fails with the same issue, which could be used to debug and find a longer-term solution:
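The code snippets did not survive in this copy of the thread. As a hedged sketch (not the original code; the variable names are invented, and it assumes arviz with a working netCDF backend), a minimal InferenceData whose observed_data holds an object-dtype array fails the same way, and the groups argument visible in the to_netcdf signature above restricts the write to serializable groups:

```python
import numpy as np
import arviz as az

# A minimal InferenceData mimicking the failing trace: the posterior holds
# plain floats, but observed_data holds an arbitrary Python object, like
# the custom 'ecc_prior' distribution in the report above.
idata = az.from_dict(
    posterior={"m": np.random.randn(4, 100)},
    observed_data={"ecc_prior": np.array([object()], dtype=object)},
)

try:
    idata.to_netcdf("full_trace.nc")  # fails on the observed_data group
except ValueError as err:
    print("full save failed:", err)

# Workaround: restrict the write to groups that xarray can serialize.
idata.to_netcdf("partial_trace.nc", groups=["posterior"])
```

The same pattern applies to the original trace: pass groups=["posterior", "log_likelihood", "sample_stats"] and leave out observed_data.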
See also https://github.com/rodluger/Limbdark.jl