Changing dtype on v0.13.0 causes Dataset attributes to be lost
MCVE Code Sample
import numpy as np
import pandas as pd
import xarray as xr
np.random.seed(123)
times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))
base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)
ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)
# Assign an attribute
ds = ds.assign_attrs(CRS = 'EPSG:4326')
# Change dtype
ds.astype(np.float32)
Expected Output
ds to be returned with variables of dtype np.float32, with attributes (e.g. CRS = 'EPSG:4326') still included in the dataset.
Problem Description
On xarray version 0.12.1, changing the dtype of a dataset preserves any attached attributes, e.g.:
<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * location  (location) <U2 'IA' 'IN' 'IL'
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
Data variables:
    tmin      (time, location) float32 -8.03737 -1.7884412 ... -4.543927
    tmax      (time, location) float32 12.980549 3.3104093 ... 3.8052793
Attributes:
    CRS:      EPSG:4326
However, on xarray version 0.13.0, changing the dtype of a dataset silently drops any attached attributes, e.g.:
<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
  * location  (location) <U2 'IA' 'IN' 'IL'
Data variables:
    tmin      (time, location) float32 -8.03737 -1.7884412 ... -4.543927
    tmax      (time, location) float32 12.980549 3.3104093 ... 3.8052793
This causes issues with large geospatial analyses (e.g. OpenDataCube workflows), where we need to change the dtype to reduce memory use but must also preserve the CRS information that downstream tools rely on.
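Until the behaviour is fixed, one possible workaround is to re-attach the attributes after the cast. The sketch below is only an illustration: `astype_keep_attrs` is a hypothetical helper name, not part of xarray, and it restores Dataset-level attributes only (per-variable attributes are not handled).

```python
import numpy as np
import xarray as xr


def astype_keep_attrs(ds, dtype):
    """Cast a Dataset to `dtype` and re-attach its Dataset-level attrs.

    Hypothetical helper, not an xarray API.
    """
    converted = ds.astype(dtype)
    # Copy the dict so the result does not share state with the input.
    converted.attrs = dict(ds.attrs)
    return converted


# Small stand-in for the dataset in the MCVE above.
example = xr.Dataset(
    {"tmin": (("time",), np.arange(3.0))},
    attrs={"CRS": "EPSG:4326"},
)
example32 = astype_keep_attrs(example, np.float32)
print(example32["tmin"].dtype, example32.attrs)
```

On versions where `astype` already preserves attributes, the helper should be harmless, since it just writes the same attributes back.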
Output of xr.show_versions()
xarray: 0.13.0
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.3.1
netCDF4: 1.3.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.24
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 40.6.3
pip: 19.2.3
conda: None
pytest: 3.5.0
IPython: 7.8.0
sphinx: None
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 3
- Comments: 7 (4 by maintainers)
Top GitHub Comments
It’s also not clear to me how this managed to change/break but I think we should consider it a bug.
I agree that the right fix is to write an explicit astype method, but I don’t think there’s a good reason for keep_attrs=False in this case. astype isn’t really changing the data, just changing how the data is represented so I think it makes sense to always preserve attributes.
Yes, this is fixed so I am closing. Thanks for noting @rhkleijn
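To see whether a given installation is affected, a quick check along these lines may help. It is a minimal sketch using a small made-up dataset; the result depends on the installed xarray version, so no expected output is shown.

```python
import numpy as np
import xarray as xr

# Tiny made-up dataset with one Dataset-level attribute.
check_ds = xr.Dataset(
    {"tmin": (("x",), np.arange(3.0))},
    attrs={"CRS": "EPSG:4326"},
)
check_converted = check_ds.astype(np.float32)

# True if the cast kept the Dataset-level attributes, False if it dropped them.
attrs_preserved = check_converted.attrs == check_ds.attrs
print(xr.__version__, "astype preserves attrs:", attrs_preserved)
```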