Changing dtype on v0.13.0 causes Dataset attributes to be lost

MCVE Code Sample

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(123)

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset({"tmin": (("time", "location"), tmin_values),
                 "tmax": (("time", "location"), tmax_values),},
                {"time": times, "location": ["IA", "IN", "IL"]})

# Assign a dataset-level attribute
ds = ds.assign_attrs(CRS="EPSG:4326")

# Change dtype; on 0.13.0 the returned Dataset no longer carries CRS
ds.astype(np.float32)

Expected Output

ds should be returned with variables of dtype np.float32 and with its attributes (e.g. CRS = 'EPSG:4326') still attached to the dataset.

Problem Description

On xarray version 0.12.1, changing the dtype of a dataset preserves any attached attributes, e.g.:

<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * location  (location) <U2 'IA' 'IN' 'IL'
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
Data variables:
    tmin      (time, location) float32 -8.03737 -1.7884412 ... -4.543927
    tmax      (time, location) float32 12.980549 3.3104093 ... 3.8052793
Attributes:
    CRS:      EPSG:4326

However, on xarray version 0.13.0, changing the dtype of a dataset silently drops any attached attributes, e.g.:

<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
  * location  (location) <U2 'IA' 'IN' 'IL'
Data variables:
    tmin      (time, location) float32 -8.03737 -1.7884412 ... -4.543927
    tmax      (time, location) float32 12.980549 3.3104093 ... 3.8052793

This causes issues with large geospatial analyses (e.g. OpenDataCube workflows), as we need to change dtype to reduce memory use, but must also preserve the CRS information that downstream tools rely on.
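
Until an upgrade is possible, one way to work around the behaviour described above is to copy the attributes back after the cast. The following is a minimal sketch, not part of the original report: the helper name astype_keep_attrs and the small example dataset are hypothetical, and it only relies on Dataset.astype and the attrs dictionaries.

```python
import numpy as np
import xarray as xr


def astype_keep_attrs(ds, dtype):
    """Cast every data variable to dtype and re-attach the original attrs.

    Hypothetical helper for illustration; not an xarray API.
    """
    out = ds.astype(dtype)
    # Dataset-level attributes (e.g. CRS) are the ones reported as lost.
    out.attrs.update(ds.attrs)
    # Variable-level attributes are copied as well, in case they were dropped too.
    for name in out.data_vars:
        out[name].attrs.update(ds[name].attrs)
    return out


# Small stand-in for the dataset in the report.
small = xr.Dataset(
    {"tmin": (("time", "location"), np.zeros((2, 3)))},
    coords={"location": ["IA", "IN", "IL"]},
    attrs={"CRS": "EPSG:4326"},
)
result = astype_keep_attrs(small, np.float32)
print(result.attrs)
```

On versions where astype already keeps attributes, the extra update calls are harmless because they write back the same values.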

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 (default, Jan 14 2019, 11:02:34) [GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
python-bits: 64
OS: Linux
OS-release: 4.14.133-113.112.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: None
LOCALE: en_US.UTF-8
libhdf5: 1.10.0
libnetcdf: 4.6.0

xarray: 0.13.0
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.3.1
netCDF4: 1.3.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.24
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 40.6.3
pip: 19.2.3
conda: None
pytest: 3.5.0
IPython: 7.8.0
sphinx: None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

shoyer commented, Sep 27, 2019 (1 reaction)

It’s also not clear to me how this managed to change/break but I think we should consider it a bug.

I agree that the right fix is to write an explicit astype method, but I don’t think there’s a good reason for keep_attrs=False in this case. astype isn’t really changing the data, just changing how the data is represented so I think it makes sense to always preserve attributes.

mathause commented, Dec 24, 2020 (0 reactions)

Yes, this is fixed, so I am closing. Thanks for noting, @rhkleijn.
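
To confirm that an installed xarray includes the fix, a quick check along the following lines may help. This is a sketch under assumptions: it uses the keep_attrs argument that recent xarray releases document for Dataset.astype, which the 0.13.0 release discussed in this issue may not accept, and the tiny dataset is made up for illustration.

```python
import numpy as np
import xarray as xr

# Tiny made-up dataset carrying the attribute from the report.
ds = xr.Dataset(
    {"tmin": (("time",), np.arange(3.0))},
    attrs={"CRS": "EPSG:4326"},
)

# Assumption: the installed xarray is recent enough to accept keep_attrs here.
cast = ds.astype(np.float32, keep_attrs=True)

print(cast["tmin"].dtype)
print(cast.attrs)
```

If the printed attributes still include CRS, the cast is safe to use directly in memory-reduction steps without re-attaching metadata by hand.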
