Calling Dataset.mean() drops coordinates

This is a similar issue to bug #1470.

MCVE Code Sample

import xarray as xr
import numpy as np

x = np.linspace(0,1,5)
y = np.linspace(-1,0,5)
t = np.linspace(0,10,10)

dataArray1 = xr.DataArray(np.random.random((5,5,10)),
                            dims=('x','y','t'),
                            coords={'x':x,'y':y,'t':t})

dataArray2 = xr.DataArray(np.random.random((5,5,10)),
                            dims=('x','y','t'),
                            coords={'x':x,'y':y,'t':t})

dataset = xr.Dataset({'a':dataArray1,'b':dataArray2})

datasetWithCoords = xr.Dataset({'a':dataArray1,'b':dataArray2},coords={'x':x,'y':y,'t':t})

print("datarray1:")
print(dataArray1)

print("dataArray1 after mean")
print(dataArray1.mean(axis=0))

print("dataset:")
print(dataset)

print("dataset after mean")
print(dataset.mean(axis=0))

print("dataset with coords:")
print(datasetWithCoords)

print("dataset with coords after mean")
print(datasetWithCoords.mean(axis=0))

Output (with extra stuff snipped for brevity):

datarray1:
<xarray.DataArray (x: 5, y: 5, t: 10)>
<array printout>
Coordinates:
  * x        (x) float64 0.0 0.25 0.5 0.75 1.0
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
dataArray1 after mean
<xarray.DataArray (y: 5, t: 10)>
<array printout>
Coordinates:
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
### Note that coordinates are kept after the mean operation when performed just on an array

dataset:
<xarray.Dataset>
Dimensions:  (t: 10, x: 5, y: 5)
Coordinates:
  * x        (x) float64 0.0 0.25 0.5 0.75 1.0
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
Data variables:
    a        (x, y, t) float64 0.1664 0.8147 0.5346 ... 0.2241 0.9872 0.9351
    b        (x, y, t) float64 0.6135 0.2305 0.8146 ... 0.6323 0.5638 0.9762
dataset after mean
<xarray.Dataset>
Dimensions:  (t: 10, y: 5)
Dimensions without coordinates: t, y
Data variables:
    a        (y, t) float64 0.2006 0.6135 0.6345 0.2415 ... 0.3047 0.4983 0.4734
    b        (y, t) float64 0.3459 0.4361 0.7502 0.508 ... 0.6943 0.4702 0.4284
dataset with coords:
<xarray.Dataset>
Dimensions:  (t: 10, x: 5, y: 5)
Coordinates:
  * x        (x) float64 0.0 0.25 0.5 0.75 1.0
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
Data variables:
    a        (x, y, t) float64 0.1664 0.8147 0.5346 ... 0.2241 0.9872 0.9351
    b        (x, y, t) float64 0.6135 0.2305 0.8146 ... 0.6323 0.5638 0.9762
dataset with coords after mean
<xarray.Dataset>
Dimensions:  (t: 10, y: 5)
Dimensions without coordinates: t, y
Data variables:
    a        (y, t) float64 0.2006 0.6135 0.6345 0.2415 ... 0.3047 0.4983 0.4734
    b        (y, t) float64 0.3459 0.4361 0.7502 0.508 ... 0.6943 0.4702 0.4284

It’s also worth mentioning that the data arrays contained in the dataset lose their coordinates during this operation, i.e.:

>>> print(dataset.mean(axis=0)['a'])
<xarray.DataArray 'a' (y: 5, t: 10)>
array([[0.4974686 , 0.44360968, 0.62252578, 0.56453058, 0.45996295,
        0.51323367, 0.54304355, 0.64448021, 0.50438884, 0.37762424],
       [0.43043363, 0.47008095, 0.23738985, 0.58194424, 0.50207939,
        0.45236528, 0.45457519, 0.67353014, 0.54388373, 0.52579016],
       [0.42944067, 0.51871646, 0.28812999, 0.53518657, 0.57115733,
        0.62391936, 0.40276949, 0.2385865 , 0.6050159 , 0.56724394],
       [0.43676851, 0.43539912, 0.30910443, 0.45708179, 0.44772562,
        0.58081722, 0.3608285 , 0.69107338, 0.37702932, 0.34231931],
       [0.56137156, 0.62710607, 0.77171961, 0.58043904, 0.80014925,
        0.67720374, 0.73277691, 0.85934107, 0.53542093, 0.3573311 ]])
Dimensions without coordinates: y, t
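
For comparison, here is a minimal sketch of the same reduction written with a dimension name instead of a positional axis. It rebuilds a dataset shaped like the one in the MCVE (the variable name `reduced` is only for illustration) and assumes that `Dataset.mean(dim=...)` keeps the coordinates of the dimensions that are not reduced, which is the documented behaviour for named-dimension reductions.

```python
import numpy as np
import xarray as xr

x = np.linspace(0, 1, 5)
y = np.linspace(-1, 0, 5)
t = np.linspace(0, 10, 10)

# Same layout as the MCVE: two (x, y, t) variables sharing coordinates.
dataset = xr.Dataset(
    {
        "a": (("x", "y", "t"), np.random.random((5, 5, 10))),
        "b": (("x", "y", "t"), np.random.random((5, 5, 10))),
    },
    coords={"x": x, "y": y, "t": t},
)

# Reduce over the named dimension rather than over axis=0.
reduced = dataset.mean(dim="x")

# The remaining dimensions should still carry their coordinates.
print(reduced)
print(reduced["a"].coords)
```

If this matches the `DataArray` output above (coordinates `y` and `t` still present), the coordinate loss is specific to passing `axis` to the `Dataset` method.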

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 11, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.2

xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.7.0
distributed: None
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 40.8.0
pip: 19.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
None

Issue Analytics

State:
Created 4 years ago
Comments: 7 (3 by maintainers)

Top GitHub Comments

2 reactions

dcherian commented, Apr 9, 2020

I agree with removing. We could raise an error saying please use 'dim' instead

0 reactions

TomNicholas commented, Apr 9, 2020

Can someone explain why we even allow axis as a valid argument? I can’t think of any good reasons why we should keep it, and it’s inconsistent with the API in most places I think.
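
One way to see why a positional `axis` sits awkwardly on a `Dataset` is that its variables do not have to share a dimension order. The sketch below uses a small hypothetical dataset (the names `ds`, `first_dims`, and `reduced` are made up for illustration, not taken from the issue) in which axis 0 means a different dimension for each variable, while `dim="x"` stays unambiguous.

```python
import numpy as np
import xarray as xr

# Two variables over the same dimensions, stored in different orders.
ds = xr.Dataset(
    {
        "a": (("x", "y"), np.ones((2, 3))),
        "b": (("y", "x"), np.ones((3, 2))),
    }
)

# Axis 0 is "x" for variable a but "y" for variable b.
first_dims = {name: var.dims[0] for name, var in ds.data_vars.items()}
print(first_dims)  # {'a': 'x', 'b': 'y'}

# Naming the dimension removes the ambiguity: both variables lose "x".
reduced = ds.mean(dim="x")
print(reduced["a"].dims, reduced["b"].dims)  # ('y',) ('y',)
```

This only illustrates the inconsistency raised in the comments; it does not show how xarray resolved the issue.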