question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Calling Dataset.mean() drops coordinates

See original GitHub issue

This is a similar issue to bug #1470.

MCVE Code Sample

import xarray as xr
import numpy as np

x = np.linspace(0,1,5)
y = np.linspace(-1,0,5)
t = np.linspace(0,10,10)

dataArray1 = xr.DataArray(np.random.random((5,5,10)),
                            dims=('x','y','t'),
                            coords={'x':x,'y':y,'t':t})

dataArray2 = xr.DataArray(np.random.random((5,5,10)),
                            dims=('x','y','t'),
                            coords={'x':x,'y':y,'t':t})

dataset = xr.Dataset({'a':dataArray1,'b':dataArray2})

datasetWithCoords = xr.Dataset({'a':dataArray1,'b':dataArray2},coords={'x':x,'y':y,'t':t})

print("datarray1:")
print(dataArray1)

print("dataArray1 after mean")
print(dataArray1.mean(axis=0))

print("dataset:")
print(dataset)

print("dataset after mean")
print(dataset.mean(axis=0))

print("dataset with coords:")
print(datasetWithCoords)

print("dataset with coords after mean")
print(datasetWithCoords.mean(axis=0))


Output (with extra stuff snipped for brevity):

datarray1:
<xarray.DataArray (x: 5, y: 5, t: 10)>
<array printout>
Coordinates:
  * x        (x) float64 0.0 0.25 0.5 0.75 1.0
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
dataArray1 after mean
<xarray.DataArray (y: 5, t: 10)>
<array printout>
Coordinates:
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
### Note that coordinates are kept after the mean operation when performed just on an array

dataset:
<xarray.Dataset>
Dimensions:  (t: 10, x: 5, y: 5)
Coordinates:
  * x        (x) float64 0.0 0.25 0.5 0.75 1.0
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
Data variables:
    a        (x, y, t) float64 0.1664 0.8147 0.5346 ... 0.2241 0.9872 0.9351
    b        (x, y, t) float64 0.6135 0.2305 0.8146 ... 0.6323 0.5638 0.9762
dataset after mean
<xarray.Dataset>
Dimensions:  (t: 10, y: 5)
Dimensions without coordinates: t, y
Data variables:
    a        (y, t) float64 0.2006 0.6135 0.6345 0.2415 ... 0.3047 0.4983 0.4734
    b        (y, t) float64 0.3459 0.4361 0.7502 0.508 ... 0.6943 0.4702 0.4284
dataset with coords:
<xarray.Dataset>
Dimensions:  (t: 10, x: 5, y: 5)
Coordinates:
  * x        (x) float64 0.0 0.25 0.5 0.75 1.0
  * y        (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
  * t        (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
Data variables:
    a        (x, y, t) float64 0.1664 0.8147 0.5346 ... 0.2241 0.9872 0.9351
    b        (x, y, t) float64 0.6135 0.2305 0.8146 ... 0.6323 0.5638 0.9762
dataset with coords after mean
<xarray.Dataset>
Dimensions:  (t: 10, y: 5)
Dimensions without coordinates: t, y
Data variables:
    a        (y, t) float64 0.2006 0.6135 0.6345 0.2415 ... 0.3047 0.4983 0.4734
    b        (y, t) float64 0.3459 0.4361 0.7502 0.508 ... 0.6943 0.4702 0.4284

It’s also worth mentioning that the data arrays contained in the dataset also loose their coordinates during this operation. I.E:

>>> print(dataset.mean(axis=0)['a'])
<xarray.DataArray 'a' (y: 5, t: 10)>
array([[0.4974686 , 0.44360968, 0.62252578, 0.56453058, 0.45996295,
        0.51323367, 0.54304355, 0.64448021, 0.50438884, 0.37762424],
       [0.43043363, 0.47008095, 0.23738985, 0.58194424, 0.50207939,
        0.45236528, 0.45457519, 0.67353014, 0.54388373, 0.52579016],
       [0.42944067, 0.51871646, 0.28812999, 0.53518657, 0.57115733,
        0.62391936, 0.40276949, 0.2385865 , 0.6050159 , 0.56724394],
       [0.43676851, 0.43539912, 0.30910443, 0.45708179, 0.44772562,
        0.58081722, 0.3608285 , 0.69107338, 0.37702932, 0.34231931],
       [0.56137156, 0.62710607, 0.77171961, 0.58043904, 0.80014925,
        0.67720374, 0.73277691, 0.85934107, 0.53542093, 0.3573311 ]])
Dimensions without coordinates: y, t

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 11, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None libhdf5: 1.10.5 libnetcdf: 4.7.2

xarray: 0.14.0 pandas: 0.25.2 numpy: 1.17.3 scipy: 1.3.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.7.0 distributed: None matplotlib: 3.1.1 cartopy: None seaborn: None numbagg: None setuptools: 40.8.0 pip: 19.3 conda: None pytest: None IPython: 7.8.0 sphinx: None None

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:7 (3 by maintainers)

github_iconTop GitHub Comments

2reactions
dcheriancommented, Apr 9, 2020

I agree with removing. We could raise an error saying please use 'dim' instead

0reactions
TomNicholascommented, Apr 9, 2020

Can someone explain why we even allow axis as a valid argument? I can’t think of any good reasons why we should keep it, and it’s inconsistent with the API in most places I think.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Indexing and selecting data - Xarray
As xarray objects can store coordinates corresponding to each dimension of an array, ... Use drop_dims() to drop a full dimension from a...
Read more >
Handling NetCDF Files using XArray for Absolute Beginners
Extracting DataArrays from DataSet DS is very straightforward, as DS.<var_name> will suffice. You might consider dropping NaN entries by dropna ...
Read more >
mne.io.Raw — MNE 1.2.2 documentation - MNE-Python
cart_to_eeglab() . For EDF exports, only channels measured in Volts are allowed; in MNE-Python this means channel types 'eeg', 'ecog', ' ...
Read more >
K-means Clustering in Python: A Step-by-Step Guide
The expression above can be extended to more than 2 attributes and the distance can be measured between any two points. For a...
Read more >
Frequently Asked Questions about data.table
2.19 Why does [.data.table now have a drop argument from v1.5? ... N rather than calling length() on any column.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found