Calling Dataset.mean() drops coordinates
This is a similar issue to bug #1470.
MCVE Code Sample
import xarray as xr
import numpy as np
x = np.linspace(0,1,5)
y = np.linspace(-1,0,5)
t = np.linspace(0,10,10)
dataArray1 = xr.DataArray(np.random.random((5,5,10)),
                          dims=('x','y','t'),
                          coords={'x':x,'y':y,'t':t})
dataArray2 = xr.DataArray(np.random.random((5,5,10)),
                          dims=('x','y','t'),
                          coords={'x':x,'y':y,'t':t})
dataset = xr.Dataset({'a':dataArray1,'b':dataArray2})
datasetWithCoords = xr.Dataset({'a':dataArray1,'b':dataArray2},coords={'x':x,'y':y,'t':t})
print("datarray1:")
print(dataArray1)
print("dataArray1 after mean")
print(dataArray1.mean(axis=0))
print("dataset:")
print(dataset)
print("dataset after mean")
print(dataset.mean(axis=0))
print("dataset with coords:")
print(datasetWithCoords)
print("dataset with coords after mean")
print(datasetWithCoords.mean(axis=0))
Output (with extra stuff snipped for brevity):
datarray1:
<xarray.DataArray (x: 5, y: 5, t: 10)>
<array printout>
Coordinates:
* x (x) float64 0.0 0.25 0.5 0.75 1.0
* y (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
* t (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
dataArray1 after mean
<xarray.DataArray (y: 5, t: 10)>
<array printout>
Coordinates:
* y (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
* t (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
### Note that coordinates are kept after the mean operation when performed just on an array
dataset:
<xarray.Dataset>
Dimensions: (t: 10, x: 5, y: 5)
Coordinates:
* x (x) float64 0.0 0.25 0.5 0.75 1.0
* y (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
* t (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
Data variables:
a (x, y, t) float64 0.1664 0.8147 0.5346 ... 0.2241 0.9872 0.9351
b (x, y, t) float64 0.6135 0.2305 0.8146 ... 0.6323 0.5638 0.9762
dataset after mean
<xarray.Dataset>
Dimensions: (t: 10, y: 5)
Dimensions without coordinates: t, y
Data variables:
a (y, t) float64 0.2006 0.6135 0.6345 0.2415 ... 0.3047 0.4983 0.4734
b (y, t) float64 0.3459 0.4361 0.7502 0.508 ... 0.6943 0.4702 0.4284
dataset with coords:
<xarray.Dataset>
Dimensions: (t: 10, x: 5, y: 5)
Coordinates:
* x (x) float64 0.0 0.25 0.5 0.75 1.0
* y (y) float64 -1.0 -0.75 -0.5 -0.25 0.0
* t (t) float64 0.0 1.111 2.222 3.333 4.444 ... 6.667 7.778 8.889 10.0
Data variables:
a (x, y, t) float64 0.1664 0.8147 0.5346 ... 0.2241 0.9872 0.9351
b (x, y, t) float64 0.6135 0.2305 0.8146 ... 0.6323 0.5638 0.9762
dataset with coords after mean
<xarray.Dataset>
Dimensions: (t: 10, y: 5)
Dimensions without coordinates: t, y
Data variables:
a (y, t) float64 0.2006 0.6135 0.6345 0.2415 ... 0.3047 0.4983 0.4734
b (y, t) float64 0.3459 0.4361 0.7502 0.508 ... 0.6943 0.4702 0.4284
It's also worth mentioning that the data arrays contained in the dataset lose their coordinates during this operation, i.e.:
>>> print(dataset.mean(axis=0)['a'])
<xarray.DataArray 'a' (y: 5, t: 10)>
array([[0.4974686 , 0.44360968, 0.62252578, 0.56453058, 0.45996295,
0.51323367, 0.54304355, 0.64448021, 0.50438884, 0.37762424],
[0.43043363, 0.47008095, 0.23738985, 0.58194424, 0.50207939,
0.45236528, 0.45457519, 0.67353014, 0.54388373, 0.52579016],
[0.42944067, 0.51871646, 0.28812999, 0.53518657, 0.57115733,
0.62391936, 0.40276949, 0.2385865 , 0.6050159 , 0.56724394],
[0.43676851, 0.43539912, 0.30910443, 0.45708179, 0.44772562,
0.58081722, 0.3608285 , 0.69107338, 0.37702932, 0.34231931],
[0.56137156, 0.62710607, 0.77171961, 0.58043904, 0.80014925,
0.67720374, 0.73277691, 0.85934107, 0.53542093, 0.3573311 ]])
Dimensions without coordinates: y, t
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 11, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.2
xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.7.0
distributed: None
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 40.8.0
pip: 19.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
I agree with removing. We could raise an error saying
please use 'dim' instead
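The suggestion above could look roughly like the sketch below. This is only an illustration, not xarray's actual implementation: `mean_without_axis` is a hypothetical helper written for this example, and it assumes that rejecting any non-None `axis` and reducing by dimension name is the desired behaviour.

```python
import numpy as np
import xarray as xr


def mean_without_axis(ds, dim=None, **kwargs):
    """Hypothetical wrapper: refuse `axis` and reduce by dimension name only."""
    # Pop `axis` so it is never forwarded to the underlying reduction.
    axis = kwargs.pop("axis", None)
    if axis is not None:
        raise ValueError("please use 'dim' instead")
    return ds.mean(dim=dim, **kwargs)


# Small example dataset with coordinates on both dimensions.
example = xr.Dataset(
    {"a": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [0.0, 1.0], "y": [-1.0, -0.5, 0.0]},
)

try:
    mean_without_axis(example, axis=0)
except ValueError as err:
    print("rejected:", err)

print(mean_without_axis(example, dim="x"))
```

With a check like this, a positional-axis call fails loudly instead of silently returning a dataset without coordinates.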
Can someone explain why we even allow axis as a valid argument? I can’t think of any good reasons why we should keep it, and it’s inconsistent with the API in most places I think.
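For comparison, here is a minimal sketch of the name-based reduction that the comments point to. It rebuilds a dataset of the same shape as the one in the report (the variable name `reduced` is introduced only for this example) and reduces over `x` with `dim='x'` instead of `axis=0`.

```python
import numpy as np
import xarray as xr

x = np.linspace(0, 1, 5)
y = np.linspace(-1, 0, 5)
t = np.linspace(0, 10, 10)

# Same layout as the dataset in the report: two variables on (x, y, t).
dataset = xr.Dataset(
    {
        "a": (("x", "y", "t"), np.random.random((5, 5, 10))),
        "b": (("x", "y", "t"), np.random.random((5, 5, 10))),
    },
    coords={"x": x, "y": y, "t": t},
)

# Reduce by dimension name rather than by positional axis.
reduced = dataset.mean(dim="x")

print(reduced)
print(reduced["a"].coords)
```

Reducing by name removes only the `x` dimension, so the `y` and `t` coordinates stay attached to both the dataset and the data arrays it contains.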