apply_ufunc should preemptively broadcast
Code Sample
I am having some trouble understanding the apply_ufunc broadcasting rules. Because I also had trouble understanding the docs, I am not 100% sure this is a bug, but I am fairly sure it is. I will try to explain why with the following very simple example.
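import xarray as xr
import numpy as np

a = xr.DataArray(data=np.random.normal(size=(7, 3)), dims=["dim1", "dim2"])
c = xr.DataArray(data=np.random.normal(size=(5, 6)), dims=["dim3", "dim4"])

def func(x, y):
    # print the shapes of the arrays that apply_ufunc passes to the function
    print(x.shape)
    print(y.shape)
    # returning None makes apply_ufunc fail afterwards (intended, see below)
    return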
As defined, the function always makes apply_ufunc raise an error because it returns None instead of an array. This is intentional: the shapes have already been printed by then, and it keeps the example as simple as possible.
Problem description
xr.apply_ufunc(func, a, c)
# Out
# (7, 3, 1, 1)
# (5, 6)
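The two printed shapes are still compatible under NumPy's broadcasting rules, which align dimensions from the right and stretch size-1 axes. That is presumably why the current behaviour works for NumPy ufuncs but surprises a function that inspects shapes directly. The following NumPy-only check uses random arrays with the same shapes as in the output above:

```python
import numpy as np

# Same shapes that func received from apply_ufunc above
x = np.random.normal(size=(7, 3, 1, 1))
y = np.random.normal(size=(5, 6))

# NumPy aligns trailing axes: (7, 3, 1, 1) with (5, 6) -> (7, 3, 5, 6)
print(np.broadcast(x, y).shape)  # (7, 3, 5, 6)

# So any elementwise ufunc produces the fully broadcast result...
result = np.add(x, y)
print(result.shape)  # (7, 3, 5, 6)

# ...but a function that looks at y.shape on its own still sees (5, 6)
print(y.shape)  # (5, 6)
```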
Here, a has been partially broadcast (size-1 dimensions were appended for dim3 and dim4), but c has not been expanded at all. Since there are no input core dims, all dimensions should be broadcast, so I would expect the shapes of a and c to match those returned by xr.broadcast, which are different:
print([ary.shape for ary in xr.broadcast(a,c)])
# [(7, 3, 5, 6), (7, 3, 5, 6)]
Using different input core dims does not solve the problem; instead, I believe it reveals further issues:
xr.apply_ufunc(func, a, c, input_core_dims=[["dim1"],[]])
# (3, 1, 1, 7), expected (3, 5, 6, 7)
# (5, 6), expected (3, 5, 6)
xr.apply_ufunc(func, a, c, input_core_dims=[[],["dim3"]])
# (7, 3, 1), expected (7, 3, 6)
# (6, 5), expected (7, 3, 6, 5)
xr.apply_ufunc(func, a, c, input_core_dims=[["dim1"],["dim3"]])
# (3, 1, 7), expected (3, 6, 7)
# (6, 5), expected (3, 6, 5)
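To make the pattern in these outputs explicit, here is a hypothetical NumPy helper, align_to_dims (not an xarray function). It reproduces the printed shapes under two assumptions: the broadcast dimensions are ordered by first appearance among the inputs' non-core dims, and no size-1 axes are added in front of an array's first existing dimension. Setting skip_leading=False instead inserts a size-1 axis at every missing position, so every input ends up with the same number of dimensions.

```python
import numpy as np


def align_to_dims(data, old_dims, broadcast_dims, core_dims, skip_leading=True):
    """Hypothetical helper: reorder the axes of `data` (named by `old_dims`)
    to follow broadcast_dims + core_dims, inserting size-1 axes for missing
    broadcast dims.

    skip_leading=True mimics the shapes printed in this issue: no size-1
    axes are inserted before the first dim the array actually has.
    skip_leading=False inserts size-1 axes at every missing position.
    """
    old_set = set(old_dims)
    # The array's own broadcast dims (in broadcast order), then its core dims
    reordered = [d for d in broadcast_dims if d in old_set] + list(core_dims)
    data = np.transpose(data, [old_dims.index(d) for d in reordered])
    key = []
    for dim in list(broadcast_dims) + list(core_dims):
        if dim in old_set:
            key.append(slice(None))
        elif key or not skip_leading:
            key.append(np.newaxis)
    return data[tuple(key)]


a = np.random.normal(size=(7, 3))  # named dims ("dim1", "dim2")
c = np.random.normal(size=(5, 6))  # named dims ("dim3", "dim4")
A_DIMS, C_DIMS = ("dim1", "dim2"), ("dim3", "dim4")

# No core dims: the broadcast dims are dim1..dim4
bdims = ("dim1", "dim2", "dim3", "dim4")
print(align_to_dims(a, A_DIMS, bdims, []).shape)  # (7, 3, 1, 1)
print(align_to_dims(c, C_DIMS, bdims, []).shape)  # (5, 6)
print(align_to_dims(c, C_DIMS, bdims, [], skip_leading=False).shape)  # (1, 1, 5, 6)

# input_core_dims=[["dim1"], ["dim3"]]: the broadcast dims are dim2, dim4
bdims = ("dim2", "dim4")
print(align_to_dims(a, A_DIMS, bdims, ["dim1"]).shape)  # (3, 1, 7)
print(align_to_dims(c, C_DIMS, bdims, ["dim3"]).shape)  # (6, 5)
print(align_to_dims(c, C_DIMS, bdims, ["dim3"], skip_leading=False).shape)  # (1, 6, 5)
```

With skip_leading=False, the inputs do not reach the fully broadcast shapes listed as "expected" above. They do, however, agree on the number and order of dimensions, so each array's axes line up by position.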
Is this the expected behaviour?
Output of xr.show_versions()
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.0
cartopy: None
seaborn: None
setuptools: 41.0.0
pip: 19.1.1
conda: None
pytest: 4.5.0
IPython: 7.5.0
sphinx: 2.0.1
Top GitHub Comments
Yes, exactly.
With NumPy arrays, at least, broadcasting has no cost, because it can always be done with views. But even for other array types, inserting size-1 dimensions in the correct locations should be essentially free, and would be more helpful than what we currently do.
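The claim that NumPy broadcasting costs nothing can be checked directly. Inserting size-1 axes with np.newaxis and expanding them with np.broadcast_to both return views that share memory with the original array, with a stride of zero along the new axes. The following sketch uses a plain NumPy stand-in for the (5, 6) array c from the example:

```python
import numpy as np

c = np.random.normal(size=(5, 6))

# Insert size-1 axes for the missing leading dims (dim1, dim2): a view, no copy
c_expanded = c[np.newaxis, np.newaxis, :, :]
print(c_expanded.shape)                 # (1, 1, 5, 6)
print(np.shares_memory(c, c_expanded))  # True

# Expand to the shape xr.broadcast reported: still a (read-only) view
c_full = np.broadcast_to(c_expanded, (7, 3, 5, 6))
print(c_full.shape)                 # (7, 3, 5, 6)
print(np.shares_memory(c, c_full))  # True
print(c_full.strides[:2])           # (0, 0): the new axes reuse the same memory
```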
On Wed, Jun 19, 2019 at 9:25 PM Oriol Abril notifications@github.com wrote: