
apply_ufunc should preemptively broadcast

See original GitHub issue

Code Sample

I am having some trouble understanding apply_ufunc's broadcasting rules. Since I also had trouble following the docs, I am not 100% sure this is a bug, but I strongly suspect it is. I will try to explain why with the following really simple example.

import xarray as xr
import numpy as np

a = xr.DataArray(data=np.random.normal(size=(7, 3)), dims=["dim1", "dim2"])
c = xr.DataArray(data=np.random.normal(size=(5, 6)), dims=["dim3", "dim4"])

def func(x, y):
    # Print the shapes that apply_ufunc actually passes in, then return nothing.
    print(x.shape)
    print(y.shape)
    return

Calling apply_ufunc with this function always raises an error (func returns None instead of an array), but that is intentional: the shapes have already been printed by then, and it keeps the example as simple as possible.

Problem description

xr.apply_ufunc(func, a, c)
# Out
# (7, 3, 1, 1)
# (5, 6)

Here, a has been partially broadcast (two size-1 dimensions were appended), but I would expect the shapes of a and c to match what xr.broadcast returns: there are no input core dims, so all dimensions should be broadcast against each other. However:

print([ary.shape for ary in xr.broadcast(a,c)])
# [(7, 3, 5, 6), (7, 3, 5, 6)]
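
(Editor's aside, not part of the original report: a minimal sketch of a workaround, assuming the goal is for func to receive the fully broadcast arrays, is to broadcast explicitly before the call.)

a_b, c_b = xr.broadcast(a, c)
xr.apply_ufunc(func, a_b, c_b)
# Out
# (7, 3, 5, 6)
# (7, 3, 5, 6)
# (still raises afterwards, because func returns None, as in the original example)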

Using different input core dims does not get rid of the problem; if anything, it exposes more issues:

xr.apply_ufunc(func, a, c, input_core_dims=[["dim1"],[]])
# (3, 1, 1, 7), expected (3, 5, 6, 7)
# (5, 6), expected (3, 5, 6)

xr.apply_ufunc(func, a, c, input_core_dims=[[],["dim3"]])
# (7, 3, 1), expected (7, 3, 6)
# (6, 5), expected (7, 3, 6, 5)

xr.apply_ufunc(func, a, c, input_core_dims=[["dim1"],["dim3"]])
# (3, 1, 7), expected (3, 6, 7)
# (6, 5), expected (3, 6, 5)
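
(Editor's sketch of what "expected" means in the first case above, not from the original report: the expected shapes can be reproduced by broadcasting every dimension except the core dim with xr.broadcast and its exclude argument, then moving the core dim to the end.)

# Broadcast all dims except the core dim "dim1", then put "dim1" last.
a_b, c_b = xr.broadcast(a, c, exclude=["dim1"])
print(a_b.transpose("dim2", "dim3", "dim4", "dim1").shape)  # (3, 5, 6, 7)
print(c_b.transpose("dim2", "dim3", "dim4").shape)          # (3, 5, 6)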

Is the current behaviour what should be expected?

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 (default, Jan 14 2019, 11:02:34) [GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
python-bits: 64
OS: Linux
OS-release: 4.15.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.0
cartopy: None
seaborn: None
setuptools: 41.0.0
pip: 19.1.1
conda: None
pytest: 4.5.0
IPython: 7.5.0
sphinx: 2.0.1

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

3 reactions
shoyer commented, Jun 19, 2019

Yes, exactly.

2 reactions
shoyer commented, Jun 19, 2019

With NumPy arrays at least, there is no cost for broadcasting, because it can always be done with views. But even for other array types, inserting size 1 dimensions in the correct location should be basically free, and would be more helpful than what we currently do.
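
(Editor's illustration of the point above, not from the thread: NumPy broadcasting returns views, with stride 0 along the broadcast axes, so no data is copied.)

import numpy as np

x = np.random.normal(size=(7, 3))
y = np.broadcast_to(x[:, :, None, None], (7, 3, 5, 6))
print(y.base is not None)  # True: y is a view, no data was copied
print(y.strides)           # (24, 8, 0, 0) for float64: the broadcast axes have stride 0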

On Wed, Jun 19, 2019 at 9:25 PM Oriol Abril notifications@github.com wrote:

I’m trying to think whether there would be any performance cost there - i.e. are there any arrays where preemptive broadcasting would be both expensive and unnecessary?

Even if there were a performance cost (compared to the actual behaviour), it could easily be avoided by using all dims as input_core_dims, couldn't it? IIUC, all dims should be broadcast unless they are in input core dims, so broadcasting could still be avoided without problem.

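(Editor's sketch of the suggestion above, reusing a, c and func from the code sample in the issue: listing every dimension as an input core dim excludes them all from broadcasting, so apply_ufunc passes each array through unchanged.)

xr.apply_ufunc(func, a, c, input_core_dims=[["dim1", "dim2"], ["dim3", "dim4"]])
# (7, 3)
# (5, 6)
# (still raises afterwards, because func returns None)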

Read more comments on GitHub.
