numpy function very slow on DataArray compared to DataArray.values
First I create some fake latitude and longitude points. I stash them in a dataset and compute a 2D histogram on them.
```python
#!/usr/bin/env python
import xarray as xr
import numpy as np

# 50,000 random latitude/longitude points stored in a Dataset
lat = np.random.rand(50000) * 180 - 90
lon = np.random.rand(50000) * 360 - 180
d = xr.Dataset({'latitude': lat, 'longitude': lon})

# 2-degree bin edges
latbins = np.r_[-90:90:2.]
lonbins = np.r_[-180:180:2.]

# Pass the DataArrays straight to NumPy
h, xx, yy = np.histogram2d(d['longitude'], d['latitude'], bins=(lonbins, latbins))
```
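A side note on the bin edges in this script: `np.r_[-90:90:2.]` behaves like `np.arange(-90, 90, 2.)`, so the stop value is excluded and the last latitude edge is 88 (the last longitude edge is 178). `np.histogram2d` does not count samples outside the outermost edges, so points above those values are silently dropped. This does not affect the timing comparison, but if full coverage is wanted, a minimal sketch using `np.linspace` to include both endpoints:

```python
import numpy as np

# np.r_ with a float step behaves like np.arange: the stop value is excluded.
latbins_open = np.r_[-90:90:2.]
print(latbins_open[0], latbins_open[-1])  # -90.0 88.0

# np.linspace includes both endpoints: 91 edges give 90 bins of 2 degrees each.
latbins = np.linspace(-90, 90, 91)
lonbins = np.linspace(-180, 180, 181)

rng = np.random.default_rng(0)
lat = rng.random(50000) * 180 - 90
lon = rng.random(50000) * 360 - 180

h, xedges, yedges = np.histogram2d(lon, lat, bins=(lonbins, latbins))
# With edges covering the full range, every sample falls in some bin.
assert h.sum() == lat.size
```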
When I run this I get some underwhelming performance:
```
> time ./test_with_xarray.py
real 0m28.152s
user 0m27.201s
sys 0m0.630s
```
If I change the last line to pass the underlying NumPy arrays directly to the `histogram2d` function:

```python
h, xx, yy = np.histogram2d(d['longitude'].values, d['latitude'].values, bins=(lonbins, latbins))
```

things are very different:
```
> time ./test_with_xarray.py
real 0m0.996s
user 0m0.569s
sys 0m0.253s
```
It’s ~28 times slower to call `histogram2d` on the DataArrays than on the underlying NumPy arrays. I ran into this while histogramming quite large lon/lat vectors from multiple netCDF files. I got tired of waiting for the computation to finish, added `.values` to the call, and it went through very quickly.
It seems problematic that using xarray can slow down your code by a factor of 28 with no real way for you to know about it.
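The `time` figures above cover the whole script, including interpreter start-up and `import xarray`. Assuming that start-up cost is roughly the same in both runs, the ~28x ratio probably understates the slowdown of the `histogram2d` call itself. A minimal sketch for timing only the call, using the standard-library `timeit` module; `best_time` is a hypothetical helper name, and the runnable part times plain NumPy arrays only so that it does not require xarray:

```python
import timeit

import numpy as np


def best_time(func, *args, repeat=5, number=1, **kwargs):
    """Return the fastest of `repeat` timings (seconds per call) of func(*args, **kwargs).

    Hypothetical helper: it measures only the call, not imports or start-up.
    """
    timer = timeit.Timer(lambda: func(*args, **kwargs))
    # Each entry is the total time for `number` calls; divide to get per-call time.
    return min(timer.repeat(repeat=repeat, number=number)) / number


lat = np.random.rand(50000) * 180 - 90
lon = np.random.rand(50000) * 360 - 180
latbins = np.r_[-90:90:2.]
lonbins = np.r_[-180:180:2.]

t_numpy = best_time(np.histogram2d, lon, lat, bins=(lonbins, latbins))
print(f"np.histogram2d on ndarrays: {t_numpy * 1e3:.2f} ms per call")

# With xarray installed, the slow path from the report could be timed the same way:
#   import xarray as xr
#   d = xr.Dataset({'latitude': lat, 'longitude': lon})
#   t_xr = best_time(np.histogram2d, d['longitude'], d['latitude'], bins=(lonbins, latbins))
#   print(f"slowdown: {t_xr / t_numpy:.1f}x")
```

Taking the minimum of several repeats is the usual `timeit` convention, since larger values mostly reflect interference from other processes rather than the code being measured.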
Issue analytics: created 7 years ago; 5 comments (4 by maintainers).
Top GitHub Comments
There’s an underlying NumPy issue that explains why this is slow (https://github.com/numpy/numpy/issues/8562).

For `histogram2d` in particular, this could be fixed in NumPy by calling `np.asarray` on each of `x` and `y` before passing them to `histogramdd`.

Closing as upstream issue.
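Until such a change lands upstream, the same idea can be applied on the caller's side. A minimal sketch, assuming the slowdown comes from how `histogram2d` coerces its inputs: `histogram2d_asarray` is a hypothetical wrapper (not part of NumPy or xarray) that converts `x` and `y` with `np.asarray` before calling `np.histogram2d`, so that, as with `.values`, NumPy only ever sees plain ndarrays:

```python
import numpy as np


def histogram2d_asarray(x, y, bins=10, **kwargs):
    """Hypothetical wrapper: coerce x and y to ndarrays, then call np.histogram2d.

    np.asarray uses an object's __array__ method when it has one (DataArray
    implements it), so histogram2d receives plain NumPy arrays.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    return np.histogram2d(x, y, bins=bins, **kwargs)


lat = np.random.rand(1000) * 180 - 90
lon = np.random.rand(1000) * 360 - 180
latbins = np.r_[-90:90:2.]
lonbins = np.r_[-180:180:2.]

h, xedges, yedges = histogram2d_asarray(lon, lat, bins=(lonbins, latbins))
h_ref, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins))
# Same counts as calling NumPy directly on the arrays.
assert np.array_equal(h, h_ref)
```

With the dataset from the report, the call would be `histogram2d_asarray(d['longitude'], d['latitude'], bins=(lonbins, latbins))`.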