Return a scalar instead of DataArray when the return value is a scalar
Hi,
I’m not sure how the devs will feel about this, but I wanted to ask because I run into this issue frequently.
Currently, many methods such as .min(), .max(), and .mean() return a DataArray even when the return value is a scalar. For example,
import numpy as np
import xarray as xr
test = xr.DataArray(data=np.ones((10, 10)))
In [6]: test.min()
Out[6]:
<xarray.DataArray ()>
array(1.0)
This breaks a lot of other things, so I have to use test.min().values or float(test.min()) instead.
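The two workarounds mentioned above do not return the same kind of object. The minimal sketch below (assuming a recent xarray and NumPy) inspects what each one actually gives back: .values unwraps to a 0-dimensional NumPy array, which is still an array rather than a scalar, while float() produces a built-in Python float.

```python
import numpy as np
import xarray as xr

test = xr.DataArray(data=np.ones((10, 10)))
result = test.min()  # a 0-dimensional DataArray, not a float

as_array = result.values  # .values unwraps to a 0-d NumPy array (still not a scalar)
as_float = float(result)  # float() converts the single value to a built-in Python float

print(type(result).__name__, result.ndim)       # DataArray 0
print(type(as_array).__name__, as_array.shape)  # ndarray ()
print(type(as_float).__name__, as_float)        # float 1.0
```

If downstream code needs a true scalar (for example, to use as a dict key or pass to a library that type-checks for float), float() or .item() is the safer of the two.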
I think it would be great if these methods returned a scalar when the return value is a scalar, as NumPy does. For example,
In [7]: np.ones((10, 10)).mean()
Out[7]: 1.0
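Until (or unless) xarray changes, a user-side approximation of the requested behaviour is a small wrapper that unwraps only 0-dimensional results. The helper name unwrap_scalar below is hypothetical, not part of xarray; it is a sketch of the idea, and it makes the return type depend on the shape of the result, which is exactly the trade-off discussed in the comments.

```python
import numpy as np
import xarray as xr


def unwrap_scalar(obj):
    """Hypothetical helper: turn a 0-d DataArray into a Python scalar.

    DataArrays with one or more dimensions are returned unchanged.
    """
    if isinstance(obj, xr.DataArray) and obj.ndim == 0:
        return obj.item()
    return obj


test = xr.DataArray(data=np.ones((10, 10)))

np_mean = np.ones((10, 10)).mean()                 # NumPy returns a NumPy scalar (np.float64)
xr_mean = unwrap_scalar(test.mean())               # 0-d result unwrapped to a Python float
row_means = unwrap_scalar(test.mean(dim="dim_0"))  # 1-d result stays a DataArray
```

Note that NumPy itself returns a NumPy scalar type (np.float64) rather than a built-in float; .item() goes one step further and returns a plain Python float.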
Thank you!
Issue Analytics
- Created 7 years ago
- Comments: 11 (6 by maintainers)
Top GitHub Comments
@joonro, I think there’s a strong case to be made for returning a DataArray with some metadata appended. The latest draft of the CF Metadata Conventions gives a clear way to indicate when operations such as mean, max, or min have been applied to a variable: the cell_methods attribute.
It might be more prudent to add this attribute whenever we apply these operations to a DataArray (or perhaps variable-wise when applied to a Dataset). That way, there is a clear reason not to return a scalar: the result documents which operations were applied to produce it.
I can whip up a working example/pull request if people think this is a direction to go. I’d probably build a decorator that inspects the operator name and arguments and uses them to add the cell_methods attribute, so that people can add the same functionality to their homegrown methods/operators.
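A minimal sketch of the decorator idea described above, assuming a simplified form of the CF cell_methods syntax ("dim: method", with several reduced dimensions written as "x: y: method" and successive operations appended left to right). The names record_cell_method, mean, and maximum are hypothetical; the wrapped functions are named after CF method names so the function name can be used directly. CF qualifiers such as intervals, comments, and "where" clauses are not handled.

```python
import functools

import numpy as np
import xarray as xr


def record_cell_method(func):
    """Hypothetical decorator: append a CF-style cell_methods entry to the result."""

    @functools.wraps(func)
    def wrapper(da, dim=None, **kwargs):
        result = func(da, dim=dim, **kwargs)
        # Work out which dimensions were reduced (None means all of them).
        if dim is None:
            dims = list(da.dims)
        elif isinstance(dim, str):
            dims = [dim]
        else:
            dims = list(dim)
        entry = " ".join(f"{d}:" for d in dims) + f" {func.__name__}"
        # Append to any cell_methods already recorded on the input.
        previous = da.attrs.get("cell_methods")
        result.attrs["cell_methods"] = f"{previous} {entry}" if previous else entry
        return result

    return wrapper


@record_cell_method
def mean(da, dim=None, **kwargs):
    return da.mean(dim=dim, **kwargs)


@record_cell_method
def maximum(da, dim=None, **kwargs):
    return da.max(dim=dim, **kwargs)


test = xr.DataArray(np.ones((10, 10)), dims=("x", "y"))
per_y = mean(test, dim="x")   # cell_methods: "x: mean"
overall = maximum(per_y)      # cell_methods: "x: mean y: maximum", still a 0-d DataArray
```

Because the final result keeps its attributes, returning a DataArray (rather than a bare scalar) is what lets this provenance survive a full reduction.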
This is a bad path to go down 😃. Now your code might suddenly break when you add a metadata field!
In principle, we could pick some subset of operations for which to always do this and others for which to never do this (e.g., aggregating out all dimensions, but not indexing out all dimensions), but I think this inconsistency would be even more surprising. It’s pretty easy to see how this could lead to bugs, too. At least now you know you always need to type .values or .item()!
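The consistency argument can be checked directly: aggregating out all dimensions and indexing out all dimensions both yield a 0-dimensional DataArray, and attaching metadata does not change that, so one explicit conversion works in every case. This is a small sketch assuming a recent xarray; the "units" attribute is only illustrative.

```python
import numpy as np
import xarray as xr

test = xr.DataArray(data=np.ones((10, 10)), attrs={"units": "m"})

reduced = test.min()  # aggregating out all dimensions
indexed = test[0, 0]  # indexing out all dimensions

for result in (reduced, indexed):
    # Both are 0-d DataArrays, with or without metadata attached...
    assert isinstance(result, xr.DataArray) and result.ndim == 0
    # ...so the same explicit conversion works everywhere.
    value = result.item()
    print(type(value).__name__, value)  # float 1.0
```

Here .item() returns a built-in Python scalar, whereas .values returns a 0-d NumPy array; pick whichever the consuming code expects.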