Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Return a scalar instead of DataArray when the return value is a scalar

See original GitHub issue

Hi,

I’m not sure how devs will feel about this, but I wanted to ask because I’m getting into this issue frequently.

Currently many methods such as .min(), .max(), .mean() returns a DataArray even for the cases where the return value is a scaler. For example,

import numpy as np
import xarray as xr
test = xr.DataArray(data=np.ones((10, 10)))

In [6]: test.min()
Out[6]: 
<xarray.DataArray ()>
array(1.0)

which makes a lot of other things break down and I have to use test.min().values or float(test.min()). I think it would be great that these methods return a scalar when the return value is a scaler. For example,

In [7]: np.ones((10, 10)).mean()
Out[7]: 1.0

Thank you!

Issue Analytics

State:
Created 7 years ago
Comments:11 (6 by maintainers)

Top GitHub Comments

2reactions

darothencommented, Aug 27, 2016

@joonro, I think there’s a strong case to be made about returning a DataArray with some metadata appended. Referring to the latest draft of the CF Metadata Conventions, there is a clear way to indicate when operations such as mean, max, or min have been applied to a variable by using the cell_methods attribute.

It might be more prudent to add this attribute whenever we apply these operations to a DataArray (or perhaps variable-wise when applied to a Dataset). That way, there is a clear reason to not return a scalar - the documentation of what operations were applied to produce that final result.

I can whip up a working example/pull request if people think this is a direction to go. I’d probably build a decorator which handles inspection of the operator name and arguments and uses that to add the cell_methods attribute, that way people can add the same functionality to homegrown methods/operators.

1reaction

shoyercommented, Aug 26, 2016

I wonder if it is reasonable to return a scalar when there is neither coords nor attrs associated with the return value, or it would be too much ad-hoc thing. For example, in the original example the return value was <xarray.DataArray ()>, which does not have any useful information.

This is a bad path to go down 😃. Now your code might suddenly break when you add a metadata field!

In principle, we could pick some subset of operations for which to always do this and others for which to never do this (e.g., aggregating out all dimensions, but not indexing out all dimensions), but I think this inconsistency would be even more surprising. It’s pretty easy to see how this could lead to bugs, too. At least now you know you always need to type .values or .item()!

Top Results From Across the Web

xarray.DataArray.item

When the data type of a is longdouble or clongdouble, item() returns a scalar array object because there is no available Python scalar...

Making DataArray return a float by default instead of a 0 ...

Dataset) return another DataArray (resp. Dataset) object. In particular, operations returning scalar values (e.g. indexing or aggregations ...

xarray.DataArray.item — xarray 0.12.2 documentation

When the data type of a is longdouble or clongdouble, item() returns a scalar array object because there is no available Python scalar...

Scalars — NumPy v1.24 Manual

Scalars#. Python defines only one type of a particular data class (there is only one integer type, ... However, a void scalar rather...

vtkDataArray Class Reference - VTK

vtkDataArray vtkGenericDataArray < vtkPeriodicDataArray < Scalar > ... The range of the data array values will be returned in the provided range array ......