Return a scalar instead of DataArray when the return value is a scalar

Hi,

I’m not sure how the devs will feel about this, but I wanted to ask because I run into this issue frequently.

Currently, many methods such as .min(), .max(), and .mean() return a DataArray even when the return value is a scalar. For example,

import numpy as np
import xarray as xr
test = xr.DataArray(data=np.ones((10, 10)))

In [6]: test.min()
Out[6]: 
<xarray.DataArray ()>
array(1.0)

which makes a lot of other things break, so I have to use test.min().values or float(test.min()). I think it would be great if these methods returned a scalar when the return value is a scalar, as NumPy does. For example,

In [7]: np.ones((10, 10)).mean()
Out[7]: 1.0
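
For reference, here is a minimal sketch of the workarounds mentioned in this thread (.values, float(), and .item()), applied to the same test array. The variable names are only illustrative.

```python
import numpy as np
import xarray as xr

test = xr.DataArray(data=np.ones((10, 10)))

# Reducing over all dimensions still gives a 0-dimensional DataArray.
result = test.min()

as_numpy = result.values   # 0-dimensional numpy.ndarray
as_float = float(result)   # built-in Python float
as_item = result.item()    # Python scalar extracted from the array
```

Each of the three forms drops the DataArray wrapper, so any metadata on the result is lost at that point.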

Thank you!

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

darothen commented, Aug 27, 2016 (2 reactions)

@joonro, I think there’s a strong case to be made about returning a DataArray with some metadata appended. Referring to the latest draft of the CF Metadata Conventions, there is a clear way to indicate when operations such as mean, max, or min have been applied to a variable by using the cell_methods attribute.

It might be more prudent to add this attribute whenever we apply these operations to a DataArray (or perhaps variable-wise when applied to a Dataset). That way, there is a clear reason to not return a scalar - the documentation of what operations were applied to produce that final result.

I can whip up a working example/pull request if people think this is a direction worth pursuing. I’d probably build a decorator that inspects the operator name and arguments and uses them to add the cell_methods attribute. That way, people can add the same functionality to homegrown methods/operators.
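
A rough sketch of what such a decorator could look like is below. This is not an existing xarray feature: record_cell_method and the wrapped mean function are hypothetical names, and the sketch assumes the wrapped function takes a DataArray plus an optional dim argument and that its name is a valid CF method name.

```python
import functools

import numpy as np
import xarray as xr


def record_cell_method(func):
    """Hypothetical decorator: record the reduction in a ``cell_methods`` attr."""

    @functools.wraps(func)
    def wrapper(da, dim=None, **kwargs):
        result = func(da, dim=dim, **kwargs)
        # Work out which dimensions were reduced.
        if dim is None:
            dims = list(da.dims)
        elif isinstance(dim, str):
            dims = [dim]
        else:
            dims = list(dim)
        # CF-style entry, e.g. "x: y: mean".
        entry = " ".join(f"{d}:" for d in dims) + f" {func.__name__}"
        previous = da.attrs.get("cell_methods")
        result.attrs["cell_methods"] = f"{previous} {entry}" if previous else entry
        return result

    return wrapper


@record_cell_method
def mean(da, dim=None, **kwargs):
    return da.mean(dim=dim, **kwargs)


test = xr.DataArray(np.ones((10, 10)), dims=("x", "y"))
total = mean(test)             # 0-dimensional, but carries cell_methods
partial = mean(test, dim="x")  # reduced over "x" only
```

With this approach even a fully reduced result keeps a record of how it was produced, which is the argument for returning a DataArray rather than a bare scalar.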

shoyer commented, Aug 26, 2016 (1 reaction)

> I wonder if it is reasonable to return a scalar when there is neither coords nor attrs associated with the return value, or it would be too much ad-hoc thing. For example, in the original example the return value was <xarray.DataArray ()>, which does not have any useful information.

This is a bad path to go down 😃. Now your code might suddenly break when you add a metadata field!

In principle, we could pick some subset of operations for which to always do this and others for which to never do this (e.g., aggregating out all dimensions, but not indexing out all dimensions), but I think this inconsistency would be even more surprising. It’s pretty easy to see how this could lead to bugs, too. At least now you know you always need to type .values or .item()!
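
To illustrate the concern, here is a small sketch of the "return a scalar only when there is no metadata" rule. The helper min_maybe_scalar is hypothetical and exists only to show how the return type would change once an attribute is added; keep_attrs=True is used so that the attribute survives the reduction.

```python
import numpy as np
import xarray as xr


def min_maybe_scalar(da):
    """Hypothetical helper: unwrap the result only when it has no metadata."""
    result = da.min(keep_attrs=True)
    if not result.attrs and len(result.coords) == 0:
        return result.item()  # plain Python scalar
    return result             # still a DataArray


plain = xr.DataArray(np.ones((10, 10)))
tagged = plain.assign_attrs(units="m")

a = min_maybe_scalar(plain)   # Python float
b = min_maybe_scalar(tagged)  # DataArray, because "units" was kept
```

Code written against a would fail on b if it relied on float-only behavior, and the only difference between the two inputs is one metadata field. Always calling .item() or .values avoids that split.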
