Iterating over a Dataset iterates only over its data_vars

This has been a small-but-persistent issue for me for a while. I suspect that my perspective might be dependent on my current outlook, but socializing it here to test if it’s secular…

Currently Dataset.keys() returns both variables and coordinates (but not its attrs keys):

In [1]: import numpy as np
In [2]: import xarray as xr
In [5]: ds = xr.Dataset({'a': (('x', 'y'), np.random.rand(10, 2))})
In [12]: list(ds.keys())
Out[12]: ['a', 'x', 'y']

Is this conceptually correct? I would posit that a Dataset is a mapping of keys to variables, and the coordinates contain values that label that data.

So should Dataset.keys() instead return just the keys of the Variables?

We’re often passing around a dataset as a Mapping of keys to values - but then when we run a function across each of the keys, we get something run on both the Variables’ keys, and the Coordinate / label’s keys.
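A minimal sketch of the problem, using plain Python dicts as stand-ins for a Dataset so that it runs without xarray. The names `summarize`, `variables`, `coords`, and `mixed` are hypothetical and not part of xarray; the point is only that a generic function mapping over every key cannot tell data variables apart from coordinate labels.

```python
from statistics import mean

# Hypothetical stand-in for a Dataset whose keys() mixes data variables
# and coordinates; plain dicts are used so the sketch runs without xarray.
variables = {"a": [0.2, 0.4, 0.9]}
coords = {"x": [0, 1, 2]}
mixed = {**variables, **coords}


def summarize(mapping):
    """Apply mean() to every value reachable through the mapping's keys."""
    return {key: mean(mapping[key]) for key in mapping.keys()}


# The coordinate "x" is summarized alongside the data variable "a".
mixed_summary = summarize(mixed)

# Restricting the mapping to data variables gives the intended result.
variables_only_summary = summarize(variables)

print(sorted(mixed_summary))
print(sorted(variables_only_summary))
```

In this toy setup the caller has to know to pass only the data variables, which is the extra step the post is asking to avoid.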

In pandas, DataFrame.keys() returns just the columns, which is what we need. While I think the xarray design is in general much better in these areas, this is one area that pandas seems to get right. Because of the inconsistency between pandas and xarray, we have to coerce our objects to pandas DataFrames before passing them to functions that pull out their keys. This is also why we can’t just use ds.data_vars.keys(): those functions call keys() on whatever object they receive, so that would break the duck typing.

Does that make sense?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments:11 (9 by maintainers)

Top GitHub Comments

shoyer commented, May 25, 2018

> Do we need to change the behavior of dict(dataset) so that dict(dataset).keys() and dataset.keys() become consistent?

No, I think these are guaranteed to be consistent because we inherit from collections.Mapping to implement dict methods like keys(), values() and items() (via __iter__ and __getitem__).
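The consistency described here can be sketched with a toy class. `ToyDataset` is a hypothetical, simplified container and not xarray's actual implementation; it only assumes the same pattern of defining `__getitem__`, `__iter__`, and `__len__` and inheriting the rest from the Mapping abstract base class (which lives in `collections.abc` in current Python versions).

```python
from collections.abc import Mapping


class ToyDataset(Mapping):
    """Hypothetical, simplified container; not xarray's actual implementation."""

    def __init__(self, data_vars, coords):
        self._data_vars = dict(data_vars)
        self._coords = dict(coords)

    def __getitem__(self, key):
        # Lookup still reaches coordinates, even though iteration skips them.
        if key in self._data_vars:
            return self._data_vars[key]
        return self._coords[key]

    def __iter__(self):
        # Iteration yields only data variable names.
        return iter(self._data_vars)

    def __len__(self):
        return len(self._data_vars)


toy = ToyDataset({"a": [1.0, 2.0]}, {"x": [0, 1]})

# keys() is inherited from Mapping and is driven by __iter__ ...
print(list(toy.keys()))
# ... and dict() builds itself from keys() plus __getitem__, so the two agree.
print(list(dict(toy).keys()))
```

Because `keys()`, `values()`, and `items()` are all derived from `__iter__` and `__getitem__`, changing what `__iter__` yields changes all of them together, and `dict(toy)` follows along.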

fujiisoup commented, May 25, 2018

Do we need to change the behavior of dict(dataset) so that dict(dataset).keys() and dataset.keys() become consistent?
