groupby_bins: exclude bin or assign bin with nan when bin has no values

When using groupby_bins, there are cases where no values fall into some of the specified bins. Currently, it appears that such a bin is skipped: neither a value nor a bin entry is added to the output DataArray.

Is there a way to identify which bins have been skipped? Or, preferably, is it possible to have an option to include those bins, but with NaN values? This would make comparing two DataArrays easier in cases where, despite the same bin intervals as inputs, the outputs have different variable and coordinate lengths.

import xarray as xr

# Grouping variable and the data to be summed
var = xr.open_dataset('c:\\users\\saveMWE.nc')
pop = xr.open_dataset('c:\\users\\savePOP.nc')
# binns includes a very small bin to test this
binns = [-100, -50, 0, 50, 50.00001, 100]
binned = pop.p2010T.groupby_bins(var.EnsembleMean, binns).sum()
print(binned)
print(binned.EnsembleMean_bins)

In this case, no data falls in the 4th bin between 50 and 50.00001.

<xarray.DataArray 'p2010T' (EnsembleMean_bins: 4)>
array([  2.64352214e+09,   3.46869168e+09,   3.08998110e+08,
         1.48247440e+07])
Coordinates:
  * EnsembleMean_bins  (EnsembleMean_bins) object '(0, 50]' '(-50, 0]' ...
<xarray.DataArray 'EnsembleMean_bins' (EnsembleMean_bins: 4)>
array(['(0, 50]', '(-50, 0]', '(51, 100]', '(-100, -50]'], dtype=object)

Obviously one can count the lengths, but this doesn’t indicate which bin was skipped. An option to include the empty bin with a NaN value would be useful! Thanks
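
To make the requested behaviour concrete, here is a minimal sketch in plain pandas rather than xarray (xarray documents groupby_bins as discretizing the group with pandas.cut). It is not a fix for the issue and not xarray's implementation. The arrays group_values and data are made-up stand-ins for var.EnsembleMean and pop.p2010T; only the bin edges come from the example above.

```python
import pandas as pd

# Hypothetical stand-ins for var.EnsembleMean (grouping values)
# and pop.p2010T (values to sum); not the data from the issue.
group_values = pd.Series([-75.0, -20.0, -10.0, 30.0, 80.0])
data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Same bin edges as in the example, including the very small bin.
bins = [-100, -50, 0, 50, 50.00001, 100]

# pd.cut keeps every interval as a category, even if nothing falls in it.
labels = pd.cut(group_values, bins)

# observed=False keeps the empty categories in the result;
# min_count=1 makes the sum of an empty group NaN instead of 0.
binned = data.groupby(labels, observed=False).sum(min_count=1)

# Bins that received no values show up as NaN entries.
skipped = binned.index[binned.isna().to_numpy()]

print(binned)
print(list(skipped))
```

With this approach the result always has one entry per bin, so two outputs built from the same bin edges share the same coordinate and can be compared directly.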

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
rabernat commented, Sep 29, 2016

As for the empty bins, I can see how this would be useful. I suppose it is a bug. Curious what @shoyer thinks about this case…

0 reactions
byersiiasa commented, Oct 3, 2016

@rabernat @shoyer thank you very much - (at least for my purposes) this appears to be working well.
