What should `Dataset.count` return for missing dims?
What is your issue?
When calling Dataset.count("x") on a dataset with multiple variables,
it returns ones for variables that do not have the dimension "x", e.g.:
import xarray as xr
ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
ds.count("x")
# returns:
# <xarray.Dataset>
# Dimensions: (y: 2)
# Dimensions without coordinates: y
# Data variables:
# a int32 3
# b (y) int32 1 1
I can understand why "1" can be a valid answer, but the question is probably a bit philosophical.
For my use case I would like it to return an array of ds.sizes["x"] / 0. I think this is also a valid return value, since under the broadcasting rules the size of the missing dimension is actually known from the dataset.
Maybe one could make this behavior adjustable with a kwarg, e.g. missing_dim_value: {int, "size"}, default 1.
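A minimal sketch of how the proposed kwarg could be emulated today with a user-side helper. The function name count_with_missing_dim and the missing_dim_value argument are hypothetical and not part of xarray; the helper overwrites the counts of variables that lack the dimension.

```python
import xarray as xr


def count_with_missing_dim(ds, dim, missing_dim_value=1):
    """Hypothetical helper (not an xarray API): count along `dim`, filling
    variables that lack `dim` with `missing_dim_value` or the size of `dim`."""
    counted = ds.count(dim)
    # "size" means: use the length of the dimension as known from the dataset.
    fill = ds.sizes[dim] if missing_dim_value == "size" else missing_dim_value
    for name, var in ds.data_vars.items():
        if dim not in var.dims:
            # Replace whatever count() produced for this variable with the fill value.
            counted[name] = xr.full_like(counted[name], fill)
    return counted


ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
result = count_with_missing_dim(ds, "x", missing_dim_value="size")
print(result)
```

With missing_dim_value="size", the variable "b" is filled with ds.sizes["x"]; with an integer such as 0, it is filled with that integer instead.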
Issue Analytics
- State:
- Created a year ago
- Comments: 5 (2 by maintainers)
We discussed:
b along x
For the other reductions
gives
I guess the output for nansum, nanprod doesn’t match what you would get by broadcasting along the absent dimension.
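To illustrate the mismatch, here is a small sketch that reuses the dataset from the issue and compares the current Dataset.sum result with an explicit broadcast followed by a sum. The variable names current and broadcast_first are only for illustration.

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})

# Current behaviour: "b" has no "x" dimension, so reducing over "x" leaves it as is.
current = ds.sum("x")["b"]

# Broadcast "b" against "a" so that it gains the "x" dimension, then reduce.
_, b_broadcast = xr.broadcast(ds["a"], ds["b"])
broadcast_first = b_broadcast.sum("x")

print(current.values)
print(broadcast_first.values)
```

After broadcasting, each value of "b" is repeated ds.sizes["x"] times along "x", so the sum is three times the value that the current implementation returns.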
Another option is to add a keyword argument missing_dim: "raise", "ignore" or "broadcast". The default would then be "ignore", which is the current implementation. But for workflows where variables are either DataArray or Dataset, should this argument be added to DataArray.sum/count/prod as well?
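A rough sketch of what the three modes could mean for a Dataset, written as a user-side wrapper. The function reduce_dataset and its missing_dim argument are hypothetical; xarray does not currently accept this option on its reductions. The sketch covers only the Dataset case, because a lone DataArray does not know the size of a dimension it lacks.

```python
import xarray as xr


def reduce_dataset(ds, method, dim, missing_dim="ignore"):
    """Hypothetical wrapper (not an xarray API) around ds.<method>(dim)."""
    if missing_dim not in ("raise", "ignore", "broadcast"):
        raise ValueError(f"unknown missing_dim option: {missing_dim!r}")

    # Data variables that do not have the dimension being reduced.
    missing = [name for name, var in ds.data_vars.items() if dim not in var.dims]

    if missing_dim == "raise" and missing:
        raise ValueError(f"variables {missing} have no dimension {dim!r}")

    if missing_dim == "broadcast":
        ds = ds.copy()  # shallow copy, so the caller's dataset is not modified
        for name in missing:
            # Expand to the size of `dim` that is known from the dataset.
            ds[name] = ds[name].expand_dims({dim: ds.sizes[dim]})

    # "ignore" falls through to the existing behaviour.
    return getattr(ds, method)(dim)


ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
print(reduce_dataset(ds, "count", "x", missing_dim="broadcast"))
print(reduce_dataset(ds, "sum", "x", missing_dim="ignore"))
```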