What should `Dataset.count` return for missing dims?


What is your issue?

When calling Dataset.count("x") on a dataset with multiple variables, it returns ones for the variables that lack the dimension "x", e.g.:

import xarray as xr
ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
ds.count("x")
# returns:
# <xarray.Dataset>
# Dimensions:  (y: 2)
# Dimensions without coordinates: y
# Data variables:
#     a        int32 3
#     b        (y) int32 1 1

I can see why 1 is a defensible answer, but the choice is arguably a bit philosophical.

For my use case I would like it to return an array of ds.sizes["x"] / 0. I think this is also a valid result given the broadcasting rules, since the size of the missing dimension is known within the dataset.
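A minimal sketch of that result, assuming it means "the count you would get after broadcasting the variable along x" (ds.sizes["x"] for valid values, 0 for missing ones). The helper count_with_broadcast is hypothetical, not an xarray API; it gives each data variable that lacks the dimension a new axis of length ds.sizes[dim] with DataArray.expand_dims and then calls Dataset.count. A NaN is added to b purely for illustration, to show where the 0 comes from.

```python
import numpy as np
import xarray as xr


def count_with_broadcast(ds: xr.Dataset, dim: str) -> xr.Dataset:
    """Hypothetical helper (not part of xarray): count non-null values along
    `dim`, first broadcasting data variables that lack `dim` along it."""
    size = ds.sizes[dim]
    data_vars = {}
    for name, da in ds.data_vars.items():
        if dim not in da.dims:
            # Add `dim` as a new axis of the dataset's length; values repeat
            da = da.expand_dims({dim: size})
        data_vars[name] = da
    return xr.Dataset(data_vars).count(dim)


# Same dataset as in the issue, but with a NaN in "b" (illustrative assumption)
ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4.0, np.nan])})
result = count_with_broadcast(ds, "x")
print(result["a"].values)  # 3
print(result["b"].values)  # [3 0]
```

This materializes the broadcast variable, which is simple but not free for large arrays.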

Maybe this behavior could be made adjustable with a keyword argument, e.g. missing_dim_value: {int, "size"}, default 1.
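One way the proposed keyword might behave, sketched as a hypothetical wrapper (missing_dim_value is not an existing xarray parameter, and count_missing_dim_value is an illustrative name). For a variable lacking the dimension, an integer value is returned for each non-null element and "size" uses ds.sizes[dim]; null elements get 0 either way. With the default of 1 this reproduces the ones in the example above.

```python
import numpy as np
import xarray as xr


def count_missing_dim_value(ds, dim, missing_dim_value=1):
    """Hypothetical wrapper (not an xarray API) sketching the proposed
    `missing_dim_value: {int, "size"}` keyword for Dataset.count."""
    if isinstance(missing_dim_value, str) and missing_dim_value == "size":
        fill = ds.sizes[dim]  # length of the missing dimension in the dataset
    elif isinstance(missing_dim_value, int):
        fill = missing_dim_value
    else:
        raise ValueError(f"invalid missing_dim_value: {missing_dim_value!r}")

    out = {}
    for name, da in ds.data_vars.items():
        if dim in da.dims:
            out[name] = da.count(dim)  # normal count along the dimension
        else:
            # 1 where a value is present, 0 where it is null, scaled by `fill`
            out[name] = da.notnull().astype(int) * fill
    return xr.Dataset(out)


ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4.0, np.nan])})
print(count_missing_dim_value(ds, "x")["b"].values)          # [1 0]
print(count_missing_dim_value(ds, "x", "size")["b"].values)  # [3 0]
```

Scaling notnull() avoids building the broadcast array; under these assumptions "size" yields the same numbers as actually broadcasting b along x.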

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

dcherian commented, Jul 7, 2022 (1 reaction)

We discussed:

  1. dropping variables that lack the dimension
  2. returning ds.sizes["x"] by broadcasting b along x
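Option 1 can be approximated today by selecting only the data variables that carry the dimension before reducing. The helper name below is hypothetical; indexing a Dataset with a list of names and calling Dataset.count are standard xarray operations.

```python
import xarray as xr


def count_dropping_missing(ds: xr.Dataset, dim: str) -> xr.Dataset:
    """Hypothetical helper for option 1: keep only the data variables that
    have `dim`, then count along it."""
    keep = [name for name, da in ds.data_vars.items() if dim in da.dims]
    return ds[keep].count(dim)


ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
result = count_dropping_missing(ds, "x")
print(list(result.data_vars))  # ['a']
```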

For comparison, the other reductions applied to b over no axes:

import numpy as np
import xarray as xr

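# count is xarray's internal helper (a private module that may change between versions)
from xarray.core.duck_array_ops import count

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})

# axis=() reduces over no axes, mimicking what happens to b, which lacks "x"
for func in [np.nansum, np.nanprod, np.nanmean, np.nanvar, np.nanstd, count]:
    print(f"{func.__name__!s}({ds.b.data}, axis=()) = {func(ds.b.data, axis=())}")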

gives

nansum([4 5], axis=()) = [4 5]
nanprod([4 5], axis=()) = [4 5]
nanmean([4 5], axis=()) = [4. 5.]
nanvar([4 5], axis=()) = [0. 0.]
nanstd([4 5], axis=()) = [0. 0.]
count([4 5], axis=()) = [1 1]

I guess the outputs of nansum and nanprod don't match what you would get by broadcasting along the absent dimension.
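To make the comparison concrete, this NumPy-only sketch reduces b over no axes (as above) and, alternatively, after broadcasting it along a new axis of length 3 that stands in for ds.sizes["x"]. The count entry is written with plain NumPy instead of xarray's internal helper, an assumption made to keep the example self-contained.

```python
import numpy as np

b = np.array([4, 5])  # the values of ds.b, whose only dimension is "y"
size_x = 3            # stands in for ds.sizes["x"], the dimension b lacks

# b repeated along a new leading axis, i.e. broadcast along "x"
b_along_x = np.broadcast_to(b, (size_x, b.size))

reductions = {
    "nansum": np.nansum,
    "nanprod": np.nanprod,
    "nanmean": np.nanmean,
    "nanvar": np.nanvar,
    "nanstd": np.nanstd,
    # number of non-NaN values, in plain NumPy
    "count": lambda arr, axis: np.count_nonzero(~np.isnan(arr), axis=axis),
}

# (reduce over no axes, reduce the broadcast array over the new axis)
results = {
    name: (func(b, axis=()), func(b_along_x, axis=0))
    for name, func in reductions.items()
}

for name, (no_axis, broadcast) in results.items():
    print(f"{name}: axis=() -> {no_axis}, broadcast along x -> {broadcast}")
```

Under broadcasting, nansum and nanprod give [12 15] and [64 125] and count gives [3 3], while nanmean, nanvar and nanstd are unchanged, so the choice of convention matters for sum, prod and count but not for the mean, variance or standard deviation.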

headtr1ck commented, Jul 8, 2022 (0 reactions)

Another option is to add a keyword argument missing_dim accepting "raise", "ignore" or "broadcast". The default would then be "ignore", which matches the current implementation.

But for workflows where a variable can be either a DataArray or a Dataset, shouldn't this argument be added to DataArray.sum/count/prod as well?
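A rough sketch of the missing_dim idea as a hypothetical wrapper (neither reduce_with_missing_dim nor a missing_dim parameter exists in xarray). It takes the reduction name so the same switch could apply to count, sum or prod: "ignore" calls the existing Dataset method unchanged, "raise" rejects variables lacking the dimension, and "broadcast" expands them with DataArray.expand_dims first.

```python
import xarray as xr


def reduce_with_missing_dim(ds, method, dim, missing_dim="ignore"):
    """Hypothetical wrapper (not an xarray API) sketching a `missing_dim`
    option: "raise", "ignore" (current behaviour) or "broadcast"."""
    if missing_dim not in ("raise", "ignore", "broadcast"):
        raise ValueError(f"invalid missing_dim: {missing_dim!r}")

    # Data variables that do not have the dimension being reduced over
    lacking = [name for name, da in ds.data_vars.items() if dim not in da.dims]

    if missing_dim == "raise" and lacking:
        raise ValueError(f"variables {lacking} lack dimension {dim!r}")
    if missing_dim == "broadcast" and lacking:
        size = ds.sizes[dim]
        # Repeat each lacking variable along a new `dim` axis of the dataset's length
        ds = ds.assign({name: ds[name].expand_dims({dim: size}) for name in lacking})

    # Delegate to the existing reduction, e.g. ds.count(dim) or ds.sum(dim)
    return getattr(ds, method)(dim)


ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
summed = reduce_with_missing_dim(ds, "sum", "x", missing_dim="broadcast")
counted = reduce_with_missing_dim(ds, "count", "x", missing_dim="broadcast")
print(summed["b"].values)   # [12 15]
print(counted["b"].values)  # [3 3]
```

Passing the method name keeps the sketch short; an actual implementation would more plausibly live inside the reduction methods themselves.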


