maximum_filter is not symmetrical with NaNs

When applying maximum_filter to an array containing nans, the result is not symmetrical. A nan on the right is not the same as a nan on the left.

Reproducing code example:

import numpy as np
from scipy.ndimage import maximum_filter
a = np.ones(17)*np.nan
a[:5] = np.arange(5)
a[:-6:-1] = np.arange(5)
print(maximum_filter(a, 3))

Output:

[ 1.  2.  3.  4.  4.  4. nan nan nan nan nan nan nan  4.  3.  2.  1.]

Scipy/Numpy/Python version information:

1.0.1 1.14.2 sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)

Created 5 years ago
Comments: 7 (3 by maintainers)

Top GitHub Comments

rkern commented, Jun 15, 2021

You would be interested in scikit-image, where most (if not all) of the filters take a boolean mask parameter that can be used for this purpose instead of NaNs.

https://scikit-image.org/docs/dev/api/skimage.filters.rank.html#skimage.filters.rank.maximum

rkern commented, Jun 15, 2021

NaNs have always existed. We have no functions that were “written before NaNs”. The behaviors that you see are just the behaviors of NaNs in these functions according to IEEE-754 semantics. What you are asking for is to add different semantics onto NaNs. Which can often be a fine thing if done consistently, but is generally a feature request and not a bug fix.
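To make the IEEE-754 point concrete, here is a minimal sketch of how a comparison-based maximum becomes order-dependent once a NaN is in the window. The `running_max` helper is hypothetical and written only for illustration; it is an assumption about how a simple if-then-else maximum behaves, not a copy of scipy's implementation.

```python
import math

nan = float("nan")

# Under IEEE-754, every ordered comparison involving a NaN is False.
print(nan > 4.0, 4.0 > nan, nan == nan)  # False False False


def running_max(window):
    """Hypothetical if-then-else maximum: keep the current best
    unless a later value compares strictly greater."""
    best = window[0]
    for value in window[1:]:
        if value > best:  # always False when either side is NaN
            best = value
    return best


# Same three values, opposite order.
left = running_max([3.0, 4.0, nan])   # NaN arrives last and never wins
right = running_max([nan, 4.0, 3.0])  # NaN arrives first and is never replaced
print(left, right)  # 4.0 nan
```

A NaN that enters the scan first sticks, while a NaN that enters later is ignored, which is the same kind of left/right asymmetry the report shows.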

I would say that the current behavior is correct, or at least is one way to be correct. Adding “missingness” semantics (i.e. the result is a NaN if and only if all of the elements in the window are NaNs) would also be a “correct” way to implement this, if documented appropriately and done consistently across ndimage filters. Another “correct” way would be to add “always-propagate” semantics to the maximum (i.e. the result is a NaN if any elements in the window are NaNs). If we had to make a change, I’d take the last one as closest to IEEE-754 semantics (i.e. defining a maximum(a, b) as a NaN-propagating floating-point function in its own right rather than a simple if-then-else), closer to the behavior of other ndimage filters in the face of NaNs, and less of a performance hit. See, for instance, how uniform_filter() deals with the NaNs. None of these cases would give us the output that you cite as the expected answer.
As I don’t see it as mathematically wrong, I think it’s a (reasonable, but not automatic) feature request, not a bug-fix.
I don’t particularly agree that the direction you want to go is any more perfect. It may well be more aligned with your particular use cases, but the current behavior (at least, the performance) is more aligned to others’ use cases.
We have many filters for which order is important. Even in the absence of NaNs, floating point arithmetic makes many operations asymmetric that one would expect to be symmetric from our understanding of real numbers ((a+(b+c)) != ((a+b)+c) in general) [edit: I originally wrote a+b != b+a here; sorry, it is associativity, not commutativity, of addition that fails]. This renders certain filters that we think of as order-independent (when we apply real-number semantics) order-dependent when applied to floating point numbers. In any case, AFAICT, maximum_filter(a, n) returns the same thing for any n-window that is in the same order, which is about what we can expect of it.
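The associativity remark can be checked directly with ordinary Python floats. The values 0.1, 0.2 and 0.3 are just a convenient illustration and are not taken from the issue.

```python
a, b, c = 0.1, 0.2, 0.3

# Commutativity of addition holds for floats.
commutative = (a + b) == (b + a)

# Associativity does not: the intermediate rounding differs by grouping.
associative = ((a + b) + c) == (a + (b + c))

print(commutative)  # True
print(associative)  # False
print((a + b) + c, a + (b + c))  # 0.6000000000000001 0.6
```

Any filter that accumulates a sum over its window can therefore give slightly different results depending on traversal order, even with no NaNs involved.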

If there is a change to be made, I would suggest moving to the “always-propagate” semantics. That is, if there is any NaN in the window, then a NaN is output. This has the effect of dilating the NaN regions, which isn’t always desired, but it is the way the non-comparison-based filters behave, and is a reasonable way to define the filter with respect to IEEE-754 semantics.
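For comparison, the two alternative semantics discussed above can be sketched in plain NumPy. The function `nan_aware_maximum_filter` and its `propagate` flag are hypothetical names invented for this example and are not part of scipy. The sketch assumes a 1-D input, an odd window size, and edge padding that duplicates the border value (NumPy's "symmetric" mode, which corresponds to the default "reflect" mode in scipy.ndimage).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def nan_aware_maximum_filter(values, size, propagate=True):
    """Hypothetical 1-D maximum filter with explicit NaN semantics.

    propagate=True:  output is NaN if ANY element in the window is NaN.
    propagate=False: output is NaN only if ALL elements in the window are NaN
                     ("missingness" semantics).
    Assumes an odd window size.
    """
    values = np.asarray(values, dtype=float)
    half = size // 2
    # Duplicate the edge values, like scipy.ndimage's default 'reflect' mode.
    padded = np.pad(values, half, mode="symmetric")
    windows = sliding_window_view(padded, size)  # shape: (len(values), size)

    if propagate:
        # np.max returns NaN whenever the window contains a NaN.
        return np.max(windows, axis=1)

    all_nan = np.isnan(windows).all(axis=1)
    # Treat NaNs as "missing" by replacing them with -inf before the max.
    filled = np.where(np.isnan(windows), -np.inf, windows)
    result = filled.max(axis=1)
    result[all_nan] = np.nan
    return result


# Same input as the reproducing example.
a = np.full(17, np.nan)
a[:5] = np.arange(5)
a[:-6:-1] = np.arange(5)

propagated = nan_aware_maximum_filter(a, 3, propagate=True)
missing = nan_aware_maximum_filter(a, 3, propagate=False)
print(propagated)
print(missing)
```

Both variants are symmetric on this input: the always-propagate version widens the NaN region by one element on each side, while the missingness version shrinks it by one on each side. Neither is a drop-in replacement for maximum_filter; the sketch only handles the 1-D case.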

I would also be amenable to briefly documenting (but not warning or raising an exception) the behavior with respect to NaNs, because it is an odd corner case whose behavior I wouldn’t expect anyone to guess unless they were steeped in IEEE-754 lore. That said, NaNs are weird, and if you are going to use them, it pays to understand just how weird they can be; the weirdness of those behaviors is usually not a bug in scipy.