Proposal: support for closeness comparison
Due to the limitations of floating point arithmetic, comparing floating point values for bitwise equality is appropriate in only very few situations. In the usual situation, for example comparing the output of a function against an expected result, it has thus become best practice to compare the values for closeness rather than equality. Python added built-in support for closeness comparisons (`math.isclose`) with PEP 485, which was introduced in Python 3.5.
With this I'm proposing to add an elementwise `isclose` operator:

```python
def isclose(x1, x2, *, rtol: float, atol: float):
    pass
```
Similar to `equal`, `x1` and `x2` as well as the return value are arrays. The returned array will be of type `bool`.

The relative tolerance `rtol` and absolute tolerance `atol` should have default values, which are discussed below.
## Status quo

All actively considered libraries already at least partially support closeness comparisons. In addition to the elementwise `isclose` operation, `allclose` is usually also defined. Since `allclose(a, b) == all(isclose(a, b))` and `all` is already part of the standard, I don't think adding `allclose` is helpful. Otherwise, we would also need to consider `allequal` and so on.
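Since `all` is already in the standard, `allclose` is expressible in terms of the proposed `isclose`. A minimal sketch of the equivalence, using NumPy purely for illustration:

```python
import numpy as np

# allclose(a, b) is just all(isclose(a, b)); NumPy is used here only to
# illustrate the equivalence, the proposal itself is library-agnostic.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0 + 1e-9, 2.0, 3.0])

assert np.allclose(a, b) == bool(np.all(np.isclose(a, b)))
```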
| Library | `isclose` | `allclose` |
|---|---|---|
| NumPy | `numpy.isclose` | `numpy.allclose` |
| TensorFlow | `tensorflow.experimental.numpy.isclose` | `tensorflow.experimental.numpy.allclose` |
| PyTorch | `torch.isclose` | `torch.allclose` |
| MXNet | | `mxnet.contrib.ndarray.allclose` |
| JAX | `jax.numpy.isclose` | `jax.numpy.allclose` |
| Dask | `dask.array.isclose` | `dask.array.allclose` |
| CuPy | `cupy.isclose` | `cupy.allclose` |
## Closeness definition

All the libraries above define closeness like this:

```python
abs(actual - expected) <= atol + rtol * abs(expected)
```
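This shared definition can be sketched as follows (the function name `isclose_additive` is mine for illustration, not any library's API); note that it is asymmetric in its arguments:

```python
import numpy as np

def isclose_additive(actual, expected, rtol=1e-5, atol=1e-8):
    # The definition shared by NumPy, PyTorch, etc.: both tolerances are
    # added, and only abs(expected) scales the relative part.
    actual, expected = np.asarray(actual), np.asarray(expected)
    return np.abs(actual - expected) <= atol + rtol * np.abs(expected)

# Asymmetry: swapping the arguments can flip the result.
assert isclose_additive(0.9, 1.0, rtol=0.1, atol=0.0)      # 0.1 <= 0.1 * 1.0
assert not isclose_additive(1.0, 0.9, rtol=0.1, atol=0.0)  # 0.1 >  0.1 * 0.9
```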
PEP 485 states about this:

> In this approach, the absolute and relative tolerance are added together, rather than the `or` method used in [`math.isclose`]. This is computationally more simple, and if relative tolerance is larger than the absolute tolerance, then the addition will have no effect. However, if the absolute and relative tolerances are of similar magnitude, then the allowed difference will be about twice as large as expected. […] Even more critically, if the values passed in are small compared to the absolute tolerance, then the relative tolerance will be completely swamped, perhaps unexpectedly.
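The "about twice as large" effect is easy to reproduce with tolerances of equal magnitude (the numbers below are illustrative, not taken from the PEP):

```python
# With equal tolerances, the additive definition allows roughly twice
# the difference that either tolerance alone would allow.
atol = rtol = 1e-8
expected = 1.0

additive_tol = atol + rtol * abs(expected)  # tolerance under the libraries' definition
max_tol = max(atol, rtol * abs(expected))   # tolerance under math.isclose's definition

assert additive_tol == 2 * max_tol
```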
`math.isclose` overcomes this and additionally is symmetric:

```python
abs(actual - expected) <= max(atol, rtol * max(abs(actual), abs(expected)))
```
Thus, in addition to adding the `isclose` operator, I think the standard should stick to the objectively better definition of `math.isclose`.
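An elementwise variant of the symmetric definition could be sketched like this (NumPy is used for illustration only; the defaults mirror `math.isclose`):

```python
import numpy as np

def isclose_symmetric(x1, x2, *, rtol=1e-9, atol=0.0):
    # Symmetric, max-based closeness as in math.isclose, applied elementwise.
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    tol = np.maximum(atol, rtol * np.maximum(np.abs(x1), np.abs(x2)))
    return np.abs(x1 - x2) <= tol

# Symmetric: the argument order does not matter.
assert isclose_symmetric(0.9, 1.0, rtol=0.1) == isclose_symmetric(1.0, 0.9, rtol=0.1)
```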
## Non-finite numbers

In addition to finite numbers, the standard should also define how non-finite numbers (`NaN`, `inf`, and `-inf`) are to be handled. Again, I propose to stick to the rationale of PEP 485, which in turn is based on IEEE 754:

- `NaN` is never close to anything. All library implementations add an `equal_nan: bool = False` flag to the functions. If `True`, two `NaN` values are considered close. Still, a comparison between any other value and a `NaN` is never considered close.
- `inf` and `-inf` are only close to themselves.
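These rules match what `math.isclose` already does for non-finite values; note that `math.isclose` itself has no `equal_nan` flag, so only the default (`equal_nan=False`) behavior is shown:

```python
import math

assert not math.isclose(math.nan, math.nan)  # NaN is never close to anything
assert math.isclose(math.inf, math.inf)      # inf is close only to itself
assert not math.isclose(math.inf, -math.inf)
assert not math.isclose(math.inf, 1e308)     # even huge finite values are not close to inf
```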
## Default tolerances

In addition to fixed default values (`math.isclose`: `rtol=1e-9, atol=0.0`; all libraries: `rtol=1e-5, atol=1e-8`), the default tolerances could also be varied by the promoted dtype. For example, arrays of dtype `float64` could use stricter default tolerances than `float32`.

For integer dtypes, I propose using `rtol = atol = 0.0`, which would be identical to comparing them for equality. For floating point dtypes I would use the rationale of PEP 485 as a base:

- `rtol`: approximately half the precision of the promoted dtype
- `atol`: `0.0`
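A sketch of what dtype-dependent defaults could look like. Reading "approximately half the precision" as the square root of the dtype's machine epsilon is my assumption here, not part of the proposal's text:

```python
import numpy as np

# Hypothetical helper: dtype-dependent default tolerances. Interpreting
# "half the precision" as sqrt(eps) is an assumption for illustration.
def default_tolerances(dtype):
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return dict(rtol=0.0, atol=0.0)  # closeness degenerates to equality
    eps = np.finfo(dtype).eps
    return dict(rtol=float(np.sqrt(eps)), atol=0.0)

# float64 gets stricter (smaller) default tolerances than float32.
assert default_tolerances(np.float64)["rtol"] < default_tolerances(np.float32)["rtol"]
```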
## Issue Analytics

- State:
- Created: 2 years ago
- Reactions: 1
- Comments: 13 (8 by maintainers)
## Top GitHub Comments

Thanks @pmeier!

`equal_nan` is quite useful for testing. There you typically compare against expected values; if the expected result of some function call is `[1.5, 2.5, nan]`, then clearly you want `equal_nan=True`, otherwise you have to special-case NaNs everywhere.

For library code though, it's typically the opposite: you want the regular IEEE 754 behavior where NaNs are not equal.

That does of course make it a little questionable to have `equal_nan` in `isclose`/`allclose` rather than only in testing functions like `assert_allclose`. Maybe for statistical algorithms like the hypothesis tests in `scipy.stats` it still makes sense.

---

To investigate the impact of the proposed change, I've implemented it for
`numpy` and run the `numpy`, `scipy`, and `scikit-learn` test suites against this patched version.

### numpy

Only a single test fails with respect to the numerics (there are < 10 failures for tests of the test function itself that break due to my PoC implementation):

Since `rtol` and `atol` have a similar magnitude, the addition of both tolerances made this test pass. Individually, the maximum absolute difference is greater than `atol` (0.00528 > 0.005) and the maximum relative difference is greater than `rtol` (0.00228 > 0.001).
### scipy

Internally, `scipy` relies at one point on the asymmetry of `np.isclose`. I think this is not intended, but I opened scipy/scipy#14081 to make sure. After patching this, we have 10 failing tests left, with 3 of them again related to the PoC implementation. 3 fall into the same category as the failing `numpy` test:

4 more tests have very strict tolerances which are manually set. For example:
### scikit-learn

No failing tests.
IMO, this means two things: