v1.23.0: np.unique returns incorrect result for float16 array containing NaNs

This only seems to affect float16 arrays (float32 and float64 are fine), and only if the array contains NaN values.

Reproduction:

import numpy as np
x = np.array([0, 1, np.nan], dtype='float16')
print(np.__version__)
print(x)
print(np.unique(x))

Output in numpy 1.22.4:

1.22.4
[ 0.  1. nan]
[ 0.  1. nan]

Output in numpy 1.23.0:

1.23.0
[ 0.  1. nan]
[0.]
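Since the report says float32 and float64 are unaffected, one possible stopgap on an affected version is to run np.unique on a float32 copy and cast the result back. This is only a sketch of that idea, not a fix proposed in the issue; the helper name unique_float16 is made up for illustration. Every float16 value is exactly representable in float32, so the round trip should not alter any values.

```python
import numpy as np


def unique_float16(values):
    """Hypothetical helper: unique values of a float16 array via float32."""
    arr = np.asarray(values, dtype=np.float16)
    # Widening float16 -> float32 is exact, so casting back loses nothing.
    widened = arr.astype(np.float32)
    return np.unique(widened).astype(np.float16)


x = np.array([0, 1, np.nan], dtype="float16")
result = unique_float16(x)
print(result)
```

The extra copy costs memory and time, so this only makes sense as a temporary measure until a NumPy release with the fix is available.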

Issue Analytics

Created a year ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

postmalloc commented, Jun 24, 2022

@seberg I added some test cases specifically for float16 and float32; includes the test case you shared earlier.

The underlying issue seems to be that np.searchsorted is broken with NaN for float16. I am not quite sure why… Maybe it happened as part of the C++ conversions?

EDIT:
x = np.array([0, 1, np.nan], dtype='float16')
np.searchsorted(x, x[-1])
# Should return 2 but returns 0
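The expected index is 2 because NumPy's sort order places NaN after every other value, so the first position at which NaN can be inserted is the slot of the existing NaN. To see whether a given installation is affected, the same check can be run across the float dtypes. This is a small sketch; the helper name nan_insertion_index is an assumption for illustration.

```python
import numpy as np


def nan_insertion_index(dtype):
    """Hypothetical helper: index where NaN would be inserted in [0, 1, nan]."""
    x = np.array([0, 1, np.nan], dtype=dtype)
    return int(np.searchsorted(x, x[-1]))


results = {name: nan_insertion_index(name)
           for name in ("float16", "float32", "float64")}

for name, index in results.items():
    print(name, index)
```

If the float16 entry differs from the float32 and float64 entries, the installed version most likely has the bug described here.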

postmalloc commented, Jun 23, 2022

I’ve taken a quick look. It seems the type_num for np.float16 is 23 according to the NPY_TYPES enum sequence. However, the taglist in binarysearch has npy::half_tag at a different position (at index 11). The indices don’t match, and it never enters HALF_LT as you observed.
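The type number mentioned above can be read from Python through the dtype's num attribute. The second half of the sketch is a toy model only, not NumPy's actual C++ code: toy_taglist and toy_lookup are hypothetical names that illustrate how a positional lookup misses when the entry sits at a different index than the type number used to find it.

```python
import numpy as np

# Type numbers as exposed by NumPy's public dtype API.
type_nums = {name: np.dtype(name).num
             for name in ("float16", "float32", "float64")}
print(type_nums)

# Toy model (an assumption for illustration, not NumPy internals):
# a positional list in which the half-precision tag sits at index 11.
toy_taglist = [None] * 24
toy_taglist[11] = "half_tag"


def toy_lookup(type_num):
    """Return the tag stored at position type_num, or None if the slot is empty."""
    return toy_taglist[type_num]


half_num = type_nums["float16"]
print(half_num, toy_lookup(half_num))
```

In this toy model the lookup with float16's type number finds an empty slot, which mirrors the observation that the HALF_LT comparison is never reached.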
