BUG: searchsorted with object arrays containing nan
import sys
import numpy as np
arr = np.array([1, 2, 3, 4, 5], dtype=object)
arr[::2] = np.nan
print(arr)
# [nan 2 nan 4 nan]
bins = np.array([1, 3, 5])
# Inserts into same position (incorrect)
print(bins.searchsorted(arr))
# array([0, 1, 0, 1, 0])
# Now inserts into different positions (correct)
print(bins.searchsorted(arr.astype(float)))
# array([3, 1, 3, 2, 3])
np.__version__
# '1.18.1'
sys.version
# '3.7.4 (default, Aug 13 2019, 15:17:50) \n[Clang 4.0.1 (tags/RELEASE_401/final)]'
In the example here, searchsorted assigns both 2 and 4 from arr to the same index within bins, but 2 should go to 1 and 4 to 2. The output is correct when we cast arr to float instead of object. Also, for whatever reason I can only reproduce this issue when the NaN values are evenly spaced.
Issue Analytics
- State:
- Created 4 years ago
- Comments: 10 (9 by maintainers)
Ah sorry, I misread. The bins are sorted and have no NaN, so except for the NaN values the result should stay the same:
The insert place of the NaN is ill-defined, but for the other values it should probably not be. This is probably an optimization in the search (i.e. to make searching for close values faster). Maybe this optimization should be disabled for object arrays.
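To make the suspected optimization concrete, here is a minimal pure-Python sketch of a left-sided binary search that reuses the bounds left over from the previous key. It is a simplified model, not NumPy's source: the names `plain_less`, `nan_last_less`, and `search_left_with_hint` are hypothetical, and it is an assumption that the object path compares with a plain `<` while the float path orders NaN after every number. Under those assumptions the model reproduces both outputs reported in the issue.

```python
import math


def plain_less(a, b):
    # Ordinary comparison: any comparison involving NaN is False.
    return a < b


def nan_last_less(a, b):
    # NaN-aware comparison: NaN is ordered after every number.
    return a < b or (math.isnan(b) and not math.isnan(a))


def search_left_with_hint(bins, keys, less):
    """Left-sided binary search that keeps bounds from the previous key."""
    n = len(bins)
    lo, hi = 0, n
    out = []
    if not keys:
        return out
    last = keys[0]
    for key in keys:
        if less(last, key):
            # Key grew: keep the lower bound, reset the upper bound.
            hi = n
        else:
            # Key did not grow: reset the lower bound, keep a widened upper bound.
            lo = 0
            hi = hi + 1 if hi < n else n
        last = key
        while lo < hi:
            mid = lo + (hi - lo) // 2
            if less(bins[mid], key):
                lo = mid + 1
            else:
                hi = mid
        out.append(lo)
    return out


nan = float("nan")
bins = [1, 3, 5]
keys = [nan, 2, nan, 4, nan]

print(search_left_with_hint(bins, keys, plain_less))     # [0, 1, 0, 1, 0]
print(search_left_with_hint(bins, keys, nan_last_less))  # [3, 1, 3, 2, 3]
```

In this model, with the plain comparison each NaN key collapses the upper bound to 0, and the following key is then searched only within the first element, which is why 4 lands at index 1 instead of 2.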
If NaN cannot be checked for explicitly, maybe the only way to make the result more accurate is to remove the optimization in the search. At least then the numeric objects would get the correct index. For example:
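A minimal sketch of that idea, assuming a plain `<` comparison and a search that starts from the full range for every key. The function name `search_left_no_hint` is hypothetical and this is an illustration in pure Python, not a patch to NumPy.

```python
def search_left_no_hint(bins, keys):
    """Left-sided binary search that resets its bounds for every key."""
    out = []
    for key in keys:
        lo, hi = 0, len(bins)  # always search the whole array
        while lo < hi:
            mid = (lo + hi) // 2
            if bins[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        out.append(lo)
    return out


nan = float("nan")
print(search_left_no_hint([1, 3, 5], [nan, 2, nan, 4, nan]))  # [0, 1, 0, 2, 0]
```

The NaN keys still land at an arbitrary position (0 here), but 2 and 4 now get indices 1 and 2 regardless of what was searched before them. The cost is losing the speed-up for sorted keys.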