BUG: searchsorted with object arrays containing nan

import sys
import numpy as np

arr = np.array([1, 2, 3, 4, 5], dtype=object)
arr[::2] = np.nan

print(arr)
# [nan 2 nan 4 nan]

bins = np.array([1, 3, 5])

# Inserts into same position (incorrect)
print(bins.searchsorted(arr))                                                                                                        
# array([0, 1, 0, 1, 0])

# Now inserts into different positions (correct)
print(bins.searchsorted(arr.astype(float)))
# array([3, 1, 3, 2, 3])

np.__version__
# '1.18.1'

sys.version
# '3.7.4 (default, Aug 13 2019, 15:17:50) \n[Clang 4.0.1 (tags/RELEASE_401/final)]'

In the example above, searchsorted assigns both 2 and 4 from arr to the same index in bins, but 2 should go to index 1 and 4 to index 2. The output is correct when we cast arr to float instead of leaving it as object. Also, for whatever reason, I can only reproduce this issue when the NaN values are evenly spaced.
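Until this is fixed, one possible workaround is to keep NaN keys out of the search entirely and place them after the last bin, which mirrors what the float result above does. The sketch below assumes 1-D keys and sorted, NaN-free bins; `searchsorted_nan_safe` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def searchsorted_nan_safe(bins, keys, side="left"):
    """Hypothetical helper: like np.searchsorted, but NaN keys never reach
    the comparison loop. Assumes 1-D keys and sorted, NaN-free bins."""
    keys = np.asarray(keys, dtype=object)
    # x != x is True only for NaN, so this flags NaN entries of any numeric type
    is_nan = np.array([k != k for k in keys], dtype=bool)
    # NaN keys go past the last bin, as they do for float arrays
    out = np.full(keys.shape, len(bins), dtype=np.intp)
    # Search the remaining, totally ordered keys in a single call
    out[~is_nan] = np.searchsorted(bins, keys[~is_nan], side=side)
    return out

arr = np.array([1, 2, 3, 4, 5], dtype=object)
arr[::2] = np.nan
bins = np.array([1, 3, 5])
print(searchsorted_nan_safe(bins, arr))  # [3 1 3 2 3]
```

Because the object keys that remain contain no NaN, the search only ever compares values that are totally ordered.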

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

seberg commented, Feb 3, 2020

Ah sorry, I misread. The bins are sorted and contain no NaN, so apart from the NaN keys themselves, the result should stay the same:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7], dtype=object)
arr[::2] = np.nan
bins = np.array([1, 3, 5])
print("Should return identical results, since only the keys change:")
print(bins.searchsorted(arr)[1::2])
print(bins.searchsorted(arr[1::2]))

The insertion point of a NaN key is ill-defined, but for the other values it should not be. This is probably caused by an optimization in the search (i.e. one that makes searching for nearby values faster). Maybe this optimization should be disabled for object arrays.
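To see how such an optimization can go wrong, here is a simplified pure-Python model. When searching many keys, each search reuses the previous key's result to narrow its bounds, on the assumption that if `last < key` is false then `key <= last`. NaN breaks that assumption because every comparison involving it is false. The function `binsearch_left` and its `reuse_bounds` flag are hypothetical names; this is a sketch of the idea, not NumPy's C source.

```python
def binsearch_left(arr, keys, reuse_bounds=True):
    """Hypothetical model: left-side binary search of each key in sorted arr.

    With reuse_bounds=True, each search starts from bounds derived from the
    previous key's result, which is only valid if the keys are totally ordered.
    """
    n = len(arr)
    out = []
    min_idx, max_idx = 0, n
    last = keys[0] if keys else None
    for key in keys:
        if not reuse_bounds:
            min_idx, max_idx = 0, n
        elif last < key:
            # key > last: the answer is at or right of the previous result
            max_idx = n
        else:
            # Assumes key <= last, so the answer is at or left of the previous
            # result. With NaN, "last < key" is False even when key > last.
            min_idx = 0
            max_idx = min(max_idx + 1, n)
        last = key
        while min_idx < max_idx:
            mid = min_idx + (max_idx - min_idx) // 2
            if arr[mid] < key:
                min_idx = mid + 1
            else:
                max_idx = mid
        out.append(min_idx)
    return out

nan = float("nan")
keys = [nan, 2, nan, 4, nan]
print(binsearch_left([1, 3, 5], keys))                      # [0, 1, 0, 1, 0]
print(binsearch_left([1, 3, 5], keys, reuse_bounds=False))  # [0, 1, 0, 2, 0]
```

With bound reuse enabled, the model reproduces the incorrect object-array output from the report; with it disabled, every non-NaN key is searched over the full range and lands at the expected index. Applied to seberg's keys, it also shows the inconsistency he describes: the odd positions of `[nan, 2, nan, 4, nan, 6, nan]` come out as `[1, 1, 1]`, while `[2, 4, 6]` alone gives `[1, 2, 3]`.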

Qiyu8 commented, Apr 14, 2020

If NaN cannot be checked for explicitly, maybe the only way to make the result more accurate is to remove the optimization from the search. Then at least the numeric objects get the correct index, for example:

arr = np.array([np.nan, 2, np.nan, 4, np.nan], dtype=object)
bins = np.array([1, 3, 5])
print(bins.searchsorted(arr)) 
# array([0, 1, 0, 2, 0])
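If NaN can be checked for explicitly, another option is to search under a total order that places NaN after every other value, which is how the float result in the original report behaves. Under a total order the "reuse the previous result" assumption holds again. A minimal sketch using the standard library's `bisect`; `nan_last_key` and `searchsorted_nan_last` are hypothetical helpers, and bins are assumed to be sorted under the same key.

```python
import bisect

def nan_last_key(x):
    """Map x to a tuple that sorts NaN after every other value."""
    is_nan = x != x  # only NaN is unequal to itself
    # Tuples compare element by element, so NaN is never compared directly
    return (1, 0) if is_nan else (0, x)

def searchsorted_nan_last(bins, keys):
    """Hypothetical helper: left-side insertion indices under nan_last_key."""
    mapped_bins = [nan_last_key(b) for b in bins]
    return [bisect.bisect_left(mapped_bins, nan_last_key(k)) for k in keys]

nan = float("nan")
print(searchsorted_nan_last([1, 3, 5], [nan, 2, nan, 4, nan]))  # [3, 1, 3, 2, 3]
```

The design choice here is to fix the comparison rather than the search: once NaN has a defined position, any search strategy that relies on a consistent ordering gives the same answer.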