Quantity objects with nan values get duplicated in sets

When creating a set out of multiple numpy.nan objects, it will return a set with only one numpy.nan object within it.

>>> from numpy import nan
>>> {nan, nan}
{nan}

However, if we try to create a set out of multiple Quantity objects that have the same units but values of numpy.nan, then the set will contain multiple Quantity objects. (Here I chose attojansky as the funniest unit I could think of, in lieu of #5927.)

>>> from astropy.units import attojansky
>>> {nan * attojansky, nan * attojansky}
{<Quantity nan aJy>, <Quantity nan aJy>}

I would have expected that the result be a set containing only one Quantity: {<Quantity nan aJy>}.

One possibility would be to make a Quantity with a value of numpy.nan equal to itself. However, numpy.nan does not equal itself, so this change could lead to some inconsistencies or unexpected behavior.

>>> nan == nan
False
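If NaN-insensitive deduplication is needed in user code, one possible workaround (a sketch only, not something proposed in the issue) is to deduplicate on a key that maps every NaN to one shared sentinel. The helper name `nan_key` and the sentinel are hypothetical; plain floats stand in for Quantity values here.

```python
import math

# Hypothetical sentinel: every NaN float is mapped to this one hashable key.
_NAN_SENTINEL = ("nan",)


def nan_key(x):
    """Return a key under which all NaN floats compare equal (illustrative helper)."""
    if isinstance(x, float) and math.isnan(x):
        return _NAN_SENTINEL
    return x


values = [float("nan"), float("nan"), 1.0, 1.0]

# Keep the first value seen for each key; dicts preserve insertion order.
unique = {}
for v in values:
    unique.setdefault(nan_key(v), v)

deduplicated = list(unique.values())
print(deduplicated)
```

For values that carry a unit, the same idea would presumably need the unit to be part of the key as well.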

[This also reminds me of the quote from The Prisoner…“I am NaN, I am a free man!”]

Cross-references: #5420, #5424

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
MSeifert04 commented, Jun 21, 2019

Using NaN in sets/dicts is more or less undefined behavior: these data structures rely on equality (and np.nan != np.nan), yet some NaN values are the very same object (np.nan is np.nan). It’s not really a bug, that’s just how Python (or rather CPython) works:

>>> import numpy as np
>>> {np.nan, np.nan}  # same instance
{nan}

>>> set(np.array([np.nan, np.nan]))  # different instances when iterated over
{nan, nan}

It essentially boils down to an implementation detail: in some places CPython uses PyObject_RichCompareBool (which returns True for identical objects) instead of plain == (which is never True for NaNs).
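The identity shortcut described above can be seen without NumPy. The following is a small sketch using plain Python floats; the variable names are only for illustration.

```python
a = float("nan")
b = float("nan")  # a second, distinct NaN object

# NaN never compares equal with ==, not even to itself.
equal_to_self = (a == a)

# Containers check identity first and only then fall back to ==,
# so the same object is found even though a != a.
found_in_list = a in [a]

same_object_set = {a, a}      # same instance twice: collapses to one element
distinct_object_set = {a, b}  # two instances: both are kept

print(equal_to_self, found_in_list)
print(len(same_object_set), len(distinct_object_set))
```

This mirrors the two cases in the comment: a literal like {np.nan, np.nan} reuses one object, while iterating over an array yields a fresh object per element.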

1 reaction
mhvk commented, Jun 21, 2019

I’m not sure what to do about this, as a similar problem also occurs with ndarray:

import numpy as np
a = np.array(np.nan)
{a, a}
# TypeError: unhashable type: 'numpy.ndarray'

Indeed, the above suggests that Quantity should not be hashable in the first place (since it can be changed in-place).
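To illustrate why hashability and in-place mutation mix badly, here is a minimal sketch with a hypothetical class, `MutableValue`, whose hash is derived from its current value. It is an assumption made for illustration and does not reflect how astropy actually hashes Quantity.

```python
class MutableValue:
    """Hypothetical hashable object whose value can be changed in place."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, MutableValue) and self.value == other.value

    def __hash__(self):
        # Assumption for this sketch: the hash follows the current value.
        return hash(self.value)


q = MutableValue(1.0)
seen = {q}
before = q in seen  # found: the hash matches the one stored at insertion

q.value = 2.0       # in-place change after insertion
after = q in seen   # not found: the lookup now uses a different hash

# The object is still stored in the set, it just cannot be looked up any more.
still_stored = any(item is q for item in seen)

print(before, after, still_stored)
```

Once the value changes, the set holds an element it can no longer find, which is the usual argument for keeping mutable containers such as ndarray unhashable.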
