pd.NA TypeError in drop_duplicates with object dtype
Code Sample, a copy-pastable example if possible
>>> pd.DataFrame([[1, pd.NA], [2, "a"]]).drop_duplicates()
Traceback (most recent call last):
  ...
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/frame.py", line 4859, in f
    labels, shape = algorithms.factorize(
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/algorithms.py", line 629, in factorize
    codes, uniques = _factorize_array(
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/algorithms.py", line 478, in _factorize_array
    uniques, codes = table.factorize(values, na_sentinel=na_sentinel, na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1806, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 1728, in pandas._libs.hashtable.PyObjectHashTable._unique
  File "pandas/_libs/missing.pyx", line 360, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
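The last frame of the traceback is NAType.__bool__, which suggests that the object hash table ends up truth-testing a value derived from pd.NA. A minimal sketch of that underlying behaviour, independent of drop_duplicates (the helper name truthiness_error is hypothetical and only used for illustration):

```python
import pandas as pd


def truthiness_error(value):
    """Return the TypeError message raised by bool(value), or None if it succeeds."""
    try:
        bool(value)
    except TypeError as exc:
        return str(exc)
    return None


# Comparing pd.NA with anything yields pd.NA rather than True/False ...
comparison = pd.NA == 1

# ... and pd.NA refuses to be coerced to a boolean.
message = truthiness_error(comparison)
print(message)
```

Any code path that writes something like `if a == b:` on object values can hit this once one of the operands is pd.NA.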
This same failure isn’t present when using an extension type:
>>> df = pd.DataFrame([[1, pd.NA], [2, "a"]], columns=list("ab"))
>>> df["b"] = df["b"].astype("string")
>>> df.drop_duplicates()
   a     b
0  1  <NA>
1  2     a
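On an affected version, the observation above points to a possible workaround: cast the object column to an extension dtype before deduplicating. A small sketch, assuming the column holds only strings and pd.NA (the extra third row is added here so that there is actually a duplicate to drop):

```python
import pandas as pd

# Rows 0 and 2 are identical, including the missing value in column "b".
df = pd.DataFrame([[1, pd.NA], [2, "a"], [1, pd.NA]], columns=list("ab"))

# Cast the object column to the "string" extension dtype, as in the issue.
df["b"] = df["b"].astype("string")

deduped = df.drop_duplicates()
print(deduped)
```

Missing values are treated as equal to each other when looking for duplicates, so only the first of the two matching rows is kept.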
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
I cannot reproduce the error. Has it been fixed already?
fixed in #31939 (i.e. 1.0.2)
41bc226841eb59ccdfa279734dac98f7debc6249 is the first new commit
commit 41bc226841eb59ccdfa279734dac98f7debc6249
Author: Daniel Saxton <2658661+dsaxton@users.noreply.github.com>
Date:   Sun Feb 23 08:57:07 2020 -0600
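To check whether a given installation already includes the fix, the original reproducer can be rerun as a quick sanity check. This is only a sketch: dtype=object is passed explicitly here (it is not in the original report) so that the columns stay object dtype regardless of how the installed version infers dtypes.

```python
import pandas as pd

# Same data as the report, forced to object dtype.
df = pd.DataFrame([[1, pd.NA], [2, "a"]], dtype=object)

# On an affected version this line raises
# "TypeError: boolean value of NA is ambiguous".
result = df.drop_duplicates()

print(pd.__version__, result.shape)
```

Since neither row is a duplicate, a fixed version should return both rows unchanged.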