REGR: Series repr of object Index with bools and NaN is wrong


Code Sample, a copy-pastable example if possible

import pandas as pd
# a series with booleans and nan
pd.Series([False,True,True,pd.NA]).value_counts(dropna=False)
True     2
False    1
True     1
dtype: int64

# check the actual index of the value_counts result
pd.Series([False,True,True,pd.NA]).value_counts(dropna=False).index
Index([True, False, nan], dtype='object')

# a similar Series, with ints instead of booleans, displays correctly
pd.Series([0,1,1,pd.NA]).value_counts(dropna=False)
1.0    2
0.0    1
NaN    1
dtype: int64

Problem description

As shown in the example, the repr of the value_counts result is wrong when the Series contains booleans: the missing-value label should be displayed as NaN, but it is displayed as True. The underlying index still holds nan, so only the display is affected. Note that this is limited to Series with booleans; a similar example with ints works as expected.
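One way to confirm that only the display is affected is to query the stored labels directly instead of reading the repr. The snippet below is a minimal sketch, not part of the original report: it assumes `np.nan` as the missing value (matching the `nan` shown in the index output above) and forces `dtype=object` explicitly.

```python
import numpy as np
import pandas as pd

# Assumption: np.nan stands in for the missing value, and dtype=object is
# set explicitly so the Series matches the object Index in the report.
s = pd.Series([False, True, True, np.nan], dtype=object)
counts = s.value_counts(dropna=False)

# pd.isna inspects the stored labels, so it does not depend on how the
# index is rendered as text.
missing_mask = pd.isna(counts.index)

# Split the counts into missing and non-missing labels with the mask.
n_missing = int(counts[missing_mask].sum())
n_present = int(counts[~missing_mask].sum())

print("missing labels:", int(missing_mask.sum()))
print("rows counted under a missing label:", n_missing)
print("rows counted under True/False:", n_present)
```

If the mask reports one missing label while the repr shows `True` twice, the data is intact and the problem lies in the formatting step.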

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.1.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, Feb 21, 2020

And the bug should be somewhere in here:

In [1]: idx = pd.Index([True, False, np.nan], dtype=object) 

In [2]: idx.format() 
Out[2]: ['True ', 'False', 'False']
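
Until the formatting path is fixed, the labels can be rendered element by element without going through `Index.format()`. The following is only a sketch: `format_labels` and its `na_rep` parameter are hypothetical names introduced for illustration, not pandas API and not the fix that was applied upstream.

```python
import numpy as np
import pandas as pd


def format_labels(index, na_rep="NaN"):
    """Hypothetical helper: render each label on its own.

    Missing labels are detected with pd.isna and replaced by na_rep;
    every other label is converted with str().
    """
    return [na_rep if pd.isna(label) else str(label) for label in index]


idx = pd.Index([True, False, np.nan], dtype=object)
formatted = format_labels(idx)
print(formatted)
```

Because each label is checked individually, a missing value cannot be folded into the boolean labels next to it.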
1 reaction
ts2095 commented, Feb 21, 2020

Not sure if this is related but it seems similar:

numpy.nan shows as True in index.

>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(range(3), index=[True, False, np.nan])
>>> s
True     0
False    1
True     2
dtype: int64

This True is different from the regular True in the value count:

>>> s.index.value_counts(dropna=False)
True     1
False    1
True     1
dtype: int64

But the correct value is still used “under the hood”:

>>> s.index.unique()
Index([True, False, nan], dtype='object')

