REGR: Series repr of object Index with bools and NaN is wrong

Code Sample, a copy-pastable example if possible

import pandas as pd
# a series with booleans and nan
pd.Series([False,True,True,pd.NA]).value_counts(dropna=False)
True     2
False    1
True     1
dtype: int64

# check the actual index of the value_counts result
pd.Series([False,True,True,pd.NA]).value_counts(dropna=False).index
Index([True, False, nan], dtype='object')

# a similar Series, with ints instead of bools, seems to work ok
pd.Series([0,1,1,pd.NA]).value_counts(dropna=False)
1.0    2
0.0    1
NaN    1
dtype: int64

Problem description

As the example shows, the repr of the value_counts result is wrong when the Series contains booleans: the NaN label should be displayed as NaN, but it is shown as True. This only affects Series with booleans; a similar example with ints displays correctly.
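
One way to confirm that only the display is affected is to read the index labels directly instead of relying on the printed output. The following is a minimal sketch, assuming `np.nan` as the missing-value marker in place of `pd.NA`; the variable names are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the report, with np.nan standing in for pd.NA (assumption).
counts = pd.Series([False, True, True, np.nan], dtype=object).value_counts(dropna=False)

# Pair each raw index label with its count, bypassing the Series repr entirely.
pairs = list(zip(counts.index, counts.to_numpy()))

# Total the counts separately for missing labels and for the True label.
missing_total = sum(int(n) for label, n in pairs if pd.isna(label))
true_total = sum(int(n) for label, n in pairs if not pd.isna(label) and bool(label))

print(pairs)
print("missing:", missing_total, "True:", true_total)
```

If the missing label carries its own count here, the data is intact and the problem lies in how the index is rendered.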

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.1.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-76-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.2.0
Cython           : None
pytest           : 5.3.5
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
jinja2           : None
IPython          : 7.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.3.13
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Issue Analytics

Created 4 years ago
Comments: 6 (5 by maintainers)

Top GitHub Comments

jorisvandenbossche commented, Feb 21, 2020

And the bug should be somewhere in here:

In [1]: idx = pd.Index([True, False, np.nan], dtype=object) 

In [2]: idx.format() 
Out[2]: ['True ', 'False', 'False']
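
To see what the index actually holds, independent of `Index.format()`, each element can be rendered with Python's own `repr`. This is only a sketch for inspection, not a fix for the formatting bug; the name `raw` is illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.Index([True, False, np.nan], dtype=object)

# Iterating an object-dtype Index yields the stored Python objects,
# so repr() here does not go through pandas' index formatting.
raw = [repr(value) for value in idx]
print(raw)
```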

ts2095 commented, Feb 21, 2020

Not sure if this is related but it seems similar:

numpy.nan shows as True in index.

>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(range(3), index=[True, False, np.nan])
>>> s
True     0
False    1
True     2
dtype: int64

This True is different from the regular True in the value count:

>>> s.index.value_counts(dropna=False)
True     1
False    1
True     1
dtype: int64

But the correct value is still used “under the hood”:

>>> s.index.unique()
Index([True, False, nan], dtype='object')
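
Since the stored labels are intact, the row behind the mislabeled entry can still be located by testing the index for missing values. A short sketch, reusing the Series from the comment above; `mask` and `nan_rows` are illustrative names.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(3), index=[True, False, np.nan])

# Boolean mask marking the index positions whose label is missing.
mask = s.index.isna()

# Select the rows with a missing label using the boolean mask.
nan_rows = s[mask]
print(nan_rows.tolist())
```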
