Inconsistent behavior when DataFrame with strings and None is created from lists or dictionary.

Code Sample

import pandas as pd

pd.__version__

# df created from a list: None is cast to the string "None"
df = pd.DataFrame(["1", "2", None], columns=["a"], dtype="str")
type(df.loc[2].values[0])

# Equivalent df created from a dict: None remains NoneType
df = pd.DataFrame({"a": ["1", "2", None]}, dtype="str")
type(df.loc[2].values[0])

Problem description

None is not cast consistently for DataFrames that contain None values and have dtype set to str.

If the DataFrame is created from a list, None is cast to str (None -> “None”). If the DataFrame is created from a dict, None remains NoneType (None -> None).
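The comparison can be made explicit with a small check that reports the type and repr of the missing entry for both construction paths. This is only a sketch: `missing_value_repr` is a hypothetical helper, not part of pandas, and what it prints depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd


def missing_value_repr(df, column="a", row=2):
    """Hypothetical helper: return (type name, repr) of one cell."""
    value = df.loc[row, column]
    return type(value).__name__, repr(value)


# The two construction paths from the report above.
from_list = pd.DataFrame(["1", "2", None], columns=["a"], dtype="str")
from_dict = pd.DataFrame({"a": ["1", "2", None]}, dtype="str")

print("pandas", pd.__version__)
print("from list:", missing_value_repr(from_list))
print("from dict:", missing_value_repr(from_dict))
print("consistent:", missing_value_repr(from_list) == missing_value_repr(from_dict))
```

If the last line prints `False`, the installed version still shows the inconsistency described here.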

IMO, the latter is the preferred behavior. I hope you consider this when continuing your work on string types and na types in future versions.
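Until both constructors agree, one possible workaround is to stringify the values before building the frame, so that None never goes through the dtype="str" cast. A minimal sketch, assuming an object-dtype column is acceptable; `frame_of_strings` is a hypothetical helper, not a pandas API:

```python
import pandas as pd


def frame_of_strings(values, column):
    """Hypothetical helper: stringify non-missing values, keep None as missing."""
    cleaned = [None if v is None else str(v) for v in values]
    # Explicit object dtype, so pandas does not apply its own string cast.
    return pd.DataFrame({column: pd.Series(cleaned, dtype="object")})


df = frame_of_strings(["1", 2, None], "a")
print(df["a"].isna().tolist())  # [False, False, True]
```

The same helper can be applied per column when starting from a dict, which keeps the list and dict paths on identical code.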

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 41.2.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Issue Analytics

State:
Created 4 years ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction

prakhar987 commented, Feb 25, 2020

I am assuming one is wrong, the case where None becomes “None”.

0 reactions

cgarciae commented, Apr 12, 2021

take
