Inconsistent behavior when a DataFrame with strings and None is created from a list or a dictionary
Code Sample
import pandas as pd
pd.__version__
# df created from a list: None is cast to str ("None")
df = pd.DataFrame(["1", "2", None], columns=["a"], dtype="str")
type(df.loc[2].values[0])
# Equivalent df created from a dict: None remains NoneType
df = pd.DataFrame({"a": ["1", "2", None]}, dtype="str")
type(df.loc[2].values[0])
Problem description
None is not cast consistently when a DataFrame containing None values is created with dtype set to str.
If the DataFrame is created from a list, None is cast to str: the value None becomes the string "None". If it is created from a dict, None remains NoneType: the value stays None.
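The difference can be sketched in plain Python, without pandas. The two helpers below are hypothetical and only mimic the two observed outcomes; they are not pandas internals. Applying str() to every element turns None into the literal string "None", while a cast that skips missing values leaves None intact.

```python
def cast_naive(values):
    # Hypothetical helper mimicking the list-path outcome:
    # every element, including None, goes through str().
    return [str(v) for v in values]


def cast_preserving_missing(values):
    # Hypothetical helper mimicking the dict-path outcome:
    # None is left untouched, everything else is converted with str().
    return [v if v is None else str(v) for v in values]


data = ["1", "2", None]
print(cast_naive(data))               # ['1', '2', 'None']
print(cast_preserving_missing(data))  # ['1', '2', None]
```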
IMO, the dict behavior (None stays None) is preferable. I hope you consider this when continuing your work on string types and NA types in future versions.
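As a possible workaround, assuming pandas 1.0 or later, the dedicated "string" dtype (StringDtype) can be requested instead of "str". With that dtype, missing entries should be stored as pd.NA on both construction paths rather than converted to text. This sketch does not change how dtype="str" itself behaves.

```python
import pandas as pd

# dtype="string" requests pandas' StringDtype (added in pandas 1.0)
# instead of a plain str/object column.
df_list = pd.DataFrame(["1", "2", None], columns=["a"], dtype="string")
df_dict = pd.DataFrame({"a": ["1", "2", None]}, dtype="string")

# With StringDtype, the last entry should be reported as missing (pd.NA)
# for both construction paths, not as the text "None".
print(pd.isna(df_list.loc[2, "a"]), pd.isna(df_dict.loc[2, "a"]))
```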
Expected Output
Both constructions should keep the missing value as None, as the dict-based construction does.
Output of pd.show_versions()
INSTALLED VERSIONS
commit           : None
python           : 3.7.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 41.2.0
Cython           : None
pytest           : 5.3.5
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None
Issue Analytics
- Created 4 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
I am assuming one of them is wrong: the case where None becomes "None".
take