BUG: pd.NA doesn't pickle/unpickle faithfully
Code Sample, a copy-pastable example if possible
In [5]: df['Gold Categories'].count()
Out[5]: 135218
In [6]: df['Gold Categories'].isna().sum()
Out[6]: 0
In [7]: df['Gold Categories'].iloc[256]
Out[7]: <NA>
In [8]: pd.isna(df['Gold Categories'].iloc[256])
Out[8]: False
In [9]: type(df['Gold Categories'].iloc[256])
Out[9]: pandas._libs.missing.NAType
In [10]: pd.__version__
Out[10]: '1.0.1'
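In the session above, a cell displays as <NA> and has type NAType, yet pd.isna returns False for it and isna().sum() is 0. This fits the explanation in the comments below: the cell holds a second NAType instance rather than the pd.NA singleton. The sketch below uses a hypothetical helper, count_na_discrepancy, that compares a type-based count against what isna() reports for one Series. It is a diagnostic only.

```python
from typing import Tuple

import pandas as pd

# The class of pd.NA; type(pd.NA) avoids importing private pandas modules.
NAType = type(pd.NA)


def count_na_discrepancy(s: pd.Series) -> Tuple[int, int]:
    """Hypothetical helper: return (cells whose type is NAType, cells isna() flags)."""
    by_type = sum(1 for v in s if type(v) is NAType)
    by_isna = int(s.isna().sum())
    return by_type, by_isna
```

On a healthy object column such as pd.Series(["a", pd.NA], dtype=object), the two numbers agree. On a column like 'Gold Categories' above, the first number would be expected to exceed the second.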
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.16-200.fc30.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : nb_NO.UTF-8
LOCALE : nb_NO.UTF-8

pandas : 1.0.1
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.2
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 2.2.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.2.2
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.5.2
tabulate : 0.8.5
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0
Top GitHub Comments
When pickling/unpickling, I can reproduce this:
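A minimal round-trip check is sketched below as a hypothetical helper, na_survives_pickle. It pickles pd.NA on its own and inside an object column, then checks that the restored values are still the pd.NA singleton. On the reporter's pandas 1.0.1, it would be expected to return False. On a pandas version where this bug is fixed, it should return True.

```python
import pickle

import pandas as pd


def na_survives_pickle() -> bool:
    """Hypothetical helper: does pd.NA keep its identity through pickle?"""
    # Round-trip the bare scalar.
    scalar = pickle.loads(pickle.dumps(pd.NA))
    # Round-trip a one-column frame whose object column contains pd.NA.
    df = pd.DataFrame({"col": pd.Series(["a", pd.NA], dtype=object)})
    restored = pickle.loads(pickle.dumps(df))
    cell = restored["col"].iloc[1]
    # Identity is the strictest test; the isna() check mirrors the reported symptom.
    return scalar is pd.NA and cell is pd.NA and bool(restored["col"].isna().iloc[1])


print(na_survives_pickle())
```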
So apparently, unpickling does not return the same pd.NA singleton; it creates a new NAType instance instead.
A simple example of the problem:
This can also cause exceptions when working with dtypes other than object.
The following function replaces the incorrect NA values in place. It operates one column at a time to preserve dtypes. flake8 complains about comparing types rather than using isinstance, but I find this easier to read.
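A minimal sketch of a function along the lines described is shown below. The name replace_stray_na and the NAType alias are hypothetical, and this is not the commenter's original code. It works one column at a time, using positional indexing so duplicate column names are handled. It uses an exact type comparison, as the comment prefers. It also checks identity, so cells that already hold the genuine pd.NA singleton are left alone.

```python
import pandas as pd

# The class of pd.NA; comparing with `type(v) is NAType` mirrors the comment above.
NAType = type(pd.NA)


def replace_stray_na(df: pd.DataFrame) -> int:
    """Hypothetical helper: replace non-singleton NAType cells with pd.NA, in place.

    Each column is handled separately so it keeps its dtype.
    Returns the number of cells replaced.
    """
    replaced = 0
    for i in range(df.shape[1]):
        # Exact type check (not isinstance) plus identity check against pd.NA.
        mask = [type(v) is NAType and v is not pd.NA for v in df.iloc[:, i]]
        hits = sum(mask)
        if hits:
            df.iloc[mask, i] = pd.NA
            replaced += hits
    return replaced
```

The intended use is to call replace_stray_na(df) right after unpickling a frame written by an affected pandas version. Afterwards, pd.isna and isna() should agree with the displayed <NA> values.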