Assigning an Int64 conversion via loc fails under peculiar conditions
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'
Problem description
Assigning the result of .astype('Int64') back through .loc sometimes fails with AttributeError: 'IntegerArray' object has no attribute 'size'.
Initially, I thought this was a regression from #25584, but it only manifests under very specific conditions:
- The converted column holds exactly one value, and that value is null
# works
test = pd.DataFrame({'a': ['test', 'test'], 'b': [np.nan, 1.0]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# works
test = pd.DataFrame({'a': ['test', 'test'], 'b': [np.nan, np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# fails
test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'
- There is another column in the DataFrame
# works
test = pd.DataFrame({'b': [np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
- Assignment happens via df.loc[:, colname]
- Without assignment, or with assignment to df[colname], things work:
test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
# fails
test.loc[:, 'b'] = test['b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'
# fails
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'
# works
test.loc[:, 'b'].astype('Int64')
# works
test['b'] = test['b'].astype('Int64')
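The conditions above can be checked in one pass with a small harness that runs each reported case and records whether the assignment raises. This is a sketch: the helper name `run_case` and the case labels are hypothetical, and on pandas releases where this bug is fixed the `.loc` cases should no longer report an AttributeError. We assume that stricter later releases may instead reject some in-place dtype changes with a TypeError, so that outcome is recorded separately rather than conflated with this bug.

```python
import numpy as np
import pandas as pd


def run_case(frame_data, use_loc=True):
    """Hypothetical helper: convert column 'b' to Int64 and assign it back.

    use_loc=True assigns via df.loc[:, 'b'], use_loc=False via df['b'].
    Returns "ok" or a short description of the exception raised.
    """
    test = pd.DataFrame(frame_data)
    try:
        if use_loc:
            test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
        else:
            test['b'] = test['b'].astype('Int64')
    except AttributeError as exc:  # the bug reported in this issue
        return f"AttributeError: {exc}"
    except TypeError as exc:  # assumption: newer dtype-policy rejection, not this bug
        return f"TypeError: {exc}"
    return "ok"


# Cases taken from the issue text; only the one-row, two-column .loc case was reported to fail.
cases = {
    "one null among two rows, via loc": ({'a': ['test', 'test'], 'b': [np.nan, 1.0]}, True),
    "two nulls, via loc": ({'a': ['test', 'test'], 'b': [np.nan, np.nan]}, True),
    "single null, extra column, via loc": ({'a': ['test'], 'b': [np.nan]}, True),
    "single null, no extra column, via loc": ({'b': [np.nan]}, True),
    "single null, extra column, via df['b']": ({'a': ['test'], 'b': [np.nan]}, False),
}

results = {label: run_case(data, use_loc) for label, (data, use_loc) in cases.items()}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Running it on the pandas version from the report should single out the one-row case with an extra column; on other versions it gives a quick picture of which combinations still misbehave.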
Expected Output
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64') shouldn’t raise an AttributeError.
print(test['b']) after assignment should read along the lines of
0    <NA>
Name: b, dtype: Int64
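A minimal regression check for the expected behaviour could look like the sketch below. Two points are assumptions on our part: later pandas releases changed how df.loc[:, col] = ... handles a dtype change (trying to set values in place, and warning about or rejecting incompatible dtypes), so the sketch does not pin the dtype after the .loc assignment; and it treats a TypeError as a dtype-policy outcome distinct from the reported AttributeError, which it deliberately lets propagate.

```python
import numpy as np
import pandas as pd

# Failing case from the report: one row, an extra column, assignment through .loc.
test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
try:
    test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
    loc_assignment = "ok"
except TypeError:
    # Assumption: stricter pandas releases may reject this in-place dtype change.
    # That is a dtype policy, not the reported bug, so it is tolerated here.
    loc_assignment = "rejected"
# An AttributeError (the reported bug) is not caught and would fail the check.

# Whatever the resulting dtype, the value must still be missing.
assert test['b'].isna().all()

# Assigning through df['b'] replaces the column, so the Int64 dtype is kept.
test['b'] = test['b'].astype('Int64')
assert str(test['b'].dtype) == 'Int64'
print(loc_assignment)
print(test['b'])
```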
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None python : 3.6.9.final.0 python-bits : 64 OS : Linux OS-release : 4.15.0-46-generic machine : x86_64 processor : byteorder : little LC_ALL : None LANG : C.UTF-8 LOCALE : en_US.UTF-8
pandas : 1.0.1 numpy : 1.16.1 pytz : 2018.9 dateutil : 2.8.0 pip : 19.3.1 setuptools : 41.4.0 Cython : None pytest : 3.10.1 hypothesis : None sphinx : 1.8.5 blosc : None feather : None xlsxwriter : None lxml.etree : 4.3.0 html5lib : None pymysql : 0.9.3 psycopg2 : None jinja2 : 2.10 IPython : 7.2.0 pandas_datareader: None bs4 : None bottleneck : None fastparquet : None gcsfs : None lxml.etree : 4.3.0 matplotlib : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : None pytables : None pytest : 3.10.1 pyxlsb : None s3fs : None scipy : 1.2.0 sqlalchemy : 1.2.17 tables : None tabulate : None xarray : None xlrd : None xlwt : None xlsxwriter : None numba : None
Issue Analytics
- Created 4 years ago
- Comments:7 (6 by maintainers)
Top GitHub Comments
Please contribute! I’m no longer working on this.
So far I have added a test for functionality similar to the example in the issue: it checks astype via loc on np.nan for Int64, Int32 and Int16. Should I add tests for pd.NA too? Was adding tests for Int32 and Int16 unnecessary? Also, do we need similar tests for Series? As I am quite new to this, I would appreciate some guidance.
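The tests described in this comment (astype via .loc on a null value for Int64, Int32 and Int16) might be written along the lines of the sketch below, which also folds in pd.NA as the second null value the comment asks about. The helper name check_astype_via_loc is hypothetical, and a plain loop stands in for pytest parametrization so the block runs on its own; it covers DataFrame only, not Series. As before, we assume newer pandas may reject the in-place dtype change with a TypeError and tolerate that, while an AttributeError would still fail.

```python
import numpy as np
import pandas as pd


def check_astype_via_loc(dtype, null_value):
    """Hypothetical test helper: convert a one-row null column via .loc and assign it back."""
    df = pd.DataFrame({'a': ['test'], 'b': [null_value]})
    converted = df.loc[:, 'b'].astype(dtype)
    assert str(converted.dtype) == dtype
    assert converted.isna().all()
    try:
        df.loc[:, 'b'] = converted  # raised AttributeError in pandas 1.0.1
    except TypeError:
        # Assumption: newer pandas may refuse the in-place dtype change; not this bug.
        pass
    assert df['b'].isna().all()


checked = []
for dtype in ['Int64', 'Int32', 'Int16']:
    for null_value in [np.nan, pd.NA]:
        check_astype_via_loc(dtype, null_value)
        checked.append((dtype, null_value))
print(f"{len(checked)} combinations passed")
```

In a pytest-based suite the two loops would more naturally become @pytest.mark.parametrize decorators on a test function, one per argument.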