Assigned conversion via loc to Int64 fails under peculiar conditions

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'

Problem description

Assignment of .astype('Int64') sometimes fails with AttributeError: 'IntegerArray' object has no attribute 'size'.

Initially, I thought this was a regression from #25584, but it only manifests under very specific conditions:

There is exactly one null value in the converted column

# works
test = pd.DataFrame({'a': ['test', 'test'], 'b': [np.nan, 1.0]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')

# works
test = pd.DataFrame({'a': ['test', 'test'], 'b': [np.nan, np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')

# fails
test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'

There is another column in the DataFrame

# works
test = pd.DataFrame({'b': [np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')

Assignment happens via df.loc[:, colname]. Without assignment, or with assignment to df[colname], things work.

test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})

# fails
test.loc[:, 'b'] = test['b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'

# fails
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'

# works
test.loc[:, 'b'].astype('Int64')

# works
test['b'] = test['b'].astype('Int64')
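To compare the two assignment styles on whatever pandas version is installed, a small harness along these lines may help. It is only a sketch: the helper names (make_frame, assign_via_loc, assign_via_getitem, outcome) are made up for illustration, and the result of the .loc variant is expected to depend on the pandas version. The AttributeError above was observed on 1.0.1; other versions may succeed, warn, or raise something else.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Same shape as the failing case above: one row, plus a second column.
    return pd.DataFrame({"a": ["test"], "b": [np.nan]})


def assign_via_loc(df):
    # The variant reported as failing on pandas 1.0.1.
    df.loc[:, "b"] = df["b"].astype("Int64")


def assign_via_getitem(df):
    # The variant reported as working.
    df["b"] = df["b"].astype("Int64")


def outcome(assign):
    """Return (exception class name or None, dtype of column 'b' afterwards)."""
    df = make_frame()
    try:
        assign(df)
    except Exception as exc:  # broad on purpose: we only report what happened
        return type(exc).__name__, str(df["b"].dtype)
    return None, str(df["b"].dtype)


if __name__ == "__main__":
    print("pandas", pd.__version__)
    for assign in (assign_via_loc, assign_via_getitem):
        print(assign.__name__, outcome(assign))
```

Each call builds a fresh DataFrame, so a failure in one variant cannot affect the other.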

Expected Output

test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64') shouldn’t raise an AttributeError.

print(test['b']) after assignment should read along the lines of

0    <NA>
Name: b, dtype: Int64

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit           : None
python           : 3.6.9.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-46-generic
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.16.1
pytz             : 2018.9
dateutil         : 2.8.0
pip              : 19.3.1
setuptools       : 41.4.0
Cython           : None
pytest           : 3.10.1
hypothesis       : None
sphinx           : 1.8.5
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.3.0
html5lib         : None
pymysql          : 0.9.3
psycopg2         : None
jinja2           : 2.10
IPython          : 7.2.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.3.0
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 3.10.1
pyxlsb           : None
s3fs             : None
scipy            : 1.2.0
sqlalchemy       : 1.2.17
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Top GitHub Comments

TAJD commented, Oct 3, 2020

Please contribute! I’m no longer working on this.

hardikpnsp commented, Oct 3, 2020

So far I have added a test for functionality similar to the one provided in the issue: astype via loc on np.nan for Int64, Int32 and Int16. Should I add tests for pd.NA too? Was adding tests for Int32 and Int16 unnecessary? Also, do we need similar tests for Series? As I am quite new to this, I would appreciate some guidance.
