Assigned conversion via loc to Int64 fails under peculiar conditions

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
# *** AttributeError: 'IntegerArray' object has no attribute 'size'

Problem description

Assignment of .astype('Int64') sometimes fails with AttributeError: 'IntegerArray' object has no attribute 'size'.

Initially, I thought this was a regression from #25584, but it only manifests under very specific conditions:

  1. The converted column holds exactly one value, and that value is null
    # works
    test = pd.DataFrame({'a': ['test', 'test'], 'b': [np.nan, 1.0]})
    test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
    
    # works
    test = pd.DataFrame({'a': ['test', 'test'], 'b': [np.nan, np.nan]})
    test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
    
    # fails
    test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
    test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
    # *** AttributeError: 'IntegerArray' object has no attribute 'size'
    
  2. There is another column in the DataFrame
    # works
    test = pd.DataFrame({'b': [np.nan]})
    test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
    
  3. Assignment happens via df.loc[:, colname] - without assignment, or with assignment to df[colname], things work.
    test = pd.DataFrame({'a': ['test'], 'b': [np.nan]})
    
    # fails
    test.loc[:, 'b'] = test['b'].astype('Int64')
    # *** AttributeError: 'IntegerArray' object has no attribute 'size'
    
    # fails
    test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64')
    # *** AttributeError: 'IntegerArray' object has no attribute 'size'
    
    # works
    test.loc[:, 'b'].astype('Int64')
    
    # works
    test['b'] = test['b'].astype('Int64')
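
The three conditions above can be checked together with a small harness. This is a minimal sketch, not part of the original report: the helper names (`outcome`, `via_loc`, `via_getitem`) and the case labels are made up for illustration. The AttributeError was reported against pandas 1.0.1, and the issue has since been closed, so other versions may well print different results.

```python
import numpy as np
import pandas as pd


def outcome(frame, assign):
    """Run one assignment variant; return None on success or the exception name."""
    try:
        assign(frame)
    except Exception as exc:  # broad on purpose: we only record what happened
        return type(exc).__name__
    return None


def via_loc(frame):
    # The assignment path the report describes as failing.
    frame.loc[:, "b"] = frame.loc[:, "b"].astype("Int64")


def via_getitem(frame):
    # The assignment path the report describes as working.
    frame["b"] = frame["b"].astype("Int64")


# Hypothetical labels; the data mirrors the frames used in the report.
cases = {
    "one row, null, extra column": {"a": ["test"], "b": [np.nan]},
    "two rows, one null, extra column": {"a": ["test", "test"], "b": [np.nan, 1.0]},
    "two rows, both null, extra column": {"a": ["test", "test"], "b": [np.nan, np.nan]},
    "one row, null, no extra column": {"b": [np.nan]},
}

results = {}
for label, data in cases.items():
    for assign in (via_loc, via_getitem):
        # Build a fresh frame per run so one variant cannot affect the next.
        results[(label, assign.__name__)] = outcome(pd.DataFrame(data), assign)

print("pandas", pd.__version__)
for (label, how), err in results.items():
    print(f"{label:35s} {how:12s} {err or 'ok'}")
```

Recording the exception name instead of asserting on it keeps the script usable across pandas versions, which makes it easier to see where the behaviour changed.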
    

Expected Output

test.loc[:, 'b'] = test.loc[:, 'b'].astype('Int64') shouldn’t raise an AttributeError.

print(test['b']) after assignment should read along the lines of

0    <NA>
dtype: Int64
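
The properties implied by that output can also be checked in code. The sketch below uses the `df[colname]` assignment that the report says works, so it only shows what the `.loc` path is expected to produce as well. It does not exercise the bug itself.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({"a": ["test"], "b": [np.nan]})
test["b"] = test["b"].astype("Int64")  # assignment path reported as working

# What the expected output implies: a nullable integer column with one missing value.
assert str(test["b"].dtype) == "Int64"
assert test["b"].isna().tolist() == [True]
print(test["b"])
```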

Output of pd.show_versions()

INSTALLED VERSIONS

commit           : None
python           : 3.6.9.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-46-generic
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.16.1
pytz             : 2018.9
dateutil         : 2.8.0
pip              : 19.3.1
setuptools       : 41.4.0
Cython           : None
pytest           : 3.10.1
hypothesis       : None
sphinx           : 1.8.5
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.3.0
html5lib         : None
pymysql          : 0.9.3
psycopg2         : None
jinja2           : 2.10
IPython          : 7.2.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.3.0
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 3.10.1
pyxlsb           : None
s3fs             : None
scipy            : 1.2.0
sqlalchemy       : 1.2.17
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
TAJD commented, Oct 3, 2020

Please contribute! I’m no longer working on this.

0 reactions
hardikpnsp commented, Oct 3, 2020

So far I have added a test for functionality similar to the one provided in the issue: basically, it tests astype via loc on np.nan for Int64, Int32 and Int16. Should I add tests for pd.NA too? Was adding a test for Int32 and Int16 unnecessary? Also, do we need similar tests for Series? As I am quite new to this, I would appreciate some guidance.
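
For illustration, the test matrix described in this comment could be laid out roughly as follows. This is a hedged sketch, not the test that was actually added to pandas: `loc_astype_roundtrip`, `DTYPES`, `NULLS` and `report` are hypothetical names. Outcomes are recorded rather than asserted because the result of the `.loc` assignment depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

DTYPES = ["Int64", "Int32", "Int16"]        # dtypes named in the comment
NULLS = {"np.nan": np.nan, "pd.NA": pd.NA}  # pd.NA requires pandas >= 1.0


def loc_astype_roundtrip(dtype, null):
    """Assign an astype result back through .loc and return the resulting column."""
    df = pd.DataFrame({"a": ["test"], "b": [null]})
    df.loc[:, "b"] = df.loc[:, "b"].astype(dtype)
    return df["b"]


report = {}
for dtype in DTYPES:
    for name, null in NULLS.items():
        try:
            column = loc_astype_roundtrip(dtype, null)
            all_null = bool(column.isna().all())
            report[(dtype, name)] = f"ok, dtype={column.dtype}, all null={all_null}"
        except Exception as exc:  # record rather than fail: behaviour varies by version
            report[(dtype, name)] = f"raised {type(exc).__name__}"

for key, status in report.items():
    print(key, status)
```

In a real test suite each (dtype, null) pair would more likely become a parametrized case with explicit assertions. The loop above only shows the shape of the matrix and leaves open the question of whether Series needs the same coverage.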
