astype(str) / astype_unicode: np.nan converted to "nan" (checknull, skipna)

Code Sample

>>> import pandas as pd
>>> import numpy as np
>>> pd.Series(["foo",np.nan]).astype(str)

Output

0    foo
1    nan    # nan string
dtype: object

Expected output

0    foo
1    NaN    # np.nan
dtype: object

Problem description

Upon converting this Series I would expect np.nan to remain np.nan, but instead it is cast to the string “nan”. Maybe I’m alone in this and you’d actually expect that (I can’t see a realistic use case for these “nan” strings, but well…). I figured out that with the code sample above, the Series’ values are processed through astype_unicode in pandas._libs.lib.

There is a skipna argument in astype_unicode, and I thought it would get passed along when using pd.Series.astype(str, skipna=True), but it is not. The docstring of pd.Series.astype does not mention skipna explicitly but does mention kwargs, so I tried the following while printing skipna in astype_unicode:

pd.Series(["foo",np.nan]).astype(str, skipna = True)

skipna stayed at its default value of False in astype_unicode, so it does not get passed along.

However, when using astype_unicode directly, setting skipna to True will not change the output of the code sample anyway, because checknull does not seem to work properly. You can test that by printing the result of checknull in lib.pyx as I did here: https://github.com/ThibTrip/pandas/commit/4a5c8397304e3026456d864fd5aeb7b8b9adca5f

Input

>>> from pandas._libs.lib import astype_unicode
>>> import numpy as np
>>> astype_unicode(np.array(["foo",np.nan]))

Output

foo is null? False
nan is null? False
array(['foo', 'nan'], dtype=object)

Expected output

foo is null? False
nan is null? True
array(['foo', NaN], dtype=object)
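One thing worth checking in this test, independent of the patch: NumPy picks a single dtype for np.array(["foo", np.nan]), and without dtype=object it picks a unicode dtype, so the missing value is already the string "nan" before any pandas code sees it. The sketch below only uses public NumPy and pandas calls (pd.isna stands in for the internal checknull, which is an assumption on my part about equivalent behaviour):

```python
import numpy as np
import pandas as pd

# Without an explicit dtype, NumPy coerces the float NaN to a string.
arr = np.array(["foo", np.nan])
print(arr.dtype.kind)      # 'U' (a fixed-width unicode dtype)
print(repr(str(arr[1])))   # 'nan' as a plain string, not a float

# With dtype=object the float NaN survives and can be detected as missing.
obj_arr = np.array(["foo", np.nan], dtype=object)
print(pd.isna(arr).tolist())      # [False, False]
print(pd.isna(obj_arr).tolist())  # [False, True]
```

If that is what happens here, "nan is null? False" would be expected for the unicode array, and the test would need an object array to exercise the skipna path.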

I tried patching it (so not a proper fix) using the code below, but it didn’t work. It would also be nice to be able to do pd.Series.astype(str, skipna=True). Whether skipna should then default to True or False is another matter.

if not (skipna and checknull(arr_i)):
    if arr_i is not np.nan:
        arr_i = unicode(arr_i)
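For anyone who needs this behaviour without touching lib.pyx, a rough Series-level workaround is to cast everything and then restore the missing positions. This is only a sketch of the desired skipna=True semantics, not what pandas does internally, and astype_str_keep_na is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def astype_str_keep_na(ser: pd.Series) -> pd.Series:
    """Cast non-missing values to str and leave missing values missing.

    Hypothetical helper: it approximates astype(str, skipna=True) using
    only public Series methods.
    """
    # Cast every value first, then mask the positions that were missing
    # in the original Series so they become NaN again.
    return ser.astype(str).mask(ser.isna())


ser = pd.Series(["foo", 1.5, np.nan], dtype=object)
out = astype_str_keep_na(ser)
print(out.tolist())
print(out.isna().tolist())  # [False, False, True]
```

The original Series is used for the mask on purpose: after the cast, the "nan" string can no longer be told apart from a real value.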

All of this was done in a development version I installed today (see details below). The only alteration is the code in my commit linked above. Sorry if this has been reported before; I searched in various ways and could not find anything except a similar issue with pd.read_excel and dtype str:

https://github.com/nikoskaragiannakis/pandas/commit/694849da2654d832d5717adabf3fe4e1d5489d43

Also, very sorry for the mess with the commits; I got a bit confused during my investigation (and I did not get enough sleep). Is it possible to delete all but my last commit? The other ones are irrelevant.

Cheers, Thibault

INSTALLED VERSIONS

commit: 4a5c8397304e3026456d864fd5aeb7b8b9adca5f
python: 3.7.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.25.0.dev0+132.g4a5c83973.dirty
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.7.3
Cython: 0.29.5
numpy: 1.15.4
scipy: 1.2.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.2.0
sphinx: 1.8.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.6.0
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml.etree: 4.3.1
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.17
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.0
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 4
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

11 reactions
hdaly0 commented, May 27, 2022

It seems if you do:

pd.Series(["foo",np.nan]).astype("string")

instead of

pd.Series(["foo",np.nan]).astype(str)

pandas uses a string dtype that supports nulls, and this fixed the issue for me.

Specifically:

s = pd.Series(["foo",np.nan]).astype("string")
type(s[1])
>> <class 'pandas._libs.missing.NAType'>

s = pd.Series(["foo",np.nan]).astype("str")
type(s[1])
>> <class 'str'>

Running versions: pandas 1.3.5 numpy 1.21.6
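To expand on that a little, here is a small sketch of how the missing value behaves once the Series uses the "string" dtype (available since pandas 1.0). The fillna("") step is just one possible way to get a blank instead of a missing marker, not something from the original issue:

```python
import numpy as np
import pandas as pd

# "string" selects the nullable string dtype, so NaN stays a missing value
# instead of being turned into the text "nan".
s = pd.Series(["foo", np.nan]).astype("string")
print(s.isna().tolist())  # [False, True]

# Missing entries can then be handled explicitly, e.g. replaced by a blank.
filled = s.fillna("")
print(filled.tolist())    # ['foo', '']
```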

7 reactions
makbigc commented, Aug 26, 2019

I ran the code recently. Passing skipna=True as a keyword argument works. Should we add skipna=True to the intermediate function call?

In [9]: pd.__version__
Out[9]: '0.25.0+179.gc3d5f227f'

In [10]: ser = pd.Series(['foo', np.nan])

In [11]: ser
Out[11]: 
0    foo
1    NaN
dtype: object

In [12]: ser.astype(str, skipna=True)
Out[12]: 
0    foo
1    NaN
dtype: object

In [14]: np.isnan(ser.astype(str, skipna=True)[1])
Out[14]: True

