Plotting Int64 columns with nulled integers (NAType) fails
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [7, 5, np.nan, 3, 2]})
df.plot(x='A', y='B')
df = df.astype('Int64')
df.plot(x='A', y='B')
Problem description
The first plotting command works, the second throws the error message
TypeError: float() argument must be a string or a number, not 'NAType'
Expected Output
NAType should be treated the same way as numpy nan in plotting. Maybe transformed on the fly?
(I’m unsure if this is a pandas, a numpy, or a matplotlib issue, I’m starting here)
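Until plotting handles this itself, the "transformed on the fly" idea can be approximated from the user side. The following is a minimal sketch, not what pandas does internally: as_float_frame is a hypothetical helper name, while Series.to_numpy with the dtype and na_value arguments is existing pandas API (1.0 and later).

```python
import numpy as np
import pandas as pd


def as_float_frame(df):
    """Return a copy of df with every column as float64 and pd.NA replaced by np.nan.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    return pd.DataFrame(
        {
            col: df[col].to_numpy(dtype="float64", na_value=np.nan)
            for col in df.columns
        },
        index=df.index,
    )


# Same data as in the report, converted to the nullable integer dtype.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [7, 5, np.nan, 3, 2]}).astype("Int64")

# The converted frame holds plain float64 columns, with NaN where pd.NA was.
plottable = as_float_frame(df)
```

Calling plottable.plot(x='A', y='B') should then behave like the first, working plot command, since the frame no longer contains NAType values.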
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.3
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.7
numba : 0.48.0
Issue Analytics
- Created 4 years ago
- Comments: 19 (16 by maintainers)
Top GitHub Comments
Awesome! Let us know if you want/need help
Hi, I have a comment on this issue. It happened to me in v1.2.4, and it seems to have been fixed by now (v1.4.3), but I have not found where or when the fix happened. I believe it’s connected and could save time for someone who encounters this error while plotting with matplotlib:
If one runs the following code:
i.e. if one converts dtypes to pandas dtypes, plotting with matplotlib suddenly fails. The code above will draw the first plot, but after the types are converted, it fails. Note that it doesn’t matter whether the variable is Float64 or Int64, i.e. just plotting plt.plot(df["x"],df["x"]) or plt.plot(df["y"],df["y"]) yields the same error. The full output is below:
~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
   1606         blk = self._block
-> 1607         array = blk._slice(slobj)
   1608         block = blk.make_block_same_class(array, placement=slice(0, len(array)))

~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
   1923
-> 1924         return self.values[slicer]
   1925

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
    114
--> 115         return type(self)(self._data[item], self._mask[item])
    116

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    347             )
--> 348         super().__init__(values, mask, copy=copy)
    349

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
     89         if values.ndim != 1:
---> 90             raise ValueError("values must be a 1D array")
     91         if mask.ndim != 1:

ValueError: values must be a 1D array

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/var/folders/bx/tb4883l53hdd3zp2y0nyy_4m0000gp/T/ipykernel_92470/2350386431.py in <module>
     16 print(df.dtypes)
     17 # plot
---> 18 plt.plot(df["x"],df["y"])

~/anaconda3/lib/python3.8/site-packages/matplotlib/pyplot.py in plot(scalex, scaley, data, *args, **kwargs)
   3017 @_copy_docstring_and_deprecators(Axes.plot)
   3018 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 3019     return gca().plot(
   3020         *args, scalex=scalex, scaley=scaley,
   3021         **({"data": data} if data is not None else {}), **kwargs)

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in plot(self, scalex, scaley, data, *args, **kwargs)
   1603         """
   1604         kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1605         lines = [*self._get_lines(*args, data=data, **kwargs)]
   1606         for line in lines:
   1607             self.add_line(line)

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in __call__(self, data, *args, **kwargs)
    313                 this += args[0],
    314                 args = args[1:]
--> 315             yield from self._plot_args(this, kwargs)
    316
    317     def get_next_color(self):

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs, return_kwargs)
    488
    489         if len(xy) == 2:
--> 490             x = _check_1d(xy[0])
    491             y = _check_1d(xy[1])
    492         else:

~/anaconda3/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_1d(x)
   1360                     message='Support for multi-dimensional indexing')
   1361
-> 1362                 ndim = x[:, None].ndim
   1363                 # we have definitely hit a pandas index or series object
   1364                 # cast to a numpy array.

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    875             return self._get_values(key)
    876
--> 877         return self._get_with(key)
    878
    879     def _get_with(self, key):

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_with(self, key)
    890             )
    891         elif isinstance(key, tuple):
--> 892             return self._get_values_tuple(key)
    893
    894         elif not is_list_like(key):

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values_tuple(self, key)
    920         # mpl hackaround
    921         if com.any_none(*key):
--> 922             result = self._get_values(key)
    923             deprecate_ndim_indexing(result, stacklevel=5)
    924             return result

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values(self, indexer)
    940             # see tests.series.timeseries.test_mpl_compat_hack
    941             # the asarray is needed to avoid returning a 2D DatetimeArray
--> 942             return np.asarray(self._values[indexer])
    943
    944     def _get_value(self, label, takeable: bool = False):

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
    113         item = check_array_indexer(self, item)
    114
--> 115         return type(self)(self._data[item], self._mask[item])
    116
    117     def _coerce_to_array(self, values) -> Tuple[np.ndarray, np.ndarray]:

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    346                 "the 'pd.array' function instead"
    347             )
--> 348         super().__init__(values, mask, copy=copy)
    349
    350     def __neg__(self):

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
     88             )
     89         if values.ndim != 1:
---> 90             raise ValueError("values must be a 1D array")
     91         if mask.ndim != 1:
     92             raise ValueError("mask must be a 1D array")

ValueError: values must be a 1D array
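The last matplotlib frame in the traceback, _check_1d, probes its input with x[:, None], and the masked pandas array rejects the resulting 2D values. A possible workaround on affected versions, sketched here under assumptions, is to hand matplotlib plain NumPy arrays instead of the Series. The data below is made up, only the column names x and y come from the traceback, and the Float64 dtype requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

# Made-up data; only the column names "x" and "y" come from the traceback.
df = pd.DataFrame({"x": [1, 2, 3], "y": [1.5, 2.5, 3.5]})
df = df.astype({"x": "Int64", "y": "Float64"})  # nullable pandas dtypes

# Convert to plain float64 ndarrays before handing the data to matplotlib.
x = df["x"].to_numpy(dtype="float64", na_value=np.nan)
y = df["y"].to_numpy(dtype="float64", na_value=np.nan)

# The indexing that _check_1d performs is valid on an ndarray.
print(x[:, None].ndim)

# With matplotlib installed, plotting would then be:
# import matplotlib.pyplot as plt
# plt.plot(x, y)
```

Missing values, if any, arrive as NaN, which matplotlib draws as gaps in the line.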
In case it was relevant, this was my installation at that time:
INSTALLED VERSIONS
commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.11.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.4
numpy : 1.22.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.2
setuptools : 52.0.0.post20210125
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.26.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.19.0
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1