Plotting Int64 columns with nulled integers (NAType) fails

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [7, 5, np.nan, 3, 2]})
df.plot(x='A', y='B')
df = df.astype('Int64')
df.plot(x='A', y='B')

Problem description

The first plotting command works, the second throws the error message

TypeError: float() argument must be a string or a number, not 'NAType'
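
The message itself can be reproduced without any plotting library. This suggests (as an assumption, not something confirmed in this report) that somewhere in the plotting path the missing value is handed to float(). A minimal sketch; float_conversion_error is a hypothetical helper introduced only for illustration:

```python
import pandas as pd


def float_conversion_error(value):
    """Return the name of the exception raised by float(value), or None."""
    try:
        float(value)
    except TypeError as exc:
        return type(exc).__name__
    return None


# pd.NA (the NAType singleton) cannot be converted to a Python float ...
print(float_conversion_error(pd.NA))
# ... whereas a floating-point NaN converts without complaint.
print(float_conversion_error(float("nan")))
```

The exact wording of the TypeError depends on the Python version, so the sketch reports only the exception type.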

Expected Output

NAType should be treated the same way as numpy nan in plotting. Maybe transformed on the fly?

(I’m unsure if this is a pandas, a numpy, or a matplotlib issue, I’m starting here)
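
Until the conversion happens on the fly, one possible workaround is to cast the nullable columns back to float64 just before plotting, so that pd.NA becomes np.nan. This is a sketch under the assumption that losing the integer dtype is acceptable for the plot; to_plottable is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd


def to_plottable(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose nullable numeric columns are plain float64."""
    out = frame.copy()
    for name in out.columns:
        dtype = out[name].dtype
        # Only touch pandas extension dtypes that are numeric (e.g. Int64).
        if (pd.api.types.is_extension_array_dtype(dtype)
                and pd.api.types.is_numeric_dtype(dtype)):
            # astype turns pd.NA into np.nan for float targets.
            out[name] = out[name].astype("float64")
    return out


df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [7, 5, np.nan, 3, 2]})
df = df.astype('Int64')

plottable = to_plottable(df)
print(plottable.dtypes)
# plottable.plot(x='A', y='B')  # uncomment where matplotlib is available
```

The original frame keeps its Int64 columns; only the copy passed to the plotting call is converted.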

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.3
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.7
numba : 0.48.0

Issue Analytics

State:
Created 4 years ago
Comments: 19 (16 by maintainers)

Top GitHub Comments

MarcoGorelli commented, Oct 19, 2020

"It’s been 2 months since nobody is working on it. I am interested to work on this issue."

Awesome! Let us know if you want/need help

jankaWIS commented, Jul 2, 2022

Hi, I have a comment on this issue. It happened to me in v1.2.4, and it seems to have been fixed by now (v1.4.3), but I have not found where or when the fix happened. I believe it’s connected and could save time for someone who encounters this error while plotting with matplotlib:

ValueError: values must be a 1D array

If one runs the following code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'x' : [1,2,3,4,5],
    'y' : [1.,2.,3.,4.1,5]
})

print(df.dtypes)
# x      int64
# y    float64
# dtype: object

# plot
plt.plot(df["x"],df["y"])

# convert types
df = df.convert_dtypes()

print(df.dtypes)
# x      Int64
# y    Float64
# dtype: object

# plot
plt.plot(df["x"],df["y"])

I.e., once the dtypes are converted to the pandas nullable dtypes, plotting with matplotlib suddenly fails. The code above draws the first plot, but fails after the conversion. Note that it doesn’t matter whether the column is Float64 or Int64: plotting just plt.plot(df["x"],df["x"]) or plt.plot(df["y"],df["y"]) fails in the same way.

The full output is below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values(self, indexer)
    936         try:
--> 937             return self._constructor(self._mgr.get_slice(indexer)).__finalize__(self)
    938         except ValueError:

~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
   1606         blk = self._block
-> 1607         array = blk._slice(slobj)
   1608         block = blk.make_block_same_class(array, placement=slice(0, len(array)))

~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
   1923
-> 1924         return self.values[slicer]
   1925

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
    114
--> 115         return type(self)(self._data[item], self._mask[item])
    116

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    347             )
--> 348         super().__init__(values, mask, copy=copy)
    349

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
     89         if values.ndim != 1:
---> 90             raise ValueError("values must be a 1D array")
     91         if mask.ndim != 1:

ValueError: values must be a 1D array

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/var/folders/bx/tb4883l53hdd3zp2y0nyy_4m0000gp/T/ipykernel_92470/2350386431.py in <module>
     16 print(df.dtypes)
     17 # plot
---> 18 plt.plot(df["x"],df["y"])

~/anaconda3/lib/python3.8/site-packages/matplotlib/pyplot.py in plot(scalex, scaley, data, *args, **kwargs)
   3017 @_copy_docstring_and_deprecators(Axes.plot)
   3018 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 3019     return gca().plot(
   3020         *args, scalex=scalex, scaley=scaley,
   3021         **({"data": data} if data is not None else {}), **kwargs)

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in plot(self, scalex, scaley, data, *args, **kwargs)
   1603         """
   1604         kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1605         lines = [*self._get_lines(*args, data=data, **kwargs)]
   1606         for line in lines:
   1607             self.add_line(line)

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in __call__(self, data, *args, **kwargs)
    313                 this += args[0],
    314                 args = args[1:]
--> 315             yield from self._plot_args(this, kwargs)
    316
    317     def get_next_color(self):

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs, return_kwargs)
    488
    489         if len(xy) == 2:
--> 490             x = _check_1d(xy[0])
    491             y = _check_1d(xy[1])
    492         else:

~/anaconda3/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_1d(x)
   1360                     message='Support for multi-dimensional indexing')
   1361
-> 1362                 ndim = x[:, None].ndim
   1363                 # we have definitely hit a pandas index or series object
   1364                 # cast to a numpy array.

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    875             return self._get_values(key)
    876
--> 877         return self._get_with(key)
    878
    879     def _get_with(self, key):

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_with(self, key)
    890             )
    891         elif isinstance(key, tuple):
--> 892             return self._get_values_tuple(key)
    893
    894         elif not is_list_like(key):

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values_tuple(self, key)
    920         # mpl hackaround
    921         if com.any_none(*key):
--> 922             result = self._get_values(key)
    923             deprecate_ndim_indexing(result, stacklevel=5)
    924             return result

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values(self, indexer)
    940             # see tests.series.timeseries.test_mpl_compat_hack
    941             # the asarray is needed to avoid returning a 2D DatetimeArray
--> 942             return np.asarray(self._values[indexer])
    943
    944     def _get_value(self, label, takeable: bool = False):

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
    113         item = check_array_indexer(self, item)
    114
--> 115         return type(self)(self._data[item], self._mask[item])
    116
    117     def _coerce_to_array(self, values) -> Tuple[np.ndarray, np.ndarray]:

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    346                 "the 'pd.array' function instead"
    347             )
--> 348         super().__init__(values, mask, copy=copy)
    349
    350     def __neg__(self):

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
     88             )
     89         if values.ndim != 1:
---> 90             raise ValueError("values must be a 1D array")
     91         if mask.ndim != 1:
     92             raise ValueError("mask must be a 1D array")

ValueError: values must be a 1D array
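
Reading the traceback, the failure seems to start in matplotlib's _check_1d, which evaluates x[:, None] on the Series, and the masked array behind the nullable dtype rejects the resulting 2D values. For anyone stuck on an affected version, a possible workaround (a sketch, assuming a float64 copy of the data is acceptable) is to hand matplotlib plain NumPy arrays instead of the Series:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [1., 2., 3., 4.1, 5],
}).convert_dtypes()

# Convert each column to a plain float64 ndarray; pd.NA (if any) becomes np.nan.
x = df["x"].to_numpy(dtype="float64", na_value=np.nan)
y = df["y"].to_numpy(dtype="float64", na_value=np.nan)

# The indexing that _check_1d attempts is unproblematic on an ndarray.
print(type(x).__name__, x[:, None].shape)  # ndarray (5, 1)

# plt.plot(x, y)  # uncomment where matplotlib is available
```

Since the conversion happens only at the call site, the DataFrame itself keeps its nullable dtypes.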

In case it was relevant, this was my installation at that time:

INSTALLED VERSIONS

commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.11.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.22.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.2
setuptools : 52.0.0.post20210125
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.26.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.19.0
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1