BUG: cannot safely cast non-equivalent float64 to Int64
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 4), columns=list('ABCD'))
df_int = pd.to_numeric(df['A'], errors='coerce').astype('Int64')
Problem description
The cast fails with "cannot safely cast non-equivalent float64 to Int64". It should behave like a conversion from float64 to int64, which truncates the fractional part (the same as rounding down for the non-negative values here).
Expected Output
0    0
1    0
2    0
Name: A, dtype: Int64
Output of pd.show_versions()
pandas : 1.0.1
You can convert to Int64 safely by flooring the values first:
df_int = np.floor(pd.to_numeric(df['A'], errors='coerce')).astype('Int64')
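The workaround above can be extended to the other common rounding choices. The following is a minimal sketch, assuming the same random DataFrame as in the report; the variable names `floored`, `truncated` and `rounded` are only illustrative. Each variant first turns the floats into whole numbers, which is what lets the cast to Int64 pass the equivalence check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=list("ABCD"))
col = pd.to_numeric(df["A"], errors="coerce")

# Round toward negative infinity (the workaround from the report).
floored = np.floor(col).astype("Int64")

# Drop the fractional part, which mirrors what astype("int64") does.
truncated = np.trunc(col).astype("Int64")

# Round to the nearest whole number instead.
rounded = col.round().astype("Int64")

print(floored)
```

Any NaN produced by errors='coerce' should survive these steps and end up as a missing value in the Int64 result rather than raising.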
TypeError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\arrays\integer.py in safe_cast(values, dtype, copy)
    143     try:
--> 144         return values.astype(dtype, casting="safe", copy=copy)
    145     except TypeError:

TypeError: Cannot cast array from dtype('float64') to dtype('int64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-14-8151dc1b5846> in <module>
----> 1 df_int = pd.to_numeric(df['A'], errors='coerce').astype('Int64')
      2 df_int

~\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
   5696         else:
   5697             # else, only a single dtype is given
-> 5698             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
   5699             return self._constructor(new_data).__finalize__(self)
   5700

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
    580
    581     def astype(self, dtype, copy: bool = False, errors: str = "raise"):
--> 582         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    583
    584     def convert(self, **kwargs):

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, filter, **kwargs)
    440                 applied = b.apply(f, **kwargs)
    441             else:
--> 442                 applied = getattr(b, f)(**kwargs)
    443             result_blocks = _extend_blocks(applied, result_blocks)
    444

~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
    623             vals1d = values.ravel()
    624             try:
--> 625                 values = astype_nansafe(vals1d, dtype, copy=True)
    626             except (ValueError, TypeError):
    627                 # e.g. astype_nansafe can fail on object-dtype of strings

~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    819     # dispatch on extension dtype if needed
    820     if is_extension_array_dtype(dtype):
--> 821         return dtype.construct_array_type()._from_sequence(arr, dtype=dtype, copy=copy)
    822
    823     if not isinstance(dtype, np.dtype):

~\anaconda3\lib\site-packages\pandas\core\arrays\integer.py in _from_sequence(cls, scalars, dtype, copy)
    348     @classmethod
    349     def _from_sequence(cls, scalars, dtype=None, copy=False):
--> 350         return integer_array(scalars, dtype=dtype, copy=copy)
    351
    352     @classmethod

~\anaconda3\lib\site-packages\pandas\core\arrays\integer.py in integer_array(values, dtype, copy)
    129     TypeError if incompatible types
    130     """
--> 131     values, mask = coerce_to_array(values, dtype=dtype, copy=copy)
    132     return IntegerArray(values, mask)
    133

~\anaconda3\lib\site-packages\pandas\core\arrays\integer.py in coerce_to_array(values, dtype, mask, copy)
    245         values = safe_cast(values, dtype, copy=False)
    246     else:
--> 247         values = safe_cast(values, dtype, copy=False)
    248
    249     return values, mask

~\anaconda3\lib\site-packages\pandas\core\arrays\integer.py in safe_cast(values, dtype, copy)
    150
    151         raise TypeError(
--> 152             f"cannot safely cast non-equivalent {values.dtype} to {np.dtype(dtype)}"
    153         )
    154

TypeError: cannot safely cast non-equivalent float64 to int64
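The traceback shows two steps inside safe_cast: a NumPy cast under the "safe" rule that fails, followed by the "non-equivalent" error. A rough sketch of that logic in plain NumPy is given below. The function name `safe_cast_sketch` is hypothetical, and the equivalence check between the two steps is not visible in the traceback, so it is an assumption inferred from the error message rather than a copy of the pandas source.

```python
import numpy as np


def safe_cast_sketch(values: np.ndarray, dtype, copy: bool = False) -> np.ndarray:
    """Hypothetical re-creation of the check the traceback passes through."""
    try:
        # NumPy's "safe" casting rule rejects float64 -> int64 outright.
        return values.astype(dtype, casting="safe", copy=copy)
    except TypeError:
        # Assumed step: cast without the rule, then accept the result
        # only if no value changed (i.e. every float was a whole number).
        casted = values.astype(dtype, copy=copy)
        if (casted == values).all():
            return casted
        raise TypeError(
            f"cannot safely cast non-equivalent {values.dtype} to {np.dtype(dtype)}"
        )


print(safe_cast_sketch(np.array([1.0, 2.0, 3.0]), "int64"))
```

Under this reading, whole-valued floats such as 1.0 pass, while values such as 0.37 from np.random.rand change when cast and therefore raise.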
Issue Analytics
- State:
- Created 3 years ago
- Comments:11 (2 by maintainers)
The problem is that Int64 has a NaN equivalent as well, and in some cases, such as when none of the data has a fractional part, it is inappropriate to assume float.
If this is the expected result, then fine, it is not a bug. However, it is at minimum unexpected in the situation I describe above, and it should be handled better.
I think a more honest answer from the maintainers would be to accept this as an unexpected result that makes working with pandas harder.
There can be a legitimate rationale for not working on it for the past 2 years. However, the way this issue was closed feels more like the maintainers would rather sweep it under the rug, since the comment preceding the closing of this issue did not really address any of the criticism in earlier comments.
Those critiques being:
- Conversions from float to Int64 raise errors instead of succeeding, for example via the read_csv dtype keyword argument.
- Int64 is a nullable integer type and thus should be convertible from float if the floats have no decimal values. This to me is the clearest point that this is in fact a bug, not something more suitable for a feature request.
Now, to be fair to the maintainers, the original issue creator's problem seems to be that they are trying to cast float values that are not whole numbers to Int64; in that case the error message makes sense. I think it would be reasonable to also suggest this be a new bug issue for clarity too.
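The distinction drawn here, whole-number floats with missing values versus floats with a fractional part, can be illustrated with a short sketch. The series `whole` and `fractional` are made-up data, and the behaviour shown is what I would expect from astype in pandas versions that ship the nullable Int64 dtype; it does not cover the read_csv case.

```python
import numpy as np
import pandas as pd

# Whole-number floats plus a missing value: the cast is expected to succeed,
# with NaN carried over as a missing value.
whole = pd.Series([1.0, 2.0, np.nan])
as_nullable = whole.astype("Int64")
print(as_nullable)

# Floats with a fractional part: the cast is expected to be rejected
# (a TypeError in pandas 1.0.1, per the traceback in the report).
fractional = pd.Series([1.5, 2.0, np.nan])
try:
    fractional.astype("Int64")
    raised = False
except (TypeError, ValueError):
    raised = True
print(raised)
```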