BUG: datetime ExtensionDtypes do not work with DataFrame
- I have checked that this issue has not already been reported (at least I couldn't find one).
- I have confirmed this bug exists on the latest version of pandas (1.1.0).
- (optional) I have confirmed this bug exists on the master branch of pandas (934e9f840ebd2e8b5a5181b19a23e033bd3985a5).
Code Sample, a copy-pastable example
This is a high-level example that led to the investigation. It relies on rle-array (commit dfa79295a580d533ee9d2ea901e8808496dbcdc9 was used), because the pandas-provided DatetimeArray uses either a NumPy dtype or DatetimeTZDtype. Both of those cases somewhat work (see "Problem description").
import pandas as pd
from rle_array import RLEArray
array = RLEArray._from_sequence([], dtype="datetime64[ns]")
df = pd.DataFrame({"x": array})
Traceback (most recent call last):
  File "bug.py", line 5, in <module>
    pd.DataFrame({"x": array})
  File ".../lib/python3.8/site-packages/pandas/core/frame.py", line 467, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File ".../lib/python3.8/site-packages/pandas/core/internals/construction.py", line 283, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File ".../lib/python3.8/site-packages/pandas/core/internals/construction.py", line 93, in arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File ".../lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1650, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
  File ".../lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1703, in form_blocks
    block_type = get_block_type(v)
  File ".../lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2672, in get_block_type
    assert not is_datetime64tz_dtype(values.dtype)
AssertionError
Problem description
In get_block_type, datetime (and also interval) types are checked BEFORE extension types, which means that extension datetime types never end up in an ExtensionBlock. An ExtensionBlock would be useful if:
- the datetime object is not compatible with NumPy
- the data should not be converted to NumPy (e.g. due to compression, as in the rle-array case)
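To make the ordering problem concrete, here is a minimal toy model. It is not the pandas implementation: ToyDtype, both pick_block_* functions and the returned strings are hypothetical stand-ins, and the real get_block_type has more branches. It only illustrates that when the datetime check runs first, a dtype that is both datetime-like and an extension dtype can never reach the extension branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical stand-in for a dtype; not a pandas class."""
    name: str
    is_datetime: bool   # scalar type is datetime-like
    is_extension: bool  # dtype is an extension dtype


def pick_block_datetime_first(dtype: ToyDtype) -> str:
    # Ordering as described in the issue: datetime is checked first.
    if dtype.is_datetime:
        return "DatetimeBlock"
    if dtype.is_extension:
        return "ExtensionBlock"
    return "ObjectBlock"


def pick_block_extension_first(dtype: ToyDtype) -> str:
    # Alternative ordering: extension dtypes are checked first.
    if dtype.is_extension:
        return "ExtensionBlock"
    if dtype.is_datetime:
        return "DatetimeBlock"
    return "ObjectBlock"


# A third-party, compressed datetime dtype (assumed shape, for illustration).
compressed_datetime = ToyDtype("compressed datetime", is_datetime=True, is_extension=True)
# A plain NumPy-style datetime dtype.
plain_datetime = ToyDtype("plain datetime", is_datetime=True, is_extension=False)

for dt in (compressed_datetime, plain_datetime):
    print(dt.name, pick_block_datetime_first(dt), pick_block_extension_first(dt))
```

In this toy, only the second ordering lets the compressed datetime dtype reach the extension branch; the plain datetime dtype is unaffected by the order.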
Furthermore, the invariant issubclass(vtype, np.datetime64) => not is_datetime64tz_dtype(values.dtype) does NOT hold for all extension dtypes, at least not under the current implementation of is_datetime64tz_dtype. This is the assertion that fails in the traceback above.
Expected Output
The code example works and df._data shows that the data ends up in an ExtensionBlock.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : d9fff2792bf16178d4e450fe7384244e50635733
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.50.1
Related to https://github.com/pandas-dev/pandas/issues/35762
I had a look at the code in get_block_type: re-ordering it doesn't work trivially, because the pandas-provided datetime extension types would then end up in an ExtensionBlock, which would break a lot of things. So we have the following "conflict": DatetimeArray is implemented in a way that relies on DatetimeBlock / DatetimeTZBlock, but at the same time it has an extension dtype, which points to ExtensionBlock.
So I think either DatetimeArray needs some changes, or some special handling specifically for DatetimeArray is added to get_block_type.
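The second option could look roughly like the toy sketch below. Everything in it is an assumption made for illustration: ToyArray, its fields and pick_block are hypothetical, and the is_pandas_datetime_array flag merely stands in for whatever check pandas would use to recognise its own DatetimeArray. It is not a patch for get_block_type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical stand-in for an array passed to block selection."""
    is_datetime: bool               # values are datetime-like
    is_extension: bool              # array has an extension dtype
    is_pandas_datetime_array: bool  # stands in for "is the built-in DatetimeArray"
    tz_aware: bool = False


def pick_block(values: ToyArray) -> str:
    # Special handling first: the built-in datetime array keeps its
    # dedicated blocks, because its implementation relies on them.
    if values.is_pandas_datetime_array:
        return "DatetimeTZBlock" if values.tz_aware else "DatetimeBlock"
    # Any other extension array, datetime-like or not, goes to ExtensionBlock.
    if values.is_extension:
        return "ExtensionBlock"
    if values.is_datetime:
        return "DatetimeBlock"
    return "ObjectBlock"


builtin_naive = ToyArray(is_datetime=True, is_extension=False, is_pandas_datetime_array=True)
builtin_tz = ToyArray(is_datetime=True, is_extension=True, is_pandas_datetime_array=True, tz_aware=True)
third_party = ToyArray(is_datetime=True, is_extension=True, is_pandas_datetime_array=False)

print(pick_block(builtin_naive), pick_block(builtin_tz), pick_block(third_party))
```

The design choice in this sketch is to test for the concrete built-in array before looking at the dtype at all, so that third-party datetime extension arrays are no longer caught by the datetime branch.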