BUG: Union of multi index with EA types can lose EA dtype

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> from cyberpandas import IPArray
>>> import pandas as pd
>>> 
>>> df1 = pd.DataFrame({
...     'address': IPArray(['192.168.1.1', '192.168.1.10']),
...     'date': ['2022-01-01', '2022-01-02'],
...     'a': [1, 2]
... })
>>> df1 = df1.set_index(['address', 'date'])
>>> 
>>> 
>>> df2 = pd.DataFrame({
...     'address': IPArray(['192.168.1.1', '192.168.1.10']),
...     'date': pd.to_datetime(['2022-01-01', '2022-01-02']),
...     'a': [1, 2]
... })
>>> df2 = df2.set_index(['address', 'date'])
>>> 
>>> df1.index.dtypes
address        ip
date       object
dtype: object
>>> 
>>> df2.index.dtypes
address                ip
date       datetime64[ns]
dtype: object
>>> 
>>> df1.index.union(df2.index).dtypes
address            object   # <-- should be type "ip", not "object"
date       datetime64[ns]
dtype: object

Issue Description

The extension dtype can be lost when two MultiIndex objects are combined with .union(). This becomes a problem with df.combine_first(...), which relies on index.union(...).

The problem occurs when both MultiIndex objects share the same EA level, but the other level (assuming a two-level MultiIndex) has a different dtype in each of them. In that case, the EA level of the resulting MultiIndex loses its EA dtype; in the example above it falls back to object.

Expected Behavior

The EA dtype should be preserved after index.union(...).
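On a pandas version where the union still drops the dtype, one possible stopgap is to cast the affected level back afterwards. The following is only a sketch and not something proposed in the issue: `restore_level_dtype` is a hypothetical helper name, and the example uses pandas' own nullable `Int64` dtype instead of cyberpandas' `ip` dtype so that it runs without third-party packages. It assumes a pandas version in which an Index can hold extension dtypes.

```python
import pandas as pd


def restore_level_dtype(index: pd.MultiIndex, level: str, dtype) -> pd.MultiIndex:
    """Return a copy of `index` with the named level cast to `dtype`.

    Hypothetical helper for illustration; not part of pandas.
    """
    position = index.names.index(level)
    return index.set_levels(index.levels[position].astype(dtype), level=level)


# Two indexes that share an Int64 level but differ in the other level's dtype.
left = pd.MultiIndex.from_arrays(
    [pd.array([1, 2], dtype="Int64"), [1, 2]], names=["key", "value"]
)
right = pd.MultiIndex.from_arrays(
    [pd.array([1, 2], dtype="Int64"), [1.5, 2.5]], names=["key", "value"]
)

combined = left.union(right)
# Cast the "key" level back; this is a no-op if the dtype was already kept.
restored = restore_level_dtype(combined, "key", "Int64")
print(restored.dtypes)
```

The cast only touches the level's unique values, not the codes, so the rows of the index stay the same. It will fail if the level picked up values that the target dtype cannot represent.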

Installed Versions

INSTALLED VERSIONS

commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.8.13.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8

pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.3.2
pip : 22.1.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
phofl commented, Aug 11, 2022

Hi, thanks for your report. As a note: This also happens for our own extension arrays.
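To check this without cyberpandas, the same comparison can be sketched with one of pandas' own extension dtypes. This is an illustration under assumptions, not the maintainers' test case: it uses the nullable `Int64` dtype for the shared level, int64 versus float64 for the other level, and a hypothetical helper named `union_level_dtypes`. Whether the `union` column shows the dtype being lost depends on the installed pandas version.

```python
import pandas as pd


def union_level_dtypes(left: pd.MultiIndex, right: pd.MultiIndex) -> pd.DataFrame:
    """Tabulate each level's dtype in both inputs and in their union.

    Hypothetical helper for illustration; not part of pandas.
    """
    return pd.DataFrame(
        {
            "left": left.dtypes.astype(str),
            "right": right.dtypes.astype(str),
            "union": left.union(right).dtypes.astype(str),
        }
    )


# Shared EA level ("key") plus a second level whose dtype differs.
left = pd.MultiIndex.from_arrays(
    [pd.array([1, 2], dtype="Int64"), [1, 2]], names=["key", "value"]
)
right = pd.MultiIndex.from_arrays(
    [pd.array([1, 2], dtype="Int64"), [1.5, 2.5]], names=["key", "value"]
)

report = union_level_dtypes(left, right)
print(report)
```

If the `key` row shows `Int64` under `left` and `right` but something else under `union`, the installed version is affected.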

0 reactions
phofl commented, Oct 14, 2022

We did a couple of PRs in that area. You can check the whatsnew for 2.0; they are documented there.
