BUG: index.name not preserved in concat in case of unequal object index

xref #13742 for additional cases.

In [23]: df1 = pd.DataFrame({'a':[1,2]}, index=pd.Index(['a', 'b'], name='idx'))

In [24]: df2 = pd.DataFrame({'b':[2,3]}, index=pd.Index(['b', 'c'], name='idx'))

In [26]: pd.concat([df1, df2], axis=1)
Out[26]:
     a    b
a  1.0  NaN
b  2.0  2.0
c  NaN  3.0

In [27]: print pd.concat([df1, df2], axis=1).index.name
None

So the issue seems to be specific to string (object) indexes that are not equal. When the indexes of the two frames are equal (so no NaNs are introduced), the name is kept, and it is also kept with numerical indexes; see https://github.com/pydata/pandas/issues/13475#issuecomment-232310977
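A minimal sketch of a caller-side workaround, assuming all input frames are meant to share one index name. `concat_keep_index_name` is a hypothetical helper written for illustration, not part of pandas:

```python
import pandas as pd


def concat_keep_index_name(frames, **kwargs):
    """Hypothetical helper: concat along axis=1 and restore a shared index name."""
    result = pd.concat(frames, axis=1, **kwargs)
    names = {frame.index.name for frame in frames}
    if len(names) == 1:
        # Every input agrees on the name, so put it back on the result.
        result.index = result.index.rename(names.pop())
    return result


df1 = pd.DataFrame({'a': [1, 2]}, index=pd.Index(['a', 'b'], name='idx'))
df2 = pd.DataFrame({'b': [2, 3]}, index=pd.Index(['b', 'c'], name='idx'))

combined = concat_keep_index_name([df1, df2])
print(combined.index.name)
```

If the inputs disagree on the index name, the helper leaves the result exactly as `pd.concat` returned it.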

When I use the concat function with input dataframes that have index.name assigned, sometimes the resulting dataframe has the index.name assigned, sometimes it does not.

I ran the code below from the python interpreter, using a conda environment with pandas-0.18.1

I don’t see any odd or extra characters around the “pert_well” column when comparing the files.

Code Sample, a copy-pastable example if possible

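import pandas
try:
    from StringIO import StringIO  # Python 2, as used in this report
except ImportError:
    from io import StringIO  # Python 3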

a_data = """x_amount_mg x_annotation    x_mmoles_per_liter  mfc_plate_name  x_avg_mol_weight    x_volume_ul pert_mfc_desc   pert_iname  x_purity    pert_id_vendor  pert_well   pert_vehicle    pert_mfc_id x_smiles    x_mg_per_ml pert_dose_unit  pert_dose   pert_id pert_plate  pert_type
0.04784 ACCEPT  10.0    B-REPO-01-B64-101   405.4084    11  Taltirelin  Taltirelin  86.52   HY-B0596    C18 DMSO    BRD-K93869735-001-01-1  CN1C(=O)C[C@H](NC1=O)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N1CCC[C@H]1C(N)=O    4.054084    um  20.0    BRD-K93869735   PMEL008 trt_cp"""

b_data = """pert_well   pert_2_type pert_2_id   pert_2_mfc_id   pert_2_mfc_desc pert_2_id_vendor    pert_2_iname    pert_2_dose pert_2_dose_unit    pert_2_vehicle  pert_3_type pert_3_id    pert_3_mfc_id  pert_3_mfc_desc pert_3_id_vendor    pert_3_iname    pert_3_dose pert_3_dose_unit    pert_3_vehicle
A01 ctl_vehicle DMSO    DMSO    DMSO    -666    DMSO    -666    -666    -666    ctl_untrt   CMAP-000    -666    UnTrt   -666    -666    -666    -666    -666"""

d_data = """x_amount_mg x_annotation    x_mmoles_per_liter  mfc_plate_name  x_avg_mol_weight    x_volume_ul pert_mfc_desc   pert_iname  x_purity    pert_id_vendor  pert_well   pert_vehicle    pert_mfc_id x_smiles    x_mg_per_ml pert_dose_unit  pert_dose   pert_id pert_plate  pert_type
0.0 -666    -666    B-REPO-01-B64-107   -666    0   -666    -666    -666    -666    A01 -666    -666    -666    -666    -666    -666    CMAP-000    PMEL001 ctl_untrt"""

# sep=r"\s+" splits on tabs or runs of spaces; the original files are tab-separated
a = pandas.read_csv(StringIO(a_data), sep=r"\s+", index_col="pert_well")
b = pandas.read_csv(StringIO(b_data), sep=r"\s+", index_col="pert_well")
c = pandas.concat([a, b], axis=1)
c.index  # index name is lost

d = pandas.read_csv(StringIO(d_data), sep=r"\s+", index_col="pert_well")
e = pandas.concat([d, b], axis=1)
e.index  # index name is kept

results:

Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', length=384)

Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', name=u'pert_well', length=384)

Expected Output

c.index.name should be “pert_well”
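On affected versions, one way to get the expected output is to keep pert_well as an ordinary column, merge on it, and set it as the index afterwards. The sketch below uses hypothetical two-column stand-ins for the much wider files in this report, and assumes the frames share no column names other than pert_well:

```python
from io import StringIO

import pandas as pd

# Hypothetical, cut-down stand-ins for the real tab-separated files.
a_data = "pert_well\tpert_id\nC18\tBRD-K93869735\n"
b_data = "pert_well\tpert_2_type\nA01\tctl_vehicle\n"

# Read pert_well as a normal column instead of using index_col.
a = pd.read_csv(StringIO(a_data), sep="\t")
b = pd.read_csv(StringIO(b_data), sep="\t")

# The outer merge keeps every well from both frames; the key column
# carries its name into the index when set_index is called.
c = a.merge(b, on="pert_well", how="outer").set_index("pert_well")
print(c.index.name)
```

Note that a merge pairs up duplicate keys differently from concat, so this is only a drop-in replacement when each well appears at most once per frame.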

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.7.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

PMEL_input_files_for_pandas_issue.zip

Issue Analytics

Created 7 years ago
Comments: 20 (10 by maintainers)

Top GitHub Comments

1 reaction

0anton commented, Jun 15, 2019

experiencing the same bug on pandas 0.24.2

0 reactions

iamlemec commented, Apr 21, 2020

I believe I have a 2 line fix to union_indexes that takes care of this. Should I submit a PR or just paste the diff here?
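The diff mentioned in this comment is not shown here. Purely as an illustration of the idea, and not the commenter's patch or pandas' internal `union_indexes`, a name-preserving union could be sketched as follows; `union_keep_name` is a hypothetical function:

```python
from functools import reduce

import pandas as pd


def union_keep_name(indexes):
    """Hypothetical sketch: union several indexes, keeping a name they all share."""
    names = {idx.name for idx in indexes}
    merged = reduce(lambda left, right: left.union(right), indexes)
    # Keep the name only when every input agrees on it; otherwise drop it.
    shared = names.pop() if len(names) == 1 else None
    return merged.rename(shared)


left = pd.Index(['a', 'b'], name='idx')
right = pd.Index(['b', 'c'], name='idx')
merged = union_keep_name([left, right])
print(merged.name)
```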
