BUG: index.name not preserved in concat in case of unequal object index
See the original GitHub issue (xref #13742) for additional cases.
In [23]: df1 = pd.DataFrame({'a':[1,2]}, index=pd.Index(['a', 'b'], name='idx'))
In [24]: df2 = pd.DataFrame({'b':[2,3]}, index=pd.Index(['b', 'c'], name='idx'))
In [26]: pd.concat([df1, df2], axis=1)
Out[26]:
     a    b
a  1.0  NaN
b  2.0  2.0
c  NaN  3.0
In [27]: print(pd.concat([df1, df2], axis=1).index.name)
None
So the issue appears to be specific to unequal string (object) indexes: when the indexes of the two frames are equal (no NaNs are introduced), the name is kept, and it is also kept when using numeric indexes; see https://github.com/pydata/pandas/issues/13475#issuecomment-232310977
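A minimal, self-contained sketch of the three cases described above, assuming Python 3 and a recent pandas; the helper name concat_index_name is hypothetical. Per this report, pandas 0.18.1 returned None for the unequal-object case while the other two kept "idx". Other releases may behave differently, so the script only prints the names rather than asserting the buggy result.

```python
import pandas as pd


def concat_index_name(left_labels, right_labels):
    """Hypothetical helper: concat two one-column frames side by side, return the index name."""
    df1 = pd.DataFrame({"a": range(len(left_labels))},
                       index=pd.Index(left_labels, name="idx"))
    df2 = pd.DataFrame({"b": range(len(right_labels))},
                       index=pd.Index(right_labels, name="idx"))
    return pd.concat([df1, df2], axis=1).index.name


cases = {
    "equal object": (["a", "b"], ["a", "b"]),
    "unequal object": (["a", "b"], ["b", "c"]),  # the case reported as losing the name
    "unequal int": ([1, 2], [2, 3]),
}

results = {label: concat_index_name(*labels) for label, labels in cases.items()}
for label, name in results.items():
    print(f"{label}: {name!r}")
```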
When I use the concat function with input DataFrames that have index.name assigned, sometimes the resulting DataFrame has index.name assigned and sometimes it does not.
I ran the code below from the Python interpreter, using a conda environment with pandas 0.18.1.
I don’t see any odd or extra characters around the “pert_well” column in either file.
Code Sample, a copy-pastable example if possible
from io import StringIO

import pandas

# The sample rows below are space-separated as shown here (the original files
# were tab-separated), so they are read as whitespace-delimited; no field
# contains spaces.
a_data = """x_amount_mg x_annotation x_mmoles_per_liter mfc_plate_name x_avg_mol_weight x_volume_ul pert_mfc_desc pert_iname x_purity pert_id_vendor pert_well pert_vehicle pert_mfc_id x_smiles x_mg_per_ml pert_dose_unit pert_dose pert_id pert_plate pert_type
0.04784 ACCEPT 10.0 B-REPO-01-B64-101 405.4084 11 Taltirelin Taltirelin 86.52 HY-B0596 C18 DMSO BRD-K93869735-001-01-1 CN1C(=O)C[C@H](NC1=O)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N1CCC[C@H]1C(N)=O 4.054084 um 20.0 BRD-K93869735 PMEL008 trt_cp"""
b_data = """pert_well pert_2_type pert_2_id pert_2_mfc_id pert_2_mfc_desc pert_2_id_vendor pert_2_iname pert_2_dose pert_2_dose_unit pert_2_vehicle pert_3_type pert_3_id pert_3_mfc_id pert_3_mfc_desc pert_3_id_vendor pert_3_iname pert_3_dose pert_3_dose_unit pert_3_vehicle
A01 ctl_vehicle DMSO DMSO DMSO -666 DMSO -666 -666 -666 ctl_untrt CMAP-000 -666 UnTrt -666 -666 -666 -666 -666"""
d_data = """x_amount_mg x_annotation x_mmoles_per_liter mfc_plate_name x_avg_mol_weight x_volume_ul pert_mfc_desc pert_iname x_purity pert_id_vendor pert_well pert_vehicle pert_mfc_id x_smiles x_mg_per_ml pert_dose_unit pert_dose pert_id pert_plate pert_type
0.0 -666 -666 B-REPO-01-B64-107 -666 0 -666 -666 -666 -666 A01 -666 -666 -666 -666 -666 -666 CMAP-000 PMEL001 ctl_untrt"""
a = pandas.read_csv(StringIO(a_data), sep=r"\s+", index_col="pert_well")
b = pandas.read_csv(StringIO(b_data), sep=r"\s+", index_col="pert_well")
c = pandas.concat([a, b], axis=1)  # index values differ between a and b
print(c.index)
d = pandas.read_csv(StringIO(d_data), sep=r"\s+", index_col="pert_well")
e = pandas.concat([d, b], axis=1)  # d and b share the same index values
print(e.index)
Results:
Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', length=384)
Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', name=u'pert_well', length=384)
Expected Output
c.index.name should be “pert_well”, as e.index.name already is.
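Until the naming behavior is fixed, one workaround is to restore the name after concatenating. The sketch below assumes column-wise concatenation (axis=1, as in this report) and flat (non-MultiIndex) row indexes; concat_keep_index_name is a hypothetical helper, not a pandas API. It re-applies a name only when every input index has that same name, so frames with conflicting names still produce an unnamed result.

```python
import pandas as pd


def concat_keep_index_name(frames, **kwargs):
    """Hypothetical helper: column-wise pd.concat that restores a shared index name.

    Workaround sketch for the name loss described in this issue; not part of pandas.
    """
    frames = list(frames)
    result = pd.concat(frames, axis=1, **kwargs)
    names = {frame.index.name for frame in frames}
    if len(names) == 1 and not isinstance(result.index, pd.MultiIndex):
        # Every input index has the same name, so the combined index should too.
        # rename() returns a new Index, so no input frame's index is mutated.
        result.index = result.index.rename(names.pop())
    return result


df1 = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["a", "b"], name="idx"))
df2 = pd.DataFrame({"b": [2, 3]}, index=pd.Index(["b", "c"], name="idx"))
combined = concat_keep_index_name([df1, df2])
print(combined.index.name)  # idx

# Inputs with conflicting index names are left as pandas returns them.
df3 = pd.DataFrame({"c": [4]}, index=pd.Index(["a"], name="other"))
print(concat_keep_index_name([df1, df3]).index.name)
```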
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.7.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
Experiencing the same bug on pandas 0.24.2.
I believe I have a two-line fix to union_indexes that takes care of this. Should I submit a PR or just paste the diff here?
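The diff itself isn't shown in this thread. For illustration only, the naming rule such a fix needs to enforce (name the combined index only when every input index carries that same name) can be sketched with public Index methods. union_with_consensus_name is hypothetical; it is neither pandas' internal union_indexes nor the commenter's change.

```python
from functools import reduce

import pandas as pd


def union_with_consensus_name(indexes):
    """Hypothetical sketch: union several Index objects, naming the result only on consensus."""
    indexes = list(indexes)
    # Pairwise Index.union builds the combined set of labels.
    combined = reduce(lambda left, right: left.union(right), indexes)
    names = {index.name for index in indexes}
    # Keep the shared name if all inputs agree; otherwise leave the result unnamed.
    consensus = names.pop() if len(names) == 1 else None
    return combined.rename(consensus)


left = pd.Index(["a", "b"], name="pert_well")
right = pd.Index(["b", "c"], name="pert_well")
print(union_with_consensus_name([left, right]))
```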