Index gets lost when DataFrame melt method is used

import pandas as pd
import numpy as np
df = pd.DataFrame({"Numbers_1":range(0,3),
                   "Numbers_2":range(3,6),
                   "Letters":["A","B","C"]})
df.set_index("Letters",inplace=True)
print(df)

Letters	Numbers_1	Numbers_2
A	0	3
B	1	4
C	2	5


df_melted = df.melt()
print(df_melted)

.	variable	value
0	Numbers_1	0
1	Numbers_1	1
2	Numbers_1	2
3	Numbers_2	3
4	Numbers_2	4
5	Numbers_2	5

Problem description

When melting a DataFrame, I expected the original index to be preserved. Instead, the melt method discards it. This is probably what wesm’s comment (# TODO: what about the existing index?) refers to: https://github.com/pandas-dev/pandas/blob/133a2087d038da035a57ab90aad557a328b3d60b/pandas/core/reshape/reshape.py#L715

Expected Output

I would expect something like

n_row,n_col = df.shape
index_melted = list(df.index)*n_col
melt_id = list(np.arange(n_col).repeat(n_row))
temp = list(zip(*[index_melted,melt_id]))

index_melted_uniq = pd.MultiIndex.from_tuples(temp,names=[df.index.names[0], 'melt_id'])
index_numbers = list(range(df.shape[1]))*n_row

data = {'variable':df.columns.repeat(n_row),
        "value":df.values.ravel('F')}

df_expected = pd.DataFrame(data,columns = ["variable","value"], index=index_melted_uniq)
print(df_expected)

Letters	melt_id	variable	value
A	0	Numbers_1	0
B	0	Numbers_1	1
C	0	Numbers_1	2
A	1	Numbers_2	3
B	1	Numbers_2	4
C	1	Numbers_2	5

Where Letters and melt_id are two multiindex levels and variable and value are actual columns.
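The same expected frame can also be assembled from existing methods. The sketch below is only an illustration, not part of pandas: the helper name `melt_with_index` is hypothetical, and it assumes a single-level, named index and unique column labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"Numbers_1": range(0, 3), "Numbers_2": range(3, 6)},
    index=pd.Index(["A", "B", "C"], name="Letters"),
)


def melt_with_index(frame):
    """Hypothetical helper: melt while keeping the index plus a melt_id level."""
    index_name = frame.index.name
    # Move the index into a column so melt can carry it along as an id variable.
    melted = frame.reset_index().melt(id_vars=[index_name])
    # melt_id is the position of each value's source column in the original frame.
    melted["melt_id"] = frame.columns.get_indexer(melted["variable"])
    return melted.set_index([index_name, "melt_id"])


df_expected = melt_with_index(df)
print(df_expected)
```

Here `variable` and `value` stay ordinary columns, while `Letters` and `melt_id` form the two MultiIndex levels described above.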

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: d0f62c2816ada96a991f5a624a52c9a4f09617f7
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.21.0.dev+420.gd0f62c2
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.2.2.post20170724
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Top GitHub Comments

12 reactions

TomAugspurger commented, Sep 5, 2017

Thanks @NiklasKeck. I thought about proposing this a while back, but never wrote up an issue. melt is already quite complex as is, but this seems worthwhile to avoid an awkward .reset_index() / .set_index() dance.

I never worked out the correct way to handle the interaction between the existing index and the id_vars. Your melt_id is an option but I’ll need to think about it more. In an ideal world, I think that df.index + id_vars would always be unique, and we’d use that as the MI:

In [34]: df.reset_index().melt(id_vars=['Letters']).set_index(['Letters', 'variable'])
Out[34]:
                   value
Letters variable
A       Numbers_1      0
B       Numbers_1      1
C       Numbers_1      2
A       Numbers_2      3
B       Numbers_2      4
C       Numbers_2      5

but that may not be true in general.
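A small illustration of why that may fail, using a hypothetical variant of the example frame (`df_dup`, not from the original issue) whose index contains a repeated label:

```python
import pandas as pd

# Hypothetical variant of the example frame with a repeated index label "A".
df_dup = pd.DataFrame(
    {"Numbers_1": [0, 1, 2], "Numbers_2": [3, 4, 5]},
    index=pd.Index(["A", "A", "B"], name="Letters"),
)

melted = (
    df_dup.reset_index()
    .melt(id_vars=["Letters"])
    .set_index(["Letters", "variable"])
)

# ("A", "Numbers_1") and ("A", "Numbers_2") each occur twice,
# so the resulting MultiIndex is not unique.
has_unique_index = melted.index.is_unique
print(has_unique_index)
```

With duplicate labels in the original index, the (index, variable) pair no longer identifies a single row, which is the case a `melt_id`-style level would have to cover.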

Anyway, I think this would be a useful addition (as an optional keyword, to preserve backwards compatibility).
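For readers on newer releases: pandas 1.1 and later expose such a keyword as `ignore_index` on `DataFrame.melt`. The sketch below assumes one of those versions is installed; on older versions, including the 0.21 development build reported above, the keyword is not available.

```python
import pandas as pd

df = pd.DataFrame(
    {"Numbers_1": range(0, 3), "Numbers_2": range(3, 6), "Letters": ["A", "B", "C"]}
).set_index("Letters")

# ignore_index=False keeps the original index labels, repeated once per
# melted column. The default (ignore_index=True) gives the old behaviour.
# Assumes pandas >= 1.1; earlier versions do not accept this keyword.
df_melted = df.melt(ignore_index=False)
print(df_melted)
```

Note that this keeps the original labels as they are, so the resulting index contains duplicates. It does not add a second level such as the `melt_id` proposed in the issue.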

11 reactions

nickdelgrosso commented, Dec 12, 2018

I’m very interested in this!