question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Index gets lost when DataFrame melt method is used

See original GitHub issue

Index gets lost when DataFrame melt method is used

import pandas as pd
import numpy as np
df = pd.DataFrame({"Numbers_1":range(0,3),
                   "Numbers_2":range(3,6),
                   "Letters":["A","B","C"]})
df.set_index("Letters",inplace=True)
print(df)
Letters Numbers_1 Numbers_2
A 0 3
B 1 4
C 2 5

df_melted = df.melt()
print(df_melted)
. variable value
0 Numbers_1 0
1 Numbers_1 1
2 Numbers_1 2
3 Numbers_2 3
4 Numbers_2 4
5 Numbers_2 5

Problem description

When melting a dataframe, I expected the original index to be reused. However, the original index is lost in the melt method. This is probably meant by wesm’s comment (# TODO: what about the existing index?) https://github.com/pandas-dev/pandas/blob/133a2087d038da035a57ab90aad557a328b3d60b/pandas/core/reshape/reshape.py#L715

Expected Output

I would expect something like

n_row,n_col = df.shape
index_melted = list(df.index.get_values())*n_col
melt_id = list(np.arange(n_col).repeat(n_row))
temp = list(zip(*[index_melted,melt_id]))

index_melted_uniq = pd.MultiIndex.from_tuples(temp,names=[df.index.names[0], 'melt_id'])
index_numbers = list(range(df.shape[1]))*n_row

data = {'variable':df.columns.repeat(n_row),
        "value":df.values.ravel('F')}

df_expected = pd.DataFrame(data,columns = ["variable","value"], index=index_melted_uniq)
print(df_expected)
Letters melt_id variable value
A 0 Numbers_1 0
B 0 Numbers_1 1
C 0 Numbers_1 2
A 1 Numbers_2 3
B 1 Numbers_2 4
C 1 Numbers_2 5

Where Letters and melt_id are two multiindex levels and variable and value are actual columns.

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line] INSTALLED VERSIONS

commit: d0f62c2816ada96a991f5a624a52c9a4f09617f7 python: 3.6.2.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: en LOCALE: None.None

pandas: 0.21.0.dev+420.gd0f62c2 pytest: 3.2.1 pip: 9.0.1 setuptools: 36.2.2.post20170724 Cython: 0.26 numpy: 1.13.1 scipy: None pyarrow: None xarray: None IPython: 6.1.0 sphinx: 1.6.3 patsy: None dateutil: 2.6.1 pytz: 2017.2 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 0.9999999 sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.9.6 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None

Issue Analytics

  • State:closed
  • Created 6 years ago
  • Reactions:14
  • Comments:5 (2 by maintainers)

github_iconTop GitHub Comments

12reactions
TomAugspurgercommented, Sep 5, 2017

Thanks @NiklasKeck. I thought about proposing this a while back, but never wrote up an issue. melt is already quite complex as is, but this seems worthwhile to avoid an awkward .reset_index() / .set_index() dance.

I never worked out the correct way to handle the interaction between the existing index and the id_vars. Your melt_id is an option but I’ll need to think about it more. In an ideal world, I think that df.index + id_vars would always be unique, and we’d use that as the MI:

In [34]: df.reset_index().melt(id_vars=['Letters']).set_index(['Letters', 'variable'])
Out[34]:
                   value
Letters variable
A       Numbers_1      0
B       Numbers_1      1
C       Numbers_1      2
A       Numbers_2      3
B       Numbers_2      4
C       Numbers_2      5

but that may not be true in general.

Anyway, I think this would be a useful addition (as an option keyword, to preserve backwards compatibility)

11reactions
nickdelgrossocommented, Dec 12, 2018

I’m very interested in this!

Read more comments on GitHub >

github_iconTop Results From Across the Web

pandas, melt, unmelt preserve index - Stack Overflow
But now, when I try to unmelt it, I get a 6-row array with lots of blanks! df2.pivot(columns = 'variable', values = 'value')...
Read more >
Pandas Melt – pd.melt() - Data Independent
Pandas Melt is a function you'll use when deciding the architecture of your of your data sets. This will ultimately lead to how...
Read more >
pandas.melt — pandas 1.5.2 documentation
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables ( id_vars ), while...
Read more >
Python | Pandas.melt() - GeeksforGeeks
melt () function is useful to message a DataFrame into a format where one or more columns are identifier variables, while all other...
Read more >
Pandas Melt, Stack and wide_to_long For Reshaping ...
The Pandas .melt() is usually the to-go-to function for transforming a wide dataframe into a long one because it's flexible and straightforward.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found