Index gets lost when DataFrame melt method is used
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"Numbers_1": range(0, 3),
                   "Numbers_2": range(3, 6),
                   "Letters": ["A", "B", "C"]})
df.set_index("Letters", inplace=True)
print(df)
```
Letters | Numbers_1 | Numbers_2 |
---|---|---|
A | 0 | 3 |
B | 1 | 4 |
C | 2 | 5 |
```python
df_melted = df.melt()
print(df_melted)
```
. | variable | value |
---|---|---|
0 | Numbers_1 | 0 |
1 | Numbers_1 | 1 |
2 | Numbers_1 | 2 |
3 | Numbers_2 | 3 |
4 | Numbers_2 | 4 |
5 | Numbers_2 | 5 |
Problem description
When melting a DataFrame, I expected the original index to be reused. However, `melt` discards the original index and replaces it with a default integer index (0 to 5 above). This is probably what wesm’s comment (`# TODO: what about the existing index?`) refers to: https://github.com/pandas-dev/pandas/blob/133a2087d038da035a57ab90aad557a328b3d60b/pandas/core/reshape/reshape.py#L715
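Until `melt` handles the index itself, the usual workaround is the `.reset_index()` / `.set_index()` round trip mentioned in the comments below. A minimal sketch, assuming a single-level index named `Letters` as in the example: the index is moved into a regular column, passed to `melt` as an id variable, and then moved back.

```python
import pandas as pd

df = pd.DataFrame({"Numbers_1": range(0, 3),
                   "Numbers_2": range(3, 6),
                   "Letters": ["A", "B", "C"]}).set_index("Letters")

# Turn the index into a column, melt with it as an id variable,
# then restore it as the index of the melted frame.
df_melted = (df.reset_index()
               .melt(id_vars="Letters")
               .set_index("Letters"))
print(df_melted)
```

The resulting index repeats `A, B, C` once per melted column, so it is not unique on its own.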
Expected Output
I would expect something like:
```python
n_row, n_col = df.shape
# Repeat the original index labels once per melted column: A, B, C, A, B, C
index_melted = list(df.index) * n_col
# Number each block of rows by the column it came from: 0, 0, 0, 1, 1, 1
melt_id = list(np.arange(n_col).repeat(n_row))
index_melted_uniq = pd.MultiIndex.from_tuples(list(zip(index_melted, melt_id)),
                                              names=[df.index.names[0], "melt_id"])
# ravel("F") flattens column by column, matching the order of `variable`
data = {"variable": df.columns.repeat(n_row),
        "value": df.values.ravel("F")}
df_expected = pd.DataFrame(data, columns=["variable", "value"], index=index_melted_uniq)
print(df_expected)
```
Letters | melt_id | variable | value |
---|---|---|---|
A | 0 | Numbers_1 | 0 |
B | 0 | Numbers_1 | 1 |
C | 0 | Numbers_1 | 2 |
A | 1 | Numbers_2 | 3 |
B | 1 | Numbers_2 | 4 |
C | 1 | Numbers_2 | 5 |
Here `Letters` and `melt_id` are the two levels of the MultiIndex, while `variable` and `value` are ordinary columns.
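The same expected layout can also be built by stacking one small frame per column with `pd.concat`, using the column position as the `melt_id` key. This is a sketch, not the proposed pandas API: `melt_keep_index` is a hypothetical helper name, and it assumes a single-level index.

```python
import pandas as pd


def melt_keep_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: melt `df`, keeping its index plus a `melt_id` level."""
    # One piece per column: the original index, the column name, and its values.
    pieces = {
        melt_id: pd.DataFrame({"variable": col, "value": df[col]})
        for melt_id, col in enumerate(df.columns)
    }
    # The dict keys become an outer index level named "melt_id";
    # the inner level keeps the original index name.
    out = pd.concat(pieces, names=["melt_id"])
    # Put the original index first, as in the expected output above.
    return out.swaplevel(0, 1)


df = pd.DataFrame({"Numbers_1": range(0, 3),
                   "Numbers_2": range(3, 6),
                   "Letters": ["A", "B", "C"]}).set_index("Letters")
result = melt_keep_index(df)
print(result)
```

Using the column position as the second level keeps the MultiIndex unique even when the original index labels repeat across melted columns.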
Output of pd.show_versions()
INSTALLED VERSIONS
commit: d0f62c2816ada96a991f5a624a52c9a4f09617f7 python: 3.6.2.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: en LOCALE: None.None
pandas: 0.21.0.dev+420.gd0f62c2 pytest: 3.2.1 pip: 9.0.1 setuptools: 36.2.2.post20170724 Cython: 0.26 numpy: 1.13.1 scipy: None pyarrow: None xarray: None IPython: 6.1.0 sphinx: 1.6.3 patsy: None dateutil: 2.6.1 pytz: 2017.2 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 0.9999999 sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.9.6 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None
Issue created 6 years ago; 14 reactions, 5 comments (2 by maintainers).
Top GitHub Comments
Thanks @NiklasKeck. I thought about proposing this a while back, but never wrote up an issue.
`melt` is already quite complex as is, but this seems worthwhile to avoid an awkward `.reset_index()` / `.set_index()` dance. I never worked out the correct way to handle the interaction between the existing index and the `id_vars`. Your `melt_id` is an option, but I’ll need to think about it more. In an ideal world, I think that `df.index + id_vars` would always be unique, and we’d use that as the MultiIndex, but that may not be true in general.
Anyway, I think this would be a useful addition (as an optional keyword, to preserve backwards compatibility).
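For reference, later pandas releases (1.1 and newer) added an `ignore_index` keyword to `melt`; with `ignore_index=False` the original index labels are kept and repeated. The sketch below assumes pandas 1.1 or newer. It also illustrates the uniqueness concern above: the kept index alone is not unique, while appending `variable` gives a unique (`Letters`, `variable`) MultiIndex, provided the original index was unique.

```python
import pandas as pd

df = pd.DataFrame({"Numbers_1": range(0, 3),
                   "Numbers_2": range(3, 6),
                   "Letters": ["A", "B", "C"]}).set_index("Letters")

# Assumes pandas >= 1.1: keep the original index, repeated once per melted column.
melted = df.melt(ignore_index=False)
print(melted.index.is_unique)  # False: A, B, C each appear twice

# Appending the `variable` column as a second level disambiguates the rows.
melted_mi = melted.set_index("variable", append=True)
print(melted_mi.index.is_unique)  # True
```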
I’m very interested in this!