No roundtrip DataFrame.to/from_csv() with multiindex columns
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
# create a dataframe with multiindex columns
arrays = [['A','A','B','B'],['a','b','a','b']]
tuples = list(zip(*arrays))
columnIndex = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
A = pd.DataFrame(data=np.random.randn(4,4),columns=columnIndex)
# save it to csv
A.to_csv('test.csv')
print(A.columns)
# try to do a round trip...
B = pd.DataFrame.from_csv('test.csv')
print(B.columns)
Output:
A.columns = MultiIndex(levels=[['A', 'B'], ['a', 'b']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['first', 'second'])
B.columns = Index(['A', 'A.1', 'B', 'B.1'], dtype='object')
Problem description
I would expect .to_csv() and .from_csv() to result in a round trip; however, the column MultiIndex is not read back correctly by .from_csv().
It is possible to use read_csv(), but it requires extra parameters.
I think it would be sufficient to add a parameter to .from_csv() similar to index_col=sequence in read_csv().
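For reference, here is a minimal sketch of the read_csv() route mentioned above. It assumes the column header spans the first two rows of the file (header=[0, 1]) and that the first column holds the row index (index_col=0). The in-memory buffer and the deterministic data are illustrative choices only and are not part of the original report.

```python
import io

import numpy as np
import pandas as pd

# Same column layout as in the report, with deterministic data.
columns = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("B", "a"), ("B", "b")],
    names=["first", "second"],
)
A = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4), columns=columns)

# Write to an in-memory buffer instead of 'test.csv' (illustrative choice).
buffer = io.StringIO()
A.to_csv(buffer)
buffer.seek(0)

# Tell read_csv that the header occupies two rows and that the
# first column is the row index.
B = pd.read_csv(buffer, header=[0, 1], index_col=0)

print(B.columns)
print(B.columns.equals(A.columns))
```

The extra parameters are needed because the CSV file itself does not record how many header rows it contains.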
Expected Output
A.columns == B.columns
Output of pd.show_versions()
pandas: 0.20.1 pytest: 3.0.7 pip: 9.0.1 setuptools: 27.2.0 Cython: 0.25.2 numpy: 1.12.1 scipy: 0.19.0 xarray: None IPython: 5.3.0 sphinx: 1.5.6 patsy: 0.4.1 dateutil: 2.6.0 pytz: 2017.2 blosc: None bottleneck: 1.2.1 tables: 3.3.0 numexpr: 2.6.2 feather: None matplotlib: 2.0.2 openpyxl: 2.4.7 xlrd: 1.0.0 xlwt: 1.2.0 xlsxwriter: 0.9.6 lxml: 3.7.3 bs4: 4.6.0 html5lib: 0.999 sqlalchemy: 1.1.9 pymysql: None psycopg2: None jinja2: 2.9.6 s3fs: None pandas_gbq: None pandas_datareader: None
Issue Analytics
- State:
- Created 6 years ago
- Comments:15 (14 by maintainers)
Top GitHub Comments
To make things more interesting, what about an additional column with only a one-level header? What are the right parameters for to_csv() and from_csv()/read_csv() to recover the original DataFrame A without unnecessary 'Unnamed:XXX' labels?
Now save and reread A:
Any idea how to get rid of the 'Unnamed' labels?
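One possible workaround, sketched here under assumptions and not taken from the thread: read the file with a two-row header, then replace any placeholder label that starts with "Unnamed:" by an empty string. The frame below is a hypothetical stand-in for the commenter's data, with a column ('C', '') whose second level is empty, and blank_unnamed is a made-up helper name.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame: column 'C' has no second-level label.
columns = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("C", "")],
    names=["first", "second"],
)
df = pd.DataFrame(np.arange(6, dtype=float).reshape(2, 3), columns=columns)

buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# The empty header cell comes back as an 'Unnamed: ...' placeholder.
restored = pd.read_csv(buffer, header=[0, 1], index_col=0)


def blank_unnamed(label):
    """Map read_csv placeholder labels back to an empty string."""
    return "" if str(label).startswith("Unnamed:") else label


# Rebuild the column MultiIndex with the placeholders blanked out.
restored.columns = pd.MultiIndex.from_tuples(
    [tuple(blank_unnamed(label) for label in col) for col in restored.columns],
    names=restored.columns.names,
)

print(restored.columns.tolist())
```

Note that this would also blank out a genuine column label that happens to start with "Unnamed:", so it is only suitable when no such labels exist in the data.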
Ah, true. I think we both overlooked this. Generally, we encourage people to use read_csv, as per the docs. By all means, feel free to update the documentation as a PR, though we should consider just deprecating the function (@jreback thoughts?) in the future.