How should I store frames with multiindex columns in CSV?
See original GitHub issueHello.
I tried to save a dataframe with MultiIndex used as columns to a CSV file and load it back, but I had no luck.
# Create a fame with multiindex columns
frame = pd.DataFrame({('AAPL', 'OPEN'): [1, 2, 3, 4], ('AAPL', 'CLOSE'): [1, 2, 3, 4], ('MSFT', 'OPEN'): [1, 2, 3, 4], ('MSFT', 'CLOSE'): [1, 2, 3, 4]})
# Make sure it was created as wanted.
frame
# AAPL MSFT
# CLOSE OPEN CLOSE OPEN
# 0 1 1 1 1
# 1 2 2 2 2
# 2 3 3 3 3
# 3 4 4 4 4
# Try to convert the frame to CSV
s1 = frame.to_csv()
s2 = frame.to_csv(tupleize_cols=True)
# FutureWarning displayed - tupleize_cols is deprecated.
print(s1)
# ,AAPL,AAPL,MSFT,MSFT
# ,CLOSE,OPEN,CLOSE,OPEN
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4
print(s2)
# ,"('AAPL', 'CLOSE')","('AAPL', 'OPEN')","('MSFT', 'CLOSE')","('MSFT', 'OPEN')"
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4
# Read the CSV strings back to DataFrames
f1 = pd.read_csv(StringIO(s1))
f2 = pd.read_csv(StringIO(s2), tupleize_cols=True)
# Warning about tupleize_cols here
# Both frames does not look like the original one.
f1
# Unnamed: 0 AAPL AAPL.1 MSFT MSFT.1
# 0 NaN CLOSE OPEN CLOSE OPEN
# 1 0.0 1 1 1 1
# 2 1.0 2 2 2 2
# 3 2.0 3 3 3 3
# 4 3.0 4 4 4 4
f2
# Unnamed: 0 ('AAPL', 'CLOSE') ('AAPL', 'OPEN') ('MSFT', 'CLOSE') ('MSFT', 'OPEN')
# 0 0 1 1 1 1
# 1 1 2 2 2 2
# 2 2 3 3 3 3
# 3 3 4 4 4 4
As you see, both frames don’t have multiindexed columns as original one. So, how should I save a DataFrame with multiindexed columns to CSV file and load it back to get a frame same to the original one?
I also tried to save as JSON, but also encountered problems. Here is what the frame shown above is converted to.
frame.to_json()
'{"["AAPL","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["AAPL","OPEN"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","OPEN"]":{"0":1,"1":2,"2":3,"3":4}}'
So, tupleized multiindexed column names are obviously incorrectly quoted.
With best regards,
Alex.
INSTALLED VERSIONS
commit: None python: 3.4.2.final.0 python-bits: 32 OS: Linux OS-release: 3.16.0-6-686-pae machine: i686 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8
pandas: 0.24.0.dev0+318.g272bbdc pytest: 3.6.3 pip: 1.5.6 setuptools: 5.5.1 Cython: 0.28.4 numpy: 1.14.5 scipy: None pyarrow: None xarray: None IPython: 6.4.0 sphinx: None patsy: None dateutil: 2.7.3 pytz: 2018.5 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 0.999 sqlalchemy: None pymysql: None psycopg2: None jinja2: None s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None gcsfs: None
Issue Analytics
- State:
- Created 5 years ago
- Comments:8 (8 by maintainers)
Top GitHub Comments
read_csv
can handle hierarchical columns - but they must be specified on the way in with a list to theheader
arg.Can you open a separate bug for the JSON
orient="split"
issue? That does seem off.As a side note on
orient="table"
: