How should I store frames with multiindex columns in CSV?

Hello.

I tried to save a dataframe with MultiIndex used as columns to a CSV file and load it back, but I had no luck.

from io import StringIO

import pandas as pd

# Create a frame with MultiIndex columns
frame = pd.DataFrame({('AAPL', 'OPEN'): [1, 2, 3, 4], ('AAPL', 'CLOSE'): [1, 2, 3, 4], ('MSFT', 'OPEN'): [1, 2, 3, 4], ('MSFT', 'CLOSE'): [1, 2, 3, 4]})
# Make sure it was created as wanted.
frame
#   AAPL       MSFT     
#   CLOSE OPEN CLOSE OPEN
# 0     1    1     1    1
# 1     2    2     2    2
# 2     3    3     3    3
# 3     4    4     4    4

# Try to convert the frame to CSV
s1 = frame.to_csv()
s2 = frame.to_csv(tupleize_cols=True)
# FutureWarning displayed - tupleize_cols is deprecated.

print(s1)
# ,AAPL,AAPL,MSFT,MSFT
# ,CLOSE,OPEN,CLOSE,OPEN
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4

print(s2)
# ,"('AAPL', 'CLOSE')","('AAPL', 'OPEN')","('MSFT', 'CLOSE')","('MSFT', 'OPEN')"
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4

# Read the CSV strings back to DataFrames
f1 = pd.read_csv(StringIO(s1))
f2 = pd.read_csv(StringIO(s2), tupleize_cols=True)
# Warning about tupleize_cols here

# Neither frame looks like the original one.
f1
#    Unnamed: 0   AAPL AAPL.1   MSFT MSFT.1
# 0         NaN  CLOSE   OPEN  CLOSE   OPEN
# 1         0.0      1      1      1      1
# 2         1.0      2      2      2      2
# 3         2.0      3      3      3      3
# 4         3.0      4      4      4      4

f2
#    Unnamed: 0  ('AAPL', 'CLOSE')  ('AAPL', 'OPEN')  ('MSFT', 'CLOSE')  ('MSFT', 'OPEN')
# 0           0                  1                 1                  1                 1
# 1           1                  2                 2                  2                 2
# 2           2                  3                 3                  3                 3
# 3           3                  4                 4                  4                 4

As you can see, neither frame has MultiIndex columns like the original one. So, how should I save a DataFrame with MultiIndex columns to a CSV file and load it back to get a frame identical to the original?

I also tried saving it as JSON, but ran into problems there too. Here is what the frame shown above is converted to:

frame.to_json()
'{"["AAPL","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["AAPL","OPEN"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","OPEN"]":{"0":1,"1":2,"2":3,"3":4}}'

So the tupleized MultiIndex column names are clearly quoted incorrectly.
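One possible workaround for the JSON case, offered as a minimal sketch and not as something proposed in the thread: serialize the column tuples, the index, and the values yourself with the standard json module, then rebuild the columns with pd.MultiIndex.from_tuples on load. The helper names frame_to_json and frame_from_json are hypothetical and are not part of pandas.

```python
import json

import pandas as pd


def frame_to_json(frame):
    """Hypothetical helper: dump columns, index and values as plain JSON."""
    payload = {
        "columns": [list(col) for col in frame.columns],  # tuples -> lists
        "index": frame.index.tolist(),
        "data": frame.values.tolist(),  # row-major list of lists
    }
    return json.dumps(payload)


def frame_from_json(text):
    """Hypothetical helper: rebuild the frame, restoring MultiIndex columns."""
    payload = json.loads(text)
    columns = pd.MultiIndex.from_tuples([tuple(col) for col in payload["columns"]])
    return pd.DataFrame(payload["data"], index=payload["index"], columns=columns)


frame = pd.DataFrame({
    ('AAPL', 'OPEN'): [1, 2, 3, 4],
    ('AAPL', 'CLOSE'): [1, 2, 3, 4],
    ('MSFT', 'OPEN'): [1, 2, 3, 4],
    ('MSFT', 'CLOSE'): [1, 2, 3, 4],
})

restored = frame_from_json(frame_to_json(frame))
print(restored.columns.tolist())
```

This sketch assumes every value is JSON-serializable and does not preserve per-column dtypes for mixed-type frames, so treat it as a starting point only.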

With best regards,

Alex.

INSTALLED VERSIONS

commit: None
python: 3.4.2.final.0
python-bits: 32
OS: Linux
OS-release: 3.16.0-6-686-pae
machine: i686
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+318.g272bbdc
pytest: 3.6.3
pip: 1.5.6
setuptools: 5.5.1
Cython: 0.28.4
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

8 reactions
chris-b1 commented, Jul 19, 2018

read_csv can handle hierarchical columns, but you must specify them when reading by passing a list to the header argument.

from io import StringIO
buf = StringIO()
frame.to_csv(buf)
buf.seek(0)

In [109]: pd.read_csv(buf, header=[0,1], index_col=0)
Out[109]: 
  AAPL       MSFT     
 CLOSE OPEN CLOSE OPEN
0     1    1     1    1
1     2    2     2    2
2     3    3     3    3
3     4    4     4    4
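
For reference, the answer above can be put together into a single self-contained round trip. This is a minimal sketch that reuses the frame from the question; the variable names csv_text and restored are chosen here for illustration.

```python
from io import StringIO

import pandas as pd

frame = pd.DataFrame({
    ('AAPL', 'OPEN'): [1, 2, 3, 4],
    ('AAPL', 'CLOSE'): [1, 2, 3, 4],
    ('MSFT', 'OPEN'): [1, 2, 3, 4],
    ('MSFT', 'CLOSE'): [1, 2, 3, 4],
})

# to_csv writes one header row per column level (two rows here).
csv_text = frame.to_csv()

# header=[0, 1]: both header rows together form the column MultiIndex.
# index_col=0: the first CSV column holds the row index.
restored = pd.read_csv(StringIO(csv_text), header=[0, 1], index_col=0)

# Check that the column structure and the values survived the round trip.
print(isinstance(restored.columns, pd.MultiIndex))
print(restored.columns.tolist() == frame.columns.tolist())
print(restored.values.tolist() == frame.values.tolist())
```

The list passed to header needs one entry per column level, so a frame with three column levels would need header=[0, 1, 2].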

0 reactions
WillAyd commented, Jul 20, 2018

Can you open a separate bug for the JSON orient="split" issue? That does seem off.

As a side note on orient="table":

  • Timestamp support is being added in #21827
  • Integers are not valid column labels but should be fine as an index; again, if you have an example you can provide, please open it as a bug in a separate issue

