What to use instead of pd.read_msgpack/df.to_msgpack
See original GitHub issue.

FutureWarning: to_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.

Is there a link/pointer for how to do this?
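The warning itself doesn't point to a concrete recipe, but below is a minimal sketch of what pyarrow-based replacements could look like. This assumes pyarrow is installed; the file names are purely illustrative, and the exact pa.ipc helper names can vary between pyarrow versions.

```python
import pandas as pd
import pyarrow as pa
import pyarrow.feather as feather

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Option 1: Feather (Arrow IPC file format) on disk -- roughly the
# file-based use case that to_msgpack/read_msgpack covered.
feather.write_feather(df, "data.feather")
df_back = feather.read_feather("data.feather")

# Option 2: Arrow IPC stream in memory -- the "on-the-wire" use case
# the FutureWarning refers to.
table = pa.Table.from_pandas(df)
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()  # pa.Buffer that can be sent over the wire
df_back2 = pa.ipc.open_stream(buf).read_all().to_pandas()

# Option 3: Parquet, if a compressed columnar file on disk is acceptable.
df.to_parquet("data.parquet")
df_back3 = pd.read_parquet("data.parquet")
```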
Issue Analytics
- Created: 4 years ago
- Reactions: 1
- Comments: 15 (6 by maintainers)
Top Results From Across the Web
- Pandas msgpack vs pickle - python - Stack Overflow: Pickle is better for numerical data or anything that uses the buffer protocol (numpy arrays), though only if you use a ...
- MessagePack: It's like JSON. but fast and small.: MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages ...
- pandas-msgpack's documentation! - Read the Docs: The pandas_msgpack module provides an interface from pandas (https://pandas.pydata.org) to the msgpack library. This is a lightweight, portable binary format ...
- The Best Format to Save Pandas Data | by Ilia Zaitsev: Formats compared include MessagePack (like JSON but fast and small) and HDF5 (a file format designed to store and organize ...)
- mbf-pandas-msgpack - PyPI: In 2019, pandas deprecated the msgpack io interface, suggesting people use pyarrow instead. Unfortunately, pyarrow doesn't do columns-containing-tuples, and we ...
I would also like to keep the to_msgpack and read_msgpack functions. As others have pointed out, there is no real replacement if you consider reading and writing speed as well as file size.
@jreback @TomAugspurger I have been using to_msgpack/read_msgpack for quite some time to read large DataFrames (in the ~1-2 GB range) from filers. In my experience, if you are opening and reading DataFrames over network-shared disks/filers, read_msgpack is at least 2-3x faster than any other I/O method pandas currently has (HDF5, pickle, parquet, etc.).
I think this may be because msgpack deserializes into pandas columns as it reads over the network, saving time compared to a copy-then-deserialize approach.
I would love to keep this capability in pandas. As of pandas 0.25.0 I don't see a direct Arrow/Flight replacement that takes a file path and reads/writes data as efficiently to/from network storage.
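For anyone who wants to check this claim on their own storage, here is a rough timing sketch, not a proper benchmark: the DataFrame shape and file names are made up, and numbers on a network filer will differ substantially from local disk.

```python
import time
import numpy as np
import pandas as pd

# Illustrative only: point the paths at the network share you actually
# care about before trusting any of the printed numbers.
df = pd.DataFrame(np.random.randn(1_000_000, 20),
                  columns=[f"c{i}" for i in range(20)])

def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

timed("pickle write",  lambda: df.to_pickle("bench.pkl"))
timed("parquet write", lambda: df.to_parquet("bench.parquet"))
timed("feather write", lambda: df.to_feather("bench.feather"))

timed("pickle read",  lambda: pd.read_pickle("bench.pkl"))
timed("parquet read", lambda: pd.read_parquet("bench.parquet"))
timed("feather read", lambda: pd.read_feather("bench.feather"))
```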