What to use instead of pd.read_msgpack/df.to_msgpack
See original GitHub issue.

FutureWarning: to_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.

Is there a link/pointer for how to do this?
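The warning itself doesn't point to a concrete recipe, but below is a minimal sketch of what pyarrow-based replacements could look like. This assumes pyarrow is installed; the file names are purely illustrative, and the exact pa.ipc helper names can vary between pyarrow versions.

```python
import pandas as pd
import pyarrow as pa
import pyarrow.feather as feather

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Option 1: Feather (Arrow IPC file format) on disk -- roughly the
# file-based use case that to_msgpack/read_msgpack covered.
feather.write_feather(df, "data.feather")
df_back = feather.read_feather("data.feather")

# Option 2: Arrow IPC stream in memory -- the "on-the-wire" use case
# the FutureWarning refers to.
table = pa.Table.from_pandas(df)
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()  # pa.Buffer that can be sent over the wire
df_back2 = pa.ipc.open_stream(buf).read_all().to_pandas()

# Option 3: Parquet, if a compressed columnar file on disk is acceptable.
df.to_parquet("data.parquet")
df_back3 = pd.read_parquet("data.parquet")
```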
Issue Analytics
- Created: 4 years ago
- Reactions: 1
- Comments: 15 (6 by maintainers)
Top Results From Across the Web
- Pandas msgpack vs pickle - python - Stack Overflow: Pickle is better for numerical data or anything that uses the buffer protocol (numpy arrays), though only if you use a ...
- MessagePack: It's like JSON. but fast and small.: MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages ...
- pandas-msgpack's documentation! - Read the Docs: The pandas_msgpack module provides an interface from pandas (https://pandas.pydata.org) to the msgpack library. This is a lightweight, portable binary format ...
- The Best Format to Save Pandas Data | by Ilia Zaitsev: Formats compared include MessagePack (like JSON but fast and small) and HDF5 (a file format designed to store and organize ...)
- mbf-pandas-msgpack - PyPI: In 2019, pandas deprecated the msgpack io interface, suggesting people use pyarrow instead. Unfortunately, pyarrow doesn't do columns-containing-tuples, and we ...
I would also like to keep the to_msgpack and read_msgpack functions. As others have pointed out, there is no real replacement if you consider reading and writing speed as well as file size.
@jreback @TomAugspurger I have been using to_msgpack/read_msgpack for quite some time to read large DataFrames (in the ~1-2 GB range) from filers. In my experience, if you are opening and reading DataFrames over network-shared disks/filers, read_msgpack is at least 2-3x faster than any other I/O method pandas currently has (HDF5, pickle, parquet, etc.).
I think this may be because msgpack deserializes into pandas columns as it reads over the network, saving time compared to a copy-then-deserialize approach.
I would love to keep this capability in pandas. As of pandas 0.25.0 I don't see a direct Arrow/Flight replacement that takes a file path and reads/writes data as efficiently to/from network storage.
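For anyone who wants to check this claim on their own storage, here is a rough timing sketch, not a proper benchmark: the DataFrame shape and file names are made up, and numbers on a network filer will differ substantially from local disk.

```python
import time
import numpy as np
import pandas as pd

# Illustrative only: point the paths at the network share you actually
# care about before trusting any of the printed numbers.
df = pd.DataFrame(np.random.randn(1_000_000, 20),
                  columns=[f"c{i}" for i in range(20)])

def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

timed("pickle write",  lambda: df.to_pickle("bench.pkl"))
timed("parquet write", lambda: df.to_parquet("bench.parquet"))
timed("feather write", lambda: df.to_feather("bench.feather"))

timed("pickle read",  lambda: pd.read_pickle("bench.pkl"))
timed("parquet read", lambda: pd.read_parquet("bench.parquet"))
timed("feather read", lambda: pd.read_feather("bench.feather"))
```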