What to use instead of pd.read_msgpack/df.to_msgpack

FutureWarning: to_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.

Is there a link/pointer for how to do this?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 15 (6 by maintainers)

Top GitHub Comments

9 reactions
barnabaskdr commented, Dec 21, 2019

I would also like to keep the to_msgpack and read_msgpack functions. As others have pointed out, there is no real replacement if you consider reading and writing speed as well as file size.

9 reactions
dragoljub commented, Aug 21, 2019

@jreback @TomAugspurger I have been using to_msgpack/read_msgpack for quite some time to read large DataFrames (in the ~1-2 GB range) from filers. In my experience, if you are opening and reading DataFrames over network-shared disks/filers, read_msgpack is at least 2-3x faster than any other I/O method pandas currently has (HDF5, pickle, parquet, etc.).

I think this may be because msgpack deserializes into pandas columns as it reads over the network, thereby saving time compared to a copy-then-deserialize approach.

I would love to keep this capability in pandas. As of Pandas 0.25.0 I don’t see a direct Arrow/Flight replacement that takes a file path name and writes/reads data as efficiently to/from network storage.


Top Results From Across the Web

Pandas msgpack vs pickle - python - Stack Overflow
Pickle is better for the following: Numerical data or anything that uses the buffer protocol (numpy arrays) (though only if you use a ...

MessagePack: It's like JSON. but fast and small.
MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like...

pandas-msgpack's documentation! - Read the Docs
The pandas_msgpack module provides an interface from pandas https://pandas.pydata.org to the msgpack library. This is a lightweight portable binary format, ...

The Best Format to Save Pandas Data | by Ilia Zaitsev
Formats to Compare ; MessagePack — it's like JSON but fast and small ; HDF5 — a file format designed to store and organize...

mbf-pandas-msgpack - PyPI
In 2019, pandas deprecated the msgpack io interface, suggesting people use pyarrow instead. Unfortunately, pyarrow doesn't do columns-containing-tuples, and we ...
