
Serialization with `"msgpack"` doesn't preserve `list`s


Currently, if a list is serialized (and is small enough), it is handled correctly and comes back as a list.

In [1]: from distributed.protocol import serialize, deserialize                 

In [2]: t = (0, 1, 2)                                                           

In [3]: deserialize(*serialize(t, serializers=["msgpack"]))                     
Out[3]: (0, 1, 2)

In [4]: l = [0, 1, 2]                                                           

In [5]: deserialize(*serialize(l, serializers=["msgpack"]))                     
Out[5]: [0, 1, 2]

However, for larger lists this breaks down, and a tuple is returned instead.

In [1]: from distributed.protocol import serialize, deserialize                 

In [2]: t = (0, 1, 2, 3, 4, 5, 6)                                               

In [3]: deserialize(*serialize(t, serializers=["msgpack"]))                     
Out[3]: (0, 1, 2, 3, 4, 5, 6)

In [4]: l = [0, 1, 2, 3, 4, 5, 6]                                               

In [5]: deserialize(*serialize(l, serializers=["msgpack"]))                     
Out[5]: (0, 1, 2, 3, 4, 5, 6)
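The change of type is easier to understand once you see that msgpack itself does not record it. This is a minimal check with plain msgpack (it assumes the `msgpack` package is installed and does not involve distributed at all). A tuple and a list with the same items pack to identical bytes, so the unpacking side cannot tell them apart:

```python
import msgpack

t = (0, 1, 2, 3, 4, 5, 6)
l = [0, 1, 2, 3, 4, 5, 6]

# With default options, tuples and lists are both packed as a msgpack array,
# so the encoded bytes carry no record of which container type went in.
packed_t = msgpack.packb(t)
packed_l = msgpack.packb(l)

print(packed_t == packed_l)  # True
print(packed_t.hex())        # 9700010203040506 (fixarray of 7, then 0..6)
```

Whichever container comes out on the other end is therefore decided by an unpacker option, not by the data.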

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

2 reactions
jakirkham commented, Apr 17, 2020

It's worth noting that msgpack itself, by default, always returns a list, whether the input was a tuple or a list.

In [1]: import msgpack                                                          

In [2]: t = (0, 1, 2)                                                           

In [3]: msgpack.loads(msgpack.dumps(t))                                         
Out[3]: [0, 1, 2]

In [4]: l = [0, 1, 2]                                                           

In [5]: msgpack.loads(msgpack.dumps(l))                                         
Out[5]: [0, 1, 2]

That said, we actually force it to return a tuple (by passing `use_list=False`), which presents its own set of issues.

In [6]: msgpack.loads(msgpack.dumps(l), use_list=False)                         
Out[6]: (0, 1, 2)
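Since msgpack packs tuples and lists as the same array type, `use_list` only decides which container every array comes back as. A minimal sketch with plain msgpack (`roundtrip` is a hypothetical helper, not part of msgpack or distributed) shows that either setting loses one of the two types, and that `use_list=False` also applies to nested arrays:

```python
import msgpack


def roundtrip(obj, use_list):
    # Hypothetical helper: pack with defaults, unpack with the given use_list.
    return msgpack.unpackb(msgpack.packb(obj), use_list=use_list)


t = (0, 1, 2)
l = [0, 1, 2]

for use_list in (True, False):
    results = {type(x).__name__: type(roundtrip(x, use_list)).__name__ for x in (t, l)}
    print("use_list=%s" % use_list, results)
# use_list=True {'tuple': 'list', 'list': 'list'}
# use_list=False {'tuple': 'tuple', 'list': 'tuple'}

# use_list applies to every array in the message, including nested ones.
nested = [0, [1, 2], [[3]]]
print(roundtrip(nested, use_list=False))  # (0, (1, 2), ((3,),))
```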

This choice goes back to PR https://github.com/dask/distributed/pull/2000, though it seems we knew at the time that it could cause problems (https://github.com/dask/distributed/pull/2000#issuecomment-396327604). I'm not sure whether the reasoning behind that change still holds today, or, in other words, what our constraints are now.

Edit: Also related is this upstream discussion: https://github.com/msgpack/msgpack-python/issues/98

1 reaction
jakirkham commented, May 20, 2020

Yeah, that makes sense. I agree the current behavior is error-prone, and it would be good to move away from it.

Maybe we can come up with something using msgpack's `ExtType`?

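import msgpack


def default(obj):
    # With strict_types=True, msgpack no longer packs tuples as arrays and
    # hands them to this hook instead. Wrap the items in an ExtType with an
    # application-chosen code (10), packing them with the same options so
    # that nested tuples are tagged as well.
    if isinstance(obj, tuple):
        payload = msgpack.packb(list(obj), default=default, strict_types=True, use_bin_type=True)
        return msgpack.ExtType(10, payload)
    else:
        raise TypeError("Unknown type: %s" % repr(type(obj)))

def ext_hook(code, data):
    # Rebuild tuples from ExtType code 10 (recursively, via the same hook);
    # pass any other extension types through unchanged.
    if code == 10:
        return tuple(msgpack.unpackb(data, ext_hook=ext_hook))
    else:
        return msgpack.ExtType(code, data)


# A tuple comes back as a tuple...
data = (2, 3)
packed = msgpack.packb(data, default=default, strict_types=True, use_bin_type=True)
unpacked = msgpack.unpackb(packed, ext_hook=ext_hook)
print((data, unpacked))  # ((2, 3), (2, 3))


# ...and a list comes back as a list.
data = [2, 3]
packed = msgpack.packb(data, default=default, strict_types=True, use_bin_type=True)
unpacked = msgpack.unpackb(packed, ext_hook=ext_hook)
print((data, unpacked))  # ([2, 3], [2, 3])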
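The same `ExtType` idea can be wrapped into a pair of helpers and checked against nested, mixed containers. This is a minimal sketch, not distributed's implementation: the helper names `dumps`/`loads` and the constant `TUPLE_CODE` are hypothetical, and the value 10 is simply the code used in the snippet above (msgpack-python only accepts application ext codes from 0 to 127). The inner pack and unpack go through the same helpers, so tuples nested inside tuples are tagged too.

```python
import msgpack

TUPLE_CODE = 10  # hypothetical application-chosen ExtType code (must be 0..127)


def default(obj):
    # Called for objects msgpack won't pack natively; with strict_types=True
    # that includes tuples. Pack the items via dumps() so nested tuples are
    # tagged the same way.
    if isinstance(obj, tuple):
        return msgpack.ExtType(TUPLE_CODE, dumps(list(obj)))
    raise TypeError("Unknown type: %r" % (type(obj),))


def ext_hook(code, data):
    # Turn our tuple ExtType back into a tuple; leave other codes untouched.
    if code == TUPLE_CODE:
        return tuple(loads(data))
    return msgpack.ExtType(code, data)


def dumps(obj):
    # Hypothetical helper: type-preserving pack.
    return msgpack.packb(obj, default=default, strict_types=True, use_bin_type=True)


def loads(data):
    # Hypothetical helper: arrays become lists (use_list defaults to True),
    # tagged payloads become tuples.
    return msgpack.unpackb(data, ext_hook=ext_hook)


sample = [(1, 2), [3, (4, [5])], ()]
# list == tuple is False in Python, so equality also checks container types.
print(loads(dumps(sample)) == sample)  # True
```

The trade-off is that every tuple becomes an extension payload wrapping a separately packed array, which costs a few extra bytes per tuple, and msgpack readers in other languages will see opaque extension data unless they register a matching handler.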
