Big endian not enforced in float arrays in de/serialisation

Describe the bug

Serialisation

https://github.com/openforcefield/openff-toolkit/blob/ce84d821cd2097dc249abac43de2e61249bcf5d5/openff/toolkit/utils/utils.py#L664-L666

newbyteorder does not change the order of the bytes in memory, only the order in which they are interpreted, e.g.

>>> arr = np.arange(3).astype(float)
>>> arr
array([0., 1., 2.])
>>> arr.newbyteorder(">")
array([0.00000e+000, 3.03865e-319, 3.16202e-322])

So this function writes exactly the same bytes in the same order that were in the array before.
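
The difference between reinterpreting bytes and converting them can be sketched as follows. This is a minimal illustration, not the toolkit's code: it assumes only NumPy, and the variable names are made up for the example. It uses arr.view(...) with a byte-swapped dtype, which is the documented equivalent of ndarray.newbyteorder (the array method itself was removed in NumPy 2.0, as far as I know).

```python
import numpy as np

arr = np.arange(3).astype(float)        # native byte order
big = arr.dtype.newbyteorder(">")       # dtype describing big-endian float64

# Reinterpretation: the buffer is untouched, only the dtype label changes.
reinterpreted = arr.view(big)

# Conversion: the values are preserved and the bytes are rewritten as big-endian.
converted = arr.astype(big)

same_bytes = reinterpreted.tobytes() == arr.tobytes()
same_values = bool(np.array_equal(converted, arr))
```

On a little-endian machine, reinterpreted prints as the garbage values shown above, while converted still prints as [0., 1., 2.] and is the one whose tobytes() output is actually big-endian.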

Deserialisation

https://github.com/openforcefield/openff-toolkit/blob/ce84d821cd2097dc249abac43de2e61249bcf5d5/openff/toolkit/utils/utils.py#L692-L694

newbyteorder is not side-effecting (it returns a new array rather than modifying the one it is called on), so this doesn’t do anything. See https://github.com/openforcefield/openff-interchange/issues/345 for more.
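
A small sketch of what "not side-effecting" means in practice, assuming only NumPy. The names are hypothetical, and arr.view(...) with a swapped dtype stands in for the array method so that the snippet also runs on NumPy versions where ndarray.newbyteorder no longer exists.

```python
import numpy as np

arr = np.arange(3).astype(float)
original_dtype = arr.dtype

# Building a byte-swapped view and discarding it leaves `arr` untouched.
arr.view(original_dtype.newbyteorder("S"))   # "S" means: swap relative to the current order
unchanged = arr.dtype == original_dtype

# The result only matters once it is bound to a name and used.
swapped = arr.view(original_dtype.newbyteorder("S"))
```

Here unchanged is True, and only swapped carries the other byte order.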

Additional context

The missing endian enforcement has not shown up in tests because both functions have no effect, so both fall back to the native endianness of the system. This likely hasn’t been a problem for users transferring files because by and large people work on x86, which is little-endian. I think one could test the functions individually by creating arrays with opposite endianness and checking that they are serialized and deserialized properly, e.g. on my Mac:

>>> import numpy as np
>>> from openff.toolkit.utils.utils import serialize_numpy, deserialize_numpy
>>> import sys
>>> sys.byteorder
'little'
>>> dt_bigendian = np.dtype(float).newbyteorder(">")
>>> arr = np.arange(3).astype(dt_bigendian)
>>> arr
array([0., 1., 2.])
>>> np.frombuffer(arr.tobytes())
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
>>> actually_serialize_numpy = lambda x: (x.astype(dt_bigendian).tobytes(), x.shape)
>>> actually_deserialize_numpy = lambda x, y: np.reshape(np.frombuffer(x, dtype=dt_bigendian), y)
>>> np.frombuffer(actually_serialize_numpy(arr)[0])
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
>>> little_arr = np.arange(3).astype(float)
>>> np.frombuffer(serialize_numpy(little_arr)[0], dtype=dt_bigendian)
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
>>> actually_deserialize_numpy(*actually_serialize_numpy(arr))
array([0., 1., 2.])
>>> actually_deserialize_numpy(np.arange(3).astype(float).tobytes(), (3,))
array([0.00000e+000, 3.03865e-319, 3.16202e-322])

So tests could look like:

import numpy as np
from numpy.testing import assert_allclose
from openff.toolkit.utils.utils import serialize_numpy, deserialize_numpy

def test_serialize_numpy():
    original = np.arange(3).astype(float)
    dt_little = np.dtype(float).newbyteorder("<")
    dt_big = np.dtype(float).newbyteorder(">")
    # Start from an explicitly little-endian array, whatever the host is
    arr = original.astype(dt_little)
    assert_allclose(arr, original)
    # The serialized bytes must decode correctly when read as big-endian
    deserialized = np.frombuffer(serialize_numpy(arr)[0], dtype=dt_big)
    assert_allclose(deserialized, original)

def test_deserialize_numpy():
    original = np.arange(3).astype(float)
    dt_big = np.dtype(float).newbyteorder(">")
    # Hand the function bytes that are known to be big-endian
    arr = original.astype(dt_big)
    deserialized = deserialize_numpy(arr.tobytes(), arr.shape)
    assert_allclose(deserialized, original)

def test_serialization_roundtrip():
    original = np.arange(3).astype(float)
    deserialized = deserialize_numpy(*serialize_numpy(original))
    assert_allclose(deserialized, original)

These should fail appropriately on a little-endian system. On a big-endian system, code that uses native endianness instead of enforcing big-endian might pass, so the bug would go unnoticed. I think test_serialize_numpy should guard against that.
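
One way to make such a guard independent of the host is to compare the serialized bytes against a reference built with the standard library's struct module, which packs explicit big-endian doubles on any machine. The sketch below is an assumption-laden illustration: serialize_big_endian is a hypothetical stand-in for a fixed serialize_numpy, not the toolkit's implementation.

```python
import struct

import numpy as np


def serialize_big_endian(arr):
    """Hypothetical stand-in for a fixed serialize_numpy: big-endian bytes plus shape."""
    arr = np.asarray(arr)
    big = np.dtype(float).newbyteorder(">")
    return arr.astype(big).tobytes(), arr.shape


def test_serialize_is_big_endian_on_any_host():
    values = [0.0, 1.0, 2.0]
    # Reference bytes: big-endian IEEE 754 doubles, independent of the host.
    expected = struct.pack(">3d", *values)
    # Feed in both byte orders so that one of them is non-native on every machine.
    for order in ("<", ">"):
        arr = np.array(values, dtype=np.dtype(float).newbyteorder(order))
        serialized, shape = serialize_big_endian(arr)
        assert serialized == expected
        assert shape == (3,)
```

Because the expected bytes never pass through NumPy's native dtype, an implementation that silently keeps native order fails this check on a little-endian machine, and feeding an explicitly little-endian input keeps the check meaningful on a big-endian one.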

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

3 reactions
lilyminium commented, Dec 10, 2021

No worries! I am in NYC at the moment, though, so any feedback might be delayed.

2 reactions
j-wags commented, Dec 10, 2021

Thanks for catching this, @lilyminium! This highlights the danger of my “just frankenstein code from stackoverflow” method for code development. I’ve read a bit into it and I think I get the gist, but since you have this more fully understood is it OK if @mattwthompson tags you for review once he has the PR in an acceptable state? I don’t have time to go as deep right now so I’d most likely just rubber-stamp the PR if I were primary reviewer.
