Big endian not enforced in float arrays in de/serialisation
Describe the bug
newbyteorder does not change the order of the bytes, but only the order in which they are interpreted, e.g.
>>> arr = np.arange(3).astype(float)
>>> arr
array([0., 1., 2.])
>>> arr.newbyteorder(">")
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
So this function writes exactly the same bytes in the same order that were in the array before.
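For contrast, here is a minimal sketch (not the toolkit's code) of a conversion that actually swaps the stored bytes, using astype with an explicitly big-endian dtype:

```python
import numpy as np

arr = np.arange(3).astype(float)
big = np.dtype(float).newbyteorder(">")

# astype() converts the data: the stored bytes are swapped so that the
# values survive a big-endian interpretation.
converted = arr.astype(big)
print(converted)  # [0. 1. 2.] -- values preserved

# Reading the converted buffer back with the big-endian dtype recovers
# the original values on any host:
print(np.frombuffer(converted.tobytes(), dtype=big))  # [0. 1. 2.]
```

Unlike newbyteorder, astype here changes both the bytes and the dtype's interpretation together, so the values are unchanged while the on-disk representation becomes platform-independent.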
Deserialisation
newbyteorder is not side-effecting, so calling it and discarding the return value does nothing. See https://github.com/openforcefield/openff-interchange/issues/345 for more.
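The non-mutating behaviour is easy to demonstrate at the dtype level (a small sketch; newbyteorder returns a new object rather than modifying its receiver):

```python
import numpy as np

dt = np.dtype(float)
dt.newbyteorder(">")     # returns a NEW dtype; the result is discarded here
print(dt.byteorder)      # '=' -- dt itself is unchanged (still native)

# The return value has to be captured and used explicitly:
dt_big = dt.newbyteorder(">")
print(dt_big.byteorder)  # '>'
```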
To Reproduce
Output
Computing environment (please complete the following information):
- Operating system
- Output of running conda list
Additional context
The reason the missing endian enforcement has not shown up in tests is that both functions have no effect, so they default to the native endianness of the system. This likely hasn’t been a problem for users transferring files because, by and large, people work on x86, which is little-endian. I think one could test the functions individually by creating arrays with opposite endianness and checking that they are serialized and deserialized properly, e.g. on my Mac:
>>> import numpy as np
>>> from openff.toolkit.utils.utils import serialize_numpy, deserialize_numpy
>>> import sys
>>> sys.byteorder
'little'
>>> dt_bigendian = np.dtype(float).newbyteorder(">")
>>> arr = np.arange(3).astype(dt_bigendian)
>>> arr
array([0., 1., 2.])
>>> np.frombuffer(arr.tobytes())
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
>>> actually_serialize_numpy = lambda x: (x.astype(dt_bigendian).tobytes(), x.shape)
>>> actually_deserialize_numpy = lambda x, y: np.reshape(np.frombuffer(x, dtype=dt_bigendian), y)
>>> np.frombuffer(actually_serialize_numpy(arr)[0])
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
>>> little_arr = np.arange(3).astype(float)
>>> np.frombuffer(serialize_numpy(little_arr)[0], dtype=dt_bigendian)
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
>>> actually_deserialize_numpy(*actually_serialize_numpy(arr))
array([0., 1., 2.])
>>> actually_deserialize_numpy(np.arange(3).astype(float).tobytes(), (3,))
array([0.00000e+000, 3.03865e-319, 3.16202e-322])
So tests could look like:
def test_serialize_numpy():
    original = np.arange(3).astype(float)
    dt_little = np.dtype(float).newbyteorder("<")
    dt_big = np.dtype(float).newbyteorder(">")
    arr = original.astype(dt_little)
    assert_allclose(arr, original)
    deserialized = np.frombuffer(serialize_numpy(arr)[0], dtype=dt_big)
    assert_allclose(deserialized, original)

def test_deserialize_numpy():
    original = np.arange(3).astype(float)
    dt_big = np.dtype(float).newbyteorder(">")
    arr = original.astype(dt_big)
    deserialized = deserialize_numpy(arr.tobytes(), arr.shape)
    assert_allclose(deserialized, original)

def test_serialization_roundtrip():
    original = np.arange(3).astype(float)
    deserialized = deserialize_numpy(*serialize_numpy(original))
    assert_allclose(deserialized, original)
These should fail appropriately on a little-endian system; on a big-endian system, code that silently falls back to native endianness might still pass. I think test_serialize_numpy, by starting from an explicitly little-endian array, should guard against that.
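For reference, here is a self-contained version of those tests, run against stand-in implementations that do enforce big-endian output (the stand-ins are illustrative, not the toolkit's actual serialize_numpy/deserialize_numpy):

```python
import numpy as np
from numpy.testing import assert_allclose

dt_big = np.dtype(float).newbyteorder(">")

# Stand-in implementations that genuinely enforce big-endian bytes.
def serialize_numpy(arr):
    return arr.astype(dt_big).tobytes(), arr.shape

def deserialize_numpy(buf, shape):
    return np.frombuffer(buf, dtype=dt_big).reshape(shape)

def test_serialize_numpy():
    original = np.arange(3).astype(float)
    dt_little = np.dtype(float).newbyteorder("<")
    # Start from explicitly little-endian input so a no-op serializer
    # fails even on a big-endian host.
    arr = original.astype(dt_little)
    assert_allclose(arr, original)
    deserialized = np.frombuffer(serialize_numpy(arr)[0], dtype=dt_big)
    assert_allclose(deserialized, original)

def test_deserialize_numpy():
    original = np.arange(3).astype(float)
    arr = original.astype(dt_big)
    deserialized = deserialize_numpy(arr.tobytes(), arr.shape)
    assert_allclose(deserialized, original)

def test_serialization_roundtrip():
    original = np.arange(3).astype(float)
    deserialized = deserialize_numpy(*serialize_numpy(original))
    assert_allclose(deserialized, original)

test_serialize_numpy()
test_deserialize_numpy()
test_serialization_roundtrip()
print("all tests passed")
```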
Issue Analytics
- State:
- Created 2 years ago
- Comments: 9 (5 by maintainers)
No worries! I am in NYC at the moment, though, so any feedback might be delayed.
Thanks for catching this, @lilyminium! This highlights the danger of my “just frankenstein code from stackoverflow” method for code development. I’ve read a bit into it and I think I get the gist, but since you understand this more fully, is it OK if @mattwthompson tags you for review once he has the PR in an acceptable state? I don’t have time to go as deep right now, so I’d most likely just rubber-stamp the PR if I were primary reviewer.