Inconsistent handling of integer overflow between Windows and Linux.
While contributing to scikit-learn, I've uncovered an inconsistency between Windows and Linux. The PR where this came up can be found here: https://github.com/scikit-learn/scikit-learn/pull/8094#issuecomment-269864258
Running this line gives different results on Windows and Linux, regardless of whether Python is 32-bit or 64-bit:
import numpy as np
np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).trace()
- The result is -2 on Windows 10 (64-bit) using both 64-bit and 32-bit Python 3.6.
- The result is 4294967294 on Ubuntu 16.04 (64-bit) using 64-bit Python 3.5 and 64-bit Python 2.7.
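A minimal sketch of what seems to be going on, relying only on the documented dtype argument of ndarray.trace: forcing the accumulator width by hand reproduces both results on a single machine, which suggests the two platforms differ only in which accumulator dtype trace picks when dtype is left at its default.

```python
import numpy as np

# Same 2x2 array as above: every entry is the int32 maximum, 2**31 - 1.
a = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32)

# A 32-bit accumulator wraps around: 2 * (2**31 - 1) = 2**32 - 2,
# which reads as -2 in two's-complement int32 (the Windows result).
wrapped = a.trace(dtype=np.int32)

# A 64-bit accumulator holds the exact sum (the Ubuntu result).
exact = a.trace(dtype=np.int64)

print(wrapped, wrapped.dtype)  # -2 int32
print(exact, exact.dtype)      # 4294967294 int64
```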
Obviously it would be good to test the exact same Python versions for completeness, but I doubt the result will change. I don't have consistent access to a Windows machine, but I can install Python 3.6 on Ubuntu and confirm that the result doesn't suddenly change to -2.
This behavior is not limited to the trace function.
np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).sum(axis=0)
shows similar behavior: array([4294967294, 4294967294]) on Ubuntu and array([-2, -2]) on Windows.
Because this call returns an array rather than a scalar, its dtype is easy to inspect: on Ubuntu the dtype is int64, but on Windows it is int32.
Somehow on Ubuntu the result is upcast automatically, but on Windows it overflows and remains int32. One of these behaviors should be chosen consistently across platforms (ideally the upcasting Ubuntu one).
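A likely explanation, based on the NumPy documentation for np.sum (np.trace documents the same rule): when dtype is not given, integer inputs narrower than the default platform integer np.int_ are accumulated in np.int_ (unsigned ones in np.uint). Before NumPy 2.0, np.int_ was the C long, which is 32 bits on all Windows builds (64-bit Windows uses the LLP64 data model) but 64 bits on 64-bit Linux (LP64). That would account for 32-bit and 64-bit Python on Windows agreeing with each other while both differ from Ubuntu. NumPy 2.0 later made the default integer 64-bit on all 64-bit platforms, so the Windows result also depends on the NumPy version. The helper expected_sum_dtype below is hypothetical; it only restates that documented rule so it can be compared against what sum actually returns.

```python
import numpy as np


def expected_sum_dtype(dtype):
    """Hypothetical helper restating the documented default-accumulator rule.

    Signed integers narrower than np.int_ accumulate in np.int_; unsigned
    integers narrower than np.uint accumulate in np.uint; anything else is
    returned unchanged. Booleans, which NumPy also sums in np.int_, are not
    handled by this sketch.
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "i" and dtype.itemsize < np.dtype(np.int_).itemsize:
        return np.dtype(np.int_)
    if dtype.kind == "u" and dtype.itemsize < np.dtype(np.uint).itemsize:
        return np.dtype(np.uint)
    return dtype


a = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32)

# int32 on Windows with NumPy < 2.0, int64 on 64-bit Linux.
print("default platform integer:", np.dtype(np.int_))
print("predicted accumulator:   ", expected_sum_dtype(a.dtype))
print("actual result dtype:     ", a.sum(axis=0).dtype)

# Portable workaround: request a 64-bit accumulator explicitly.
portable = a.sum(axis=0, dtype=np.int64)
print(portable)  # [4294967294 4294967294]
```

Passing dtype explicitly sidesteps the platform default entirely, which is probably the least surprising fix for downstream code such as the scikit-learn PR until the default itself is consistent.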
Issue Analytics
- Created 7 years ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
I think the simplest change would be to always use 32 bits on 32-bit platforms and 64 bits on 64-bit platforms. It would be a small compatibility break, but seeing as things are already incompatible across platforms, that doesn't bother me much.
The big hammer would be to make all accumulations default to 64 bits, but I don’t think that is really necessary.
I would be interested to see how much stuff broke if we were to do this (e.g. try enabling it and then run a few projects' test suites). Keeping 32-on-32 and 64-on-64 would certainly not be the end of the world, but just eliminating this source of tricky breakage entirely would be nice if we could get away with it…
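The two options discussed in these comments can be sketched as thin wrappers around sum that pass an explicit dtype. Both function names are hypothetical, and this is only an illustration of the policies, not how NumPy would implement either one: the first accumulates narrow signed integers in np.intp (32 bits on 32-bit platforms, 64 bits on 64-bit ones), the second always accumulates signed integers in int64.

```python
import numpy as np


def sum_pointer_width(a, axis=None):
    """Hypothetical '32-on-32, 64-on-64' policy: widen narrow signed ints to np.intp."""
    a = np.asarray(a)
    if a.dtype.kind == "i" and a.dtype.itemsize < np.dtype(np.intp).itemsize:
        return a.sum(axis=axis, dtype=np.intp)
    # Already pointer-width (or not a signed int): keep NumPy's default.
    return a.sum(axis=axis)


def sum_always_64(a, axis=None):
    """Hypothetical 'big hammer' policy: accumulate every signed int in int64."""
    a = np.asarray(a)
    if a.dtype.kind == "i":
        return a.sum(axis=axis, dtype=np.int64)
    return a.sum(axis=axis)


a = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32)
print(sum_pointer_width(a, axis=0))
print(sum_always_64(a, axis=0))  # [4294967294 4294967294]
```

On a 64-bit machine both wrappers return [4294967294 4294967294] for the example array; they would differ only on a 32-bit platform, where the pointer-width policy still wraps to -2.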