MAINT: Numpy uses `PyUnicode_AS_DATA` which will be deprecated eventually
I originally raised this as https://github.com/pandas-dev/pandas/issues/21758 thinking it was a pandas issue, but when I delved more deeply I found that it was numpy.
Using `np.array` changes the return value of `sys.getsizeof` for the single-character string that is passed to it.
Reproducing code example:
Pasting this into a Python REPL:
import sys
sys.getsizeof('a')
sys.getsizeof('b')
import numpy as np
sys.getsizeof('a')
sys.getsizeof('b')
np.array(['b'], dtype=(np.str_, 1))
sys.getsizeof('a')
sys.getsizeof('b')
sys.getsizeof('c')
I get output like this:
$ python
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof('a')
50
>>> sys.getsizeof('b')
50
>>> import numpy as np
>>> sys.getsizeof('a')
50
>>> sys.getsizeof('b')
50
>>> np.array(['b'], dtype=(np.str_, 1))
array(['b'], dtype='<U1')
>>> sys.getsizeof('a')
50
>>> sys.getsizeof('b')
58
>>> sys.getsizeof('c')
50
>>>
>>>
Notice how `sys.getsizeof('b')` has increased from 50 to 58 following the call to `np.array`. This is very unexpected behaviour.
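The REPL session above can be turned into a small repeatable check. This is a minimal sketch, not part of the original report. `getsizeof_delta` is a hypothetical helper that records `sys.getsizeof(s)` before and after an operation, so a non-zero result means the operation left extra cached data on the string object itself. The sketch uses `dtype="U1"`, which the output above shows is what `(np.str_, 1)` resolves to. One plausible reason only `'b'` changed is that CPython keeps a single shared object for each one-character Latin-1 string, so any cache attached during the `np.array` call stays with every later `'b'`.

```python
import sys
from typing import Callable


def getsizeof_delta(s: str, op: Callable[[str], object]) -> int:
    """Return how much sys.getsizeof(s) changes after calling op(s).

    A non-zero result means op attached extra cached data to the string
    object itself instead of only building new objects.
    """
    before = sys.getsizeof(s)
    op(s)
    return sys.getsizeof(s) - before


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:  # NumPy is optional for this check
        np = None

    s = "b"
    # A pure-Python operation that should not touch the string's internals.
    print("str.upper:", getsizeof_delta(s, str.upper))
    if np is not None:
        # The call from the report, spelled with the equivalent "U1" dtype.
        print("np.array:", getsizeof_delta(s, lambda x: np.array([x], dtype="U1")))
```

Whether the NumPy line prints a non-zero delta depends on the NumPy and Python versions in use, so no particular output is claimed here.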
Numpy/Python version information:
Output from `import sys, numpy; print(numpy.__version__, sys.version)`:
>>> import sys, numpy; print(numpy.__version__, sys.version)
1.15.0 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0]
I’ve seen this with a range of Python and numpy versions. I’m using Ubuntu 18.04 LTS 64-bit, but I’ve also seen this on Ubuntu 16.04 LTS 64-bit.
I think in most cases we still use API that causes the unicode object to cache the result (e.g. as a UTF-8 string), but I am not convinced this is a bad thing in most cases, as opposed to just surprising. Also, I expect we have to use UTF-8 in most of those cases, since we probably need ASCII compatibility.
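The caching described above can be observed without NumPy. The following is a minimal, CPython-only sketch under stated assumptions: it calls the C API function `PyUnicode_AsUTF8` through `ctypes.pythonapi` on a freshly built non-ASCII string and measures `sys.getsizeof` before and after. `utf8_cache_growth` is a hypothetical helper name. It illustrates the UTF-8 cache only. The 8-byte jump in the report (50 to 58 on Python 3.7) is consistent with a different cache, the `wchar_t` copy that deprecated `Py_UNICODE` APIs such as `PyUnicode_AS_DATA` attached to a string (two 4-byte `wchar_t` values for `'b'` plus its terminator), but that is an inference rather than something confirmed in this thread.

```python
import ctypes
import sys
from typing import Tuple

# CPython-only: ctypes.pythonapi gives access to the running interpreter's C API.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
_as_utf8.argtypes = [ctypes.py_object]
_as_utf8.restype = ctypes.c_char_p  # ctypes copies the returned buffer into bytes


def utf8_cache_growth(n: int = 4) -> Tuple[int, bytes]:
    """Return (growth of sys.getsizeof, UTF-8 bytes) after PyUnicode_AsUTF8."""
    if n < 2:
        # chr(0xE9) * 1 is the shared one-character object, which may already
        # carry a cached UTF-8 copy from earlier use.
        raise ValueError("n must be at least 2")
    # Build a fresh non-ASCII string at runtime so no UTF-8 copy exists yet.
    s = chr(0xE9) * n
    before = sys.getsizeof(s)
    data = _as_utf8(s)  # CPython stores the UTF-8 buffer on s and reuses it
    return sys.getsizeof(s) - before, data


if __name__ == "__main__":
    growth, data = utf8_cache_growth(4)
    print(growth, data)
```

On CPython the growth is expected to be the UTF-8 length plus one terminating byte. For ASCII-only strings the UTF-8 data is shared with the string's own buffer, so no growth would be seen there.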
So, I will close it. I am not quite sure, but it would seem more helpful to open a new issue that is more to the point; personally, I am not sure there is an issue at all.
The original behavior also no longer occurs: