MAINT: Numpy uses `PyUnicode_AS_DATA` which will be deprecated eventually

I originally raised this as https://github.com/pandas-dev/pandas/issues/21758 because I thought it was a pandas issue, but on closer investigation I found that the cause was in numpy.

Calling np.array changes the value that sys.getsizeof returns for the first single-character string passed to it.

Reproducing code example:

Pasting this into a Python REPL:

import sys
sys.getsizeof('a')
sys.getsizeof('b')
import numpy as np
sys.getsizeof('a')
sys.getsizeof('b')
np.array(['b'], dtype=(np.str_, 1))
sys.getsizeof('a')
sys.getsizeof('b')
sys.getsizeof('c')
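The same check can be wrapped in a small helper that measures sys.getsizeof before and after a call. This is a minimal sketch: the helper name getsizeof_delta is hypothetical, and the NumPy call uses dtype="U1", which is the '<U1' dtype that (np.str_, 1) resolves to in the output shown next. Whether it reports a non-zero change depends on the NumPy and Python versions in use.

```python
import sys
from typing import Any, Callable


def getsizeof_delta(func: Callable[[str], Any], s: str) -> int:
    """Return how much sys.getsizeof(s) changes after calling func(s).

    A non-zero result means the call left extra memory attached to the
    str object itself, which is what the report observes for 'b'.
    """
    before = sys.getsizeof(s)
    func(s)
    return sys.getsizeof(s) - before


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        print("NumPy is not installed")
    else:
        # Same dtype as np.array(['b'], dtype=(np.str_, 1)) in the report.
        delta = getsizeof_delta(lambda s: np.array([s], dtype="U1"), "b")
        print(f"sys.getsizeof('b') changed by {delta} bytes")
```

Because the helper only compares two sys.getsizeof readings, it can be pointed at any other library call suspected of caching data on a string.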

I get output like this:

$ python
Python 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof('a')
50
>>> sys.getsizeof('b')
50
>>> import numpy as np
>>> sys.getsizeof('a')
50
>>> sys.getsizeof('b')
50
>>> np.array(['b'], dtype=(np.str_, 1))
array(['b'], dtype='<U1')
>>> sys.getsizeof('a')
50
>>> sys.getsizeof('b')
58
>>> sys.getsizeof('c')
50

Notice that sys.getsizeof('b') increases from 50 to 58 after the call to np.array, while 'a' and 'c' still report 50. This is very unexpected behaviour.
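A likely reason only 'b' is affected: CPython caches every single-character Latin-1 string as one shared object, so the 'b' passed to np.array and the 'b' typed afterwards are the same object, while 'a' and 'c' are different objects that NumPy never touched. This is a CPython implementation detail, not a language guarantee. The following sketch only checks object identity and does not involve NumPy.

```python
# CPython implementation detail (not guaranteed by the language): strings made
# of one Latin-1 character are cached, so equal ones are the same object.
b_literal = "b"
b_runtime = chr(98)  # built at runtime, yet CPython returns the cached 'b'

print(b_runtime is b_literal)  # True on CPython

# Longer strings built at runtime are distinct objects even when equal, so
# data cached on one of them would not show up on the other.
bc_first = "".join(["b", "c"])
bc_second = "".join(["b", "c"])

print(bc_first == bc_second)  # True
print(bc_first is bc_second)  # False
```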

Numpy/Python version information:

Output from import sys, numpy; print(numpy.__version__, sys.version):

>>> import sys, numpy; print(numpy.__version__, sys.version)
1.15.0 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0]

I’ve seen this with several combinations of Python and numpy versions. I’m using Ubuntu 18.04 LTS 64-bit, but I’ve also seen it on Ubuntu 16.04 LTS 64-bit.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:8 (7 by maintainers)

Top GitHub Comments

1 reaction
seberg commented, Sep 24, 2020

I think that in most cases we still use API that causes the unicode object to cache the result (e.g. as a UTF-8 string), but I am not convinced this is a bad thing in most cases, as opposed to just surprising. Also, I expect we have to use UTF-8 in most of those cases, since we probably need ASCII compatibility.

So I will close this. I am not entirely sure, but it seems more helpful to open a new, more focused issue, and personally, I am not sure there is an issue at all.
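The caching described above can be observed from pure Python on CPython by calling the public C-API function PyUnicode_AsUTF8 through ctypes. This is a minimal, CPython-only sketch, and the helper name utf8_cache_growth is hypothetical. For an ASCII string the UTF-8 form is the string's own buffer, so sys.getsizeof does not change; a non-ASCII string gains a cached copy of its UTF-8 bytes plus a terminating NUL. The 8-byte growth in the original report is consistent with a different cache, the wchar_t copy built by the deprecated PyUnicode_AS_DATA (on Linux, one 4-byte wchar_t for 'b' plus a 4-byte terminator). That API was removed in Python 3.12, so it is not used here.

```python
import ctypes
import sys

# CPython-only: call the C-API function PyUnicode_AsUTF8 via ctypes.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
_as_utf8.argtypes = [ctypes.py_object]
_as_utf8.restype = ctypes.c_char_p


def utf8_cache_growth(s: str) -> int:
    """Return the change in sys.getsizeof(s) after requesting its UTF-8 form."""
    before = sys.getsizeof(s)
    _as_utf8(s)  # CPython may store the UTF-8 bytes on the str object
    return sys.getsizeof(s) - before


# Build the strings at runtime so they are fresh objects with no cache yet.
ascii_str = "".join(["abc"] * 10)
non_ascii_str = "".join(["\u00e9"] * 10)  # ten copies of 'é'

print(utf8_cache_growth(ascii_str))      # 0: ASCII data is already valid UTF-8
print(utf8_cache_growth(non_ascii_str))  # 21: 20 UTF-8 bytes plus a trailing NUL
```

Calling the helper a second time on the same non-ASCII string reports 0, since the UTF-8 copy is created once and then reused.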

0 reactions
eric-wieser commented, Sep 24, 2020

The original behavior also no longer occurs:

In [2]: import sys
   ...: print(sys.getsizeof('a'))
   ...: print(sys.getsizeof('b'))
   ...: import numpy as np
   ...: print(sys.getsizeof('a'))
   ...: print(sys.getsizeof('b'))
   ...: np.array(['b'], dtype=(np.str_, 1))
   ...: print(sys.getsizeof('a'))
   ...: print(sys.getsizeof('b'))
   ...: print(sys.getsizeof('c'))
50
50
50
50
50
50
50

