NumPy ndarray has no convenience method to set the encoding (expected something like np.astype(str, encoding))

We need to set the encoding, "utf-8", when converting an array of dtype object to a string dtype. The only options I can see are ASCII or unicode, via astype(np.unicode_).

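unicode_np_array = np.ndarray(shape=(1,1,1), dtype=object)
for index, x in np.ndenumerate(unicode_np_array):
    # Index assignment replaces itemset(), which was removed in NumPy 2.0.
    unicode_np_array[index] = b'search_key_\xc3\x84'
# Call astype on the array itself. (np.unicode_ was removed in NumPy 2.0; np.str_ is the equivalent.)
unicode_np_array.astype(np.unicode_)

This raises the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)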
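
The error message names the 'ascii' codec, which suggests the cast decodes each bytes object as ASCII rather than UTF-8. A minimal sketch in plain Python, assuming that reading is right, shows the same bytes failing under ASCII and succeeding under UTF-8 (the variable names are illustrative only):

```python
# The same bytes value used in the issue: "search_key_" followed by
# the two-byte UTF-8 encoding of the character U+00C4.
raw = b'search_key_\xc3\x84'

# Decoding as ASCII fails, because 0xc3 is outside the 0-127 range.
try:
    raw.decode('ascii')
    failed_ascii = False
except UnicodeDecodeError as exc:
    failed_ascii = True
    print(type(exc).__name__, exc.reason)

# Decoding as UTF-8 succeeds and yields a 12-character string.
decoded = raw.decode('utf-8')
print(len(decoded))
```

So the data itself is fine; the cast just has no way to be told which codec to use.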

Because of that, we iterate with ndenumerate and decode each element individually, which has a performance impact. It would be nice to have a convenience method for this that parallelizes the decoding internally.

import numpy as np
import time

# Object array holding UTF-8 encoded bytes with a non-ASCII character.
unicode_np_array = np.ndarray(shape=(1,1,1), dtype=object)
for index, x in np.ndenumerate(unicode_np_array):
    # Index assignment replaces itemset(), which was removed in NumPy 2.0.
    unicode_np_array[index] = b'search_key_\xc3\x84'

print("The unicode numpy array: " + str(unicode_np_array))

# Object array holding ASCII-only bytes.
ascii_np_array = np.ndarray(shape=(1,1,1), dtype=object)
for index, x in np.ndenumerate(ascii_np_array):
    ascii_np_array[index] = b'search_key'
print("The ascii numpy array: " + str(ascii_np_array))

# Workaround: decode each element in a Python-level loop.
unicode_conversion_start_time = time.time()
for index, x in np.ndenumerate(unicode_np_array):
    unicode_np_array[index] = x.decode('utf-8')
unicode_conversion_end_time = time.time()

# Elapsed time in milliseconds.
unicode_conversion_time = (unicode_conversion_end_time - unicode_conversion_start_time)*1000
print("The unicode numpy array after conversion: " + str(unicode_np_array))
print("The unicode numpy conversion time: " + str(unicode_conversion_time))

# ASCII-only data converts with a single astype call.
ascii_conversion_start_time = time.time()
ascii_np_array = ascii_np_array.astype(str)
ascii_conversion_end_time = time.time()

# Elapsed time in milliseconds.
ascii_conversion_time = (ascii_conversion_end_time - ascii_conversion_start_time)*1000
print("The ascii numpy array after conversion: " + str(ascii_np_array))
print("The ascii numpy conversion time: " + str(ascii_conversion_time))
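
With a single-element array, a time.time() difference mostly measures timer noise. A rough sketch of a steadier measurement for the per-element loop is below. It uses timeit on a larger array; the array size `n` and the helper name `decode_loop` are assumptions made for illustration and do not come from the issue.

```python
import timeit

import numpy as np

n = 10_000  # hypothetical size, chosen only to make the loop measurable
raw = b'search_key_\xc3\x84'
data = np.array([raw] * n, dtype=object)


def decode_loop(arr):
    """Decode every bytes element as UTF-8 with a Python-level loop."""
    out = arr.copy()
    for index, value in np.ndenumerate(out):
        out[index] = value.decode('utf-8')
    return out


# Take the best of several runs to reduce the effect of scheduling noise.
elapsed = min(timeit.repeat(lambda: decode_loop(data), number=1, repeat=5))
print(f"per-element decode of {n} items: {elapsed * 1000:.2f} ms")
```

The absolute numbers depend on the machine and the NumPy version, so they are only useful for comparing approaches on the same setup.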

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions:2
  • Comments:7 (3 by maintainers)

Top GitHub Comments

5 reactions
MartinNowak commented, Jun 3, 2019

np.char.decode(unicode_np_array.astype(np.bytes_), 'UTF-8')
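
A short, self-contained sketch of that one-liner, split into its two steps. The two-element array is made up for illustration; the calls themselves (`astype(np.bytes_)` and `np.char.decode`) are the ones from the comment above.

```python
import numpy as np

# Object array holding UTF-8 encoded bytes, as in the issue,
# plus an ASCII-only entry for comparison.
encoded = np.empty((2, 1, 1), dtype=object)
encoded[0, 0, 0] = b'search_key_\xc3\x84'
encoded[1, 0, 0] = b'search_key'

# Step 1: cast object -> fixed-width bytes. No decoding happens here,
# so non-ASCII bytes pass through untouched.
as_bytes = encoded.astype(np.bytes_)

# Step 2: decode every element with an explicit codec.
decoded = np.char.decode(as_bytes, 'utf-8')

print(decoded.dtype.kind)  # 'U' (a unicode string array)
print(decoded.shape)       # (2, 1, 1), the shape is preserved
```

Note that the result has a fixed-width unicode dtype rather than dtype object, and that fixed-width bytes arrays drop trailing NUL bytes, which may matter for binary payloads.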

0 reactions
ScriptPup commented, Dec 1, 2020

After hours of banging my head against this, it turns out to be incredibly simple. Thanks @mattip, that does it.
