NumPy ndarray has no convenience method to set the encoding (expected something like np.astype(str, encoding))
We need to set the encoding during type conversion from np dtype object to np dtype string, using the encoding "utf-8", but the only options I can find are ASCII or Unicode via np.astype(np.unicode_).
unicode_np_array = np.ndarray(shape=(1,1,1), dtype=object)
for index, x in np.ndenumerate(unicode_np_array):
    unicode_np_array.itemset(index, b'search_key_\xc3\x84')
unicode_np_array.astype(np.unicode_)
yields this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)
Because of that, we iterate with ndenumerate and decode each element individually, which has a performance impact. It would be nice to have a convenience method that does this and parallelizes the conversion internally.
import numpy as np
import time

# Object array holding UTF-8 encoded bytes with a non-ASCII character
unicode_np_array = np.ndarray(shape=(1,1,1), dtype=object)
for index, x in np.ndenumerate(unicode_np_array):
    unicode_np_array.itemset(index, b'search_key_\xc3\x84')
print("The unicode numpy array :" + str(unicode_np_array))

# Object array holding ASCII-only bytes
ascii_np_array = np.ndarray(shape=(1,1,1), dtype=object)
for index, x in np.ndenumerate(ascii_np_array):
    ascii_np_array.itemset(index, b'search_key')
print("The ascii numpy array :" + str(ascii_np_array))

# Time the element-by-element UTF-8 decoding (milliseconds)
unicode_conversion_start_time = time.time()
for index, x in np.ndenumerate(unicode_np_array):
    unicode_np_array.itemset(index, x.decode('utf-8'))
unicode_conversion_end_time = time.time()
unicode_conversion_time = (unicode_conversion_end_time - unicode_conversion_start_time)*1000
print("The unicode numpy array after conversion :" + str(unicode_np_array))
print("The unicode numpy conversion time :" + str(unicode_conversion_time))

# Time the built-in astype conversion on ASCII data (milliseconds)
ascii_conversion_start_time = time.time()
ascii_np_array = ascii_np_array.astype(str)
ascii_conversion_end_time = time.time()
ascii_conversion_time = (ascii_conversion_end_time - ascii_conversion_start_time)*1000
print("The ascii numpy array after conversion :" + str(ascii_np_array))
print("The ascii numpy conversion time :" + str(ascii_conversion_time))
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:7 (3 by maintainers)
np.char.decode(unicode_np_array.astype(np.bytes_), 'UTF-8')
After hours of banging my head against this, it turns out to be incredibly simple. Thanks @mattip, that does it.
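For anyone landing here later, here is a minimal sketch of the suggestion above, checked against the element-by-element loop from the original report. The array contents and the names raw, decoded and expected are made up for illustration. It assumes every element is a bytes object containing valid UTF-8.

```python
import numpy as np

# Hypothetical sample data: an object array of UTF-8 encoded byte strings.
raw = np.array(
    [[b'search_key_\xc3\x84', b'search_key'],
     [b'\xc3\xa9t\xc3\xa9', b'plain']],
    dtype=object,
)

# Cast to a fixed-width bytes dtype first, then decode all elements in one call.
decoded = np.char.decode(raw.astype(np.bytes_), 'utf-8')

# Reference result using the per-element loop from the issue.
expected = np.empty(raw.shape, dtype=object)
for index, value in np.ndenumerate(raw):
    expected[index] = value.decode('utf-8')

print(decoded.dtype.kind)                      # 'U' means a unicode string dtype
print(decoded.tolist() == expected.tolist())
```

One thing to watch: the intermediate fixed-width bytes dtype drops trailing NUL bytes, so this may not round-trip data that ends in b'\x00'.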