Bug with NumPy `loadtxt()` and unicode strings

Please refer to this question posted on Stack Overflow::

http://stackoverflow.com/q/22936790/832621

The OP is on Windows and is reading an ISO-8859 text file, created on Linux, that has very long lines and CRLF line terminators.

The file is read into NumPy with the first line skipped, since that line contains labels (with special characters, usually only the Greek mu):

Python 2.7.6, Numpy 1.8.0, this works perfectly::

data = np.loadtxt('input_file.txt', skiprows=1)

Python 3.4.0, Numpy 1.8.0, gives an error::

np.loadtxt('input_file.txt', skiprows=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/site-packages/numpy/lib/npyio.py", line 796, in loadtxt
    next(fh)
  File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 4158: invalid start byte

It worked with genfromtxt().
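
The byte named in the traceback can be examined on its own. The sketch below assumes the file is ISO-8859-1 (Latin-1) specifically; the report only says "ISO-8859". Under that assumption, 0xb5 is the micro sign, which fits the description of the header labels, and it is not a valid start byte in UTF-8.

```python
# The single byte reported in the traceback (0xb5).
raw = b"\xb5"

# Decoding it as UTF-8 fails, because 0xb5 cannot begin a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the file is ISO-8859-1, where 0xb5 maps to U+00B5 (micro sign).
latin1_text = raw.decode("iso-8859-1")

print(utf8_ok)
print(latin1_text)
```

This would explain why the error occurs even with `skiprows=1`: on Python 3 the skipped header line is still decoded as text before it is discarded.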

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
rossbar commented, Aug 4, 2021

This is a pretty old issue and I’m not sure how much of it is still relevant. The closest thing I can find to a reproducer is from one of the linked SO posts, where a user has trouble loading “Côte d’Ivoire” from an ISO-8859 encoded file. This should work using loadtxt’s encoding parameter:

>>> import io
>>> import numpy as np
>>> fh = io.BytesIO("Côte d'Ivoire".encode('iso-8859-1'))
>>> fh.getvalue()
b"C\xf4te d'Ivoire"
>>> # Note: use delimiter=',' to prevent a split at the space
>>> np.loadtxt(fh, dtype="U", delimiter=",", encoding='iso-8859-1')
array("Côte d'Ivoire", dtype='<U13')

Note that the default value for encoding eventually resolves to sys.getdefaultencoding() in many cases, so users will have to supply the correct encoding if it’s different than whatever the current system default is.

I’m going to close this hoping that the original issue is either obsolete or resolved by e.g. the above example. If the issue persists or there are related file encoding issues, please reopen or open a new issue with a minimal reproducing example.
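
The encoding advice above can be applied to a file shaped like the one in the original report. This is only a sketch: the file contents, the column labels, and the choice of ISO-8859-1 are assumptions made for illustration, and it requires a NumPy version whose `loadtxt` accepts the `encoding` parameter.

```python
import os
import tempfile

import numpy as np

# Hypothetical file contents: a Latin-1 header line containing the micro sign,
# followed by numeric rows, all with CRLF line terminators.
raw = "time [µs]\tvalue\r\n1.0\t2.0\r\n3.0\t4.0\r\n".encode("iso-8859-1")

# Write the bytes to a temporary file so loadtxt reads from a real path.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Passing the file's actual encoding lets the skipped header decode cleanly.
    data = np.loadtxt(path, skiprows=1, encoding="iso-8859-1")
finally:
    os.remove(path)

print(data)
```

Here the header row is skipped and the two numeric rows are returned as a 2x2 float array.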

0 reactions
fzh0917 commented, Nov 28, 2018

Nice, it’s useful to replace the loadtxt function with genfromtxt.
