Bug with NumPy `loadtxt()` and unicode strings
Please refer to this question posted on Stack Overflow::
http://stackoverflow.com/q/22936790/832621
The OP uses Windows and an ISO-8859 text file created on Linux, with very long lines and CRLF line terminators.
When reading the file into NumPy, skipping the first line, which contains labels (with special characters, usually only the Greek mu):
Python 2.7.6, NumPy 1.8.0, this works perfectly::

    data = np.loadtxt('input_file.txt', skiprows=1)

Python 3.4.0, NumPy 1.8.0, gives an error::

    np.loadtxt('input_file.txt', skiprows=1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.4/site-packages/numpy/lib/npyio.py", line 796, in loadtxt
        next(fh)
      File "/usr/lib/python3.4/codecs.py", line 313, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 4158: invalid start byte

It worked with `genfromtxt()`.
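The `0xb5` byte named in the traceback is consistent with the description of the file: in ISO-8859-1 that byte is the micro sign (µ), while in UTF-8 it cannot start a character. The following is a minimal sketch using only the standard library; the header line `raw_header` is made up for illustration and is not taken from the OP's file.

```python
# Byte 0xb5 is the micro sign in ISO-8859-1, but it is not a valid
# start byte in UTF-8, which matches the error in the traceback.
raw_header = b"time [\xb5s]\r\n"  # hypothetical header line, not the OP's data

try:
    raw_header.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    # exc.start is the offset of the first byte that could not be decoded.
    failing_byte = raw_header[exc.start]

# The same bytes decode cleanly once the real encoding is named.
decoded = raw_header.decode("iso-8859-1")
print(utf8_ok, hex(failing_byte), decoded.strip())
```

This suggests the failure comes from decoding the skipped header line with the wrong codec rather than from the numeric data itself.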
Issue Analytics
- State:
- Created 9 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
This is a pretty old issue and I’m not sure how much of it is still relevant. The closest thing I can find to a reproducer is from one of the linked SO posts, where a user has trouble loading “Côte d’Ivoire” from an ISO-8859 encoded file. This should work using `loadtxt`’s `encoding` parameter.

Note that the default value for `encoding` eventually resolves to `sys.getdefaultencoding()` in many cases, so users will have to supply the correct encoding if it’s different from whatever the current system default is.

I’m going to close this hoping that the original issue is either obsolete or resolved by e.g. passing `encoding` as described above. If the issue persists or there are related file encoding issues, please reopen or open a new issue with a minimal reproducing example.
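A minimal sketch of the `encoding` approach described in this comment, assuming a NumPy version in which `loadtxt` accepts `encoding` (1.14 or later). The file name and contents are invented for illustration; this is not the snippet from the original comment.

```python
import os
import tempfile

import numpy as np

# Write a small ISO-8859-1 file with a non-ASCII header and CRLF line endings.
# The path and the contents are hypothetical stand-ins for the OP's file.
path = os.path.join(tempfile.mkdtemp(), "input_file.txt")
with open(path, "w", encoding="iso-8859-1", newline="") as fh:
    fh.write("t [\u00b5s] x [\u00b5m]\r\n1.0 2.0\r\n3.0 4.0\r\n")

# Name the encoding explicitly instead of relying on the system default.
data = np.loadtxt(path, skiprows=1, encoding="iso-8859-1")
print(data)
```

Passing the encoding explicitly keeps the result independent of the platform default, which is the point the comment makes about `sys.getdefaultencoding()`.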
Nice, it’s useful to replace the `loadtxt` function with the `genfromtxt` one.
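For comparison, a sketch of the `genfromtxt` substitution mentioned here, again with an invented file and assuming a NumPy version whose `genfromtxt` accepts `encoding` (1.14 or later). Note that `genfromtxt` uses `skip_header` where `loadtxt` uses `skiprows`.

```python
import os
import tempfile

import numpy as np

# Hypothetical ISO-8859-1 input with a labelled first line and CRLF endings.
path = os.path.join(tempfile.mkdtemp(), "input_file.txt")
with open(path, "w", encoding="iso-8859-1", newline="") as fh:
    fh.write("t [\u00b5s] x [\u00b5m]\r\n1.0 2.0\r\n3.0 4.0\r\n")

# skip_header=1 drops the label line; the encoding is still given explicitly.
data = np.genfromtxt(path, skip_header=1, encoding="iso-8859-1")
print(data)
```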