problems with LC_ALL=C
There is a pattern of using open(path, 'r').read()
without an explicit encoding in pip:
- https://github.com/pypa/pip/blob/develop/setup.py#L9
- https://github.com/pypa/pip/blob/develop/pip/req.py#L274
This pattern causes issues under Python 3.x with an ASCII locale: in that case the file contents are decoded as ASCII, which fails for non-ASCII data.
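A minimal sketch of the failure and one possible fix, not pip's actual code. The helper name `read_text` and the sample file content are hypothetical, and passing `encoding="ascii"` only simulates what a bare `open(path, 'r')` would do under LC_ALL=C on Python 3.2:

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as fp:
        return fp.read()


# Write a small UTF-8 file containing one non-ASCII character (bytes 0xc3 0xa9).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as fp:
    fp.write("caf\u00e9 news\n".encode("utf-8"))

# encoding="ascii" stands in for the locale default under LC_ALL=C.
try:
    read_text(path, encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# With an explicit UTF-8 encoding the result no longer depends on the locale.
utf8_text = read_text(path)
os.remove(path)
```

Naming the encoding at the call site is the design choice here: the files are known to be UTF-8, so there is no reason to let the environment decide.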
The first occurrence (in setup.py) is clearly wrong IMHO: the utility function is used for reading pip’s own index.txt and news.txt files, which are encoded in UTF-8. It may cause the following exception:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 871: ordinal not in range(128)
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 16, in <module>
  File "/var/folders/_5/cbsg50991szfp1r9nwxpx8580000gq/T/pip-61p_z7-build/setup.py", line 31, in <module>
    "\n\n" + read("docs", "news.txt"))
  File "/var/folders/_5/cbsg50991szfp1r9nwxpx8580000gq/T/pip-61p_z7-build/setup.py", line 9, in read
    return codecs.open(os.path.join(os.path.abspath(os.path.dirname(__file__)), *parts), 'r').read()
  File "/Users/kmike/svn/pip/.tox/py32-ascii/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 871: ordinal not in range(128)
The exception occurs if the following is added to pip’s own tox.ini:
[testenv:py32-ascii]
basepython = python3.2
setenv = LC_ALL=C
The second is trickier and I didn’t debug it. It causes the following exception:
Unpacking /Users/kmike/svn/DAWG/.tox/dist/DAWG-0.5.3.zip
Running setup.py egg_info for package from file:///Users/kmike/svn/DAWG/.tox/dist/DAWG-0.5.3.zip
Exception:
Traceback (most recent call last):
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/basecommand.py", line 107, in main
    status = self.run(options, args)
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/commands/install.py", line 256, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 1042, in prepare_files
    req_to_install.run_egg_info()
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 241, in run_egg_info
    "%(Name)s==%(Version)s" % self.pkg_info())
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 334, in pkg_info
    data = self.egg_info_data('PKG-INFO')
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 274, in egg_info_data
    data = fp.read()
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2130: ordinal not in range(128)
This happens in the test suite of https://github.com/kmike/DAWG (https://github.com/kmike/DAWG/blob/master/tox.ini). The DAWG package has a non-ASCII README.rst, which is loaded into long_description (as a byte string under Python 2.x and as unicode under Python 3.x).
Under Python 2.x this works fine because req.py doesn’t try to decode the data.
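One way to get locale-independent behaviour on Python 3.x would be to read PKG-INFO as bytes and decode it explicitly. This is only a sketch: `decode_metadata` is a hypothetical helper (not a function in req.py), the sample metadata is made up, and it assumes the metadata is UTF-8:

```python
def decode_metadata(raw, encoding="utf-8"):
    """Decode raw PKG-INFO bytes without consulting the locale.

    Hypothetical helper; assumes the metadata was written as UTF-8.
    Undecodable bytes become U+FFFD instead of raising.
    """
    return raw.decode(encoding, "replace")


# Made-up metadata with one non-ASCII character, encoded the way a
# UTF-8 README.rst would end up in long_description.
raw = "Name: DAWG\nSummary: na\u00efve example\n".encode("utf-8")
text = decode_metadata(raw)

# A byte that is invalid in UTF-8 is replaced rather than crashing the install.
fallback = decode_metadata(b"\xff")
```

Using "replace" trades strictness for robustness; a stricter variant could use "strict" and report which package shipped broken metadata.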
Issue Analytics
- Created 11 years ago
- Comments:12 (7 by maintainers)
Top GitHub Comments
I believe this part of the code base has been removed. pip now uses distlib to read legacy metadata (egg-info), which always uses UTF-8.
Indeed, this was the source of my problem. In one shell I had this in my environment:
LANG=en_US.UTF-8
…and pip worked. In another shell I didn’t, and got the UnicodeDecodeError exception.
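To check which case a given shell falls into, something like the following can be run in it. This is a diagnostic sketch; `default_text_encoding` is a hypothetical name, and the value it returns depends on the environment and the Python version:

```python
import locale


def default_text_encoding():
    """Return the encoding name reported by the locale module.

    On Python 3 this is what open() falls back to when no encoding
    argument is given. Hypothetical helper for diagnostics only.
    """
    return locale.getpreferredencoding(False)


encoding_name = default_text_encoding()
print("default text encoding:", encoding_name)
```

If the printed name is an ASCII variant, any open(path, 'r') without an encoding argument is at risk of the UnicodeDecodeError shown above.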