
Defining utf8 encoding for all `.py` files


Is it a good idea to add a magic encoding comment (PEP 263) to all `.py` files to handle non-ASCII systems, e.g. #1024?

```python
# -*- coding: utf-8 -*-
```

Issue Analytics

Created 7 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction

alvations commented, Apr 12, 2017

Hopefully these problems will go away once Python 2.7 goes away. Closing the issue =)

1 reaction

nschneid commented, Mar 21, 2017

If the user writes code with an import that uses non-ASCII characters, then yes, that file should declare its source encoding. But I don’t think it would help to declare the source encoding in files that are pure-ASCII.
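The rule above ("declare only where non-ASCII appears") can be checked mechanically. The following is a minimal sketch, assuming Python 3.7+ (for `bytes.isascii()`). It flags files that contain non-ASCII bytes but have no declaration on line 1 or 2, using the regular expression given in PEP 263. The helper names are hypothetical and not part of the project. Python 3 already reads source as UTF-8 by default (PEP 3120), so the check matters only for 2.7.

```python
import re
from pathlib import Path

# Regular expression given in PEP 263 for a source-encoding declaration.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def has_coding_declaration(source: bytes) -> bool:
    """Return True if line 1 or line 2 carries a PEP 263 declaration."""
    return any(CODING_RE.match(line) for line in source.splitlines()[:2])


def needs_coding_declaration(source: bytes) -> bool:
    """Under Python 2, only files containing non-ASCII bytes need a declaration."""
    return not source.isascii() and not has_coding_declaration(source)


def files_needing_declaration(root: str) -> list:
    """List .py files under `root` with non-ASCII bytes but no declaration."""
    return [
        str(path)
        for path in sorted(Path(root).rglob("*.py"))
        if needs_coding_declaration(path.read_bytes())
    ]
```

Running `files_needing_declaration(".")` from a checkout's root would list the candidate files. Pure-ASCII files are skipped, which matches the point that a declaration does not help there.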

The problem in #1024 seems to be the unicode_literals import. os.path.expanduser() apparently assumes that the type of its argument (which, with unicode_literals, is unicode) matches the type of the environment variable, which is str (bytes in 2.7, unicode in 3.x). In 2.7 this mismatch is not a problem for all-ASCII strings. Python 2 implicitly decodes the bytestring with the ASCII codec, so an ASCII bytestring can be concatenated with a unicode string. But it fails if the bytestring is non-ASCII:

```pycon
>>> u'\u102a'.encode('utf-8')
'\xe1\x80\xaa'
>>> '\xe1\x80\xaa' + '~/'
'\xe1\x80\xaa~/'
>>> '\xe1\x80\xaa' + u'~/'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)
```
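The following sketch shows why only non-ASCII values fail. It imitates Python 2's implicit coercion in Python 3 by decoding the bytestring with the 'ascii' codec before concatenating. `py2_style_concat`, `coerces_cleanly` and `expand_with_home` are hypothetical helpers written for illustration. The last one calls the real `posixpath.expanduser` under Python 3, where both `HOME` and the argument are `str`, so a non-ASCII home directory expands without error.

```python
import os
import posixpath


def py2_style_concat(prefix: bytes, suffix: str) -> str:
    """Hypothetical helper imitating Python 2's `str + unicode`.

    Python 2 implicitly decoded the bytestring with the default 'ascii'
    codec before concatenating, so only non-ASCII bytes raise an error.
    """
    return prefix.decode("ascii") + suffix


def coerces_cleanly(prefix: bytes, suffix: str) -> bool:
    """Return False if the simulated Python 2 concatenation would raise."""
    try:
        py2_style_concat(prefix, suffix)
    except UnicodeDecodeError as exc:
        print("fails:", exc.reason)  # fails: ordinal not in range(128)
        return False
    return True


def expand_with_home(home: str, path: str) -> str:
    """Run the real posixpath.expanduser (Python 3) with HOME set to `home`."""
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        return posixpath.expanduser(path)
    finally:
        # Restore the caller's environment.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


print(coerces_cleanly(b"/home/user", "~/"))             # True
print(coerces_cleanly("\u102a".encode("utf-8"), "~/"))  # prints the reason, then False
print(ascii(expand_with_home("/home/\u102a", "~/data")))  # '/home/\u102a/data'
```

In Python 3, mixing `bytes` and `str` raises a `TypeError` instead of silently decoding. That is why this class of bug is specific to 2.7.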

Is the unicode_literals import really necessary? If so, then perhaps all literals passed to os.path should be wrapped in str(), effectively negating unicode_literals in 2.7.