Defining utf8 encoding for all `.py` files

Is it a good idea to add the magic encoding comment (PEP 263) to all .py files to handle non-ASCII systems, e.g. #1024?

# -*- coding: utf-8 -*-

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
alvations commented, Apr 12, 2017

Hopefully these problems will go away once Python 2.7 goes away. Closing the issue =)

1 reaction
nschneid commented, Mar 21, 2017

If the user writes code with an import that uses non-ASCII characters, then yes, that file should declare its source encoding. But I don’t think it would help to declare the source encoding in files that are pure-ASCII.
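
To make the distinction concrete, here is a minimal sketch of a check that flags only the files that would actually need a declaration under Python 2: those containing non-ASCII bytes and lacking a PEP 263 comment in their first two lines. The helper names has_coding_declaration and needs_coding_declaration are hypothetical and not part of any library; the regular expression is the one given in PEP 263.

```python
import re

# Pattern given in PEP 263 for the encoding declaration comment.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def has_coding_declaration(source: bytes) -> bool:
    """Return True if one of the first two lines carries a PEP 263 declaration."""
    first_two = source.splitlines()[:2]
    return any(CODING_RE.match(line) for line in first_two)


def needs_coding_declaration(source: bytes) -> bool:
    """Return True only for sources with non-ASCII bytes and no declaration.

    Hypothetical helper: pure-ASCII files are reported as not needing one.
    """
    is_ascii = all(byte < 128 for byte in source)
    return not is_ascii and not has_coding_declaration(source)
```

Run over a source tree, a check like this would leave pure-ASCII files alone, which matches the point above that a declaration adds nothing to them.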

The problem in #1024 seems to be the unicode_literals import. os.path.expanduser() apparently assumes that the type of its argument—which, with unicode_literals, is unicode—matches the type of the environment variable, which is str (bytes in 2.7, unicode in 3.x). This mismatch in 2.7 is not a problem with all-ASCII strings, because an ASCII bytestring can be concatenated with a unicode string. But it fails if the bytestring is non-ASCII:

>>> u'\u102a'.encode('utf-8')
'\xe1\x80\xaa'
>>> '\xe1\x80\xaa' + '~/'
'\xe1\x80\xaa~/'
>>> '\xe1\x80\xaa' + u'~/'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)

Is the unicode_literals import really necessary? If so, then perhaps all literals passed to os.path should be wrapped in str(), effectively negating unicode_literals in 2.7.
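
The session above is Python 2.7 and cannot be reproduced as-is on Python 3, where mixing bytes and str raises a TypeError instead. As a rough sketch, the implicit step that 2.7 performs can be imitated by decoding the bytestring as ASCII before concatenating. The function py2_concat and the example paths are hypothetical and exist only for illustration.

```python
def py2_concat(byte_string: bytes, text: str) -> str:
    """Imitate Python 2.7's `str + unicode`: decode the bytes as ASCII, then join."""
    return byte_string.decode("ascii") + text


# An all-ASCII bytestring mixes with text without trouble.
print(py2_concat(b"/home/user", "/data"))

# A non-ASCII bytestring (U+102A encoded as UTF-8) fails the implicit decode.
try:
    py2_concat("\u102a".encode("utf-8"), "~/")
except UnicodeDecodeError as err:
    print("decode failed:", err)
```

This is why the failure only shows up on systems whose environment variables contain non-ASCII bytes, regardless of whether the source file itself declares an encoding.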
