setup.cfg should standardize on UTF-8 for encoding

In jaraco/configparser#34, I learned that although setuptools v40.7.0 presumably added support for non-ASCII characters in setup.cfg, there are still environments where loading a file with non-ASCII content fails:

configparser # easy_install --version
setuptools 40.8.0 from c:\python37\lib\site-packages (Python 3.7)
configparser 3.7.2 # python setup.py egg_info
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    package_dir={'': 'src'},  
  File "C:\Python37\lib\site-packages\setuptools\__init__.py", line 144, in setup
    _install_setup_requires(attrs)  
  File "C:\Python37\lib\site-packages\setuptools\__init__.py", line 137, in _install_setup_requires
    dist.parse_config_files(ignore_option_errors=True)
  File "C:\Python37\lib\site-packages\setuptools\dist.py", line 702, in parse_config_files
    self._parse_config_files(filenames=filenames)
  File "C:\Python37\lib\site-packages\setuptools\dist.py", line 599, in _parse_config_files
    (parser.read_file if six.PY3 else parser.readfp)(reader)
  File "C:\Python37\lib\configparser.py", line 717, in read_file
    self._read(f, source)
  File "C:\Python37\lib\configparser.py", line 1014, in _read
    for lineno, line in enumerate(fp, start=1):
  File "C:\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 103: character maps to <undefined>
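
The failure mode in the traceback above can be reproduced outside setuptools. The following is a minimal sketch, assuming Python 3 and a made-up config file (the `author` value and the helper names `read_author` and `demo` are hypothetical, not taken from the issue): a UTF-8 file that contains "Á" (bytes C3 81) cannot be decoded as cp1252, because byte 0x81 has no mapping there, while the same file parses when UTF-8 is requested explicitly.

```python
import configparser
import os
import tempfile

# Hypothetical config content. "Á" encodes to the bytes C3 81 in UTF-8,
# and Python's cp1252 codec has no mapping for byte 0x81.
CONFIG_TEXT = "[metadata]\nname = example\nauthor = Álvaro\n"


def read_author(path, encoding):
    """Parse the file with an explicit encoding and return metadata.author."""
    parser = configparser.ConfigParser()
    with open(path, encoding=encoding) as f:
        parser.read_file(f)
    return parser.get("metadata", "author")


def demo():
    """Write the config as UTF-8, then read it back as cp1252 and as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".cfg")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(CONFIG_TEXT)
        try:
            read_author(path, "cp1252")
            cp1252_result = "decoded"
        except UnicodeDecodeError as exc:
            # exc.object holds the bytes being decoded, exc.start the offset.
            cp1252_result = "failed on byte 0x%02x" % exc.object[exc.start]
        return cp1252_result, read_author(path, "utf-8")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Passing the encoding explicitly takes the platform default out of the picture, which is the behavior the issue title asks setuptools to standardize on.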

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

1 reaction
jaraco commented, Apr 5, 2019

Given that setopt and its edit_config function need to write to the config file, I’m even more strongly inclined now to remove support for specifying an encoding in setup.cfg files and instead insist on UTF-8, especially since commands like bdist_rpm invoke egg_info which in turn rewrites the config file.
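
To illustrate why a fixed encoding simplifies the rewrite step, here is a minimal sketch of a read-modify-write cycle that always uses UTF-8. It assumes Python 3, and `edit_config_utf8` is a hypothetical helper written for this example; it is not the setuptools `edit_config` implementation, and the `tag_build` option is only an illustrative value.

```python
import configparser
import os
import tempfile


def edit_config_utf8(path, settings):
    """Hypothetical helper: update options in a config file.

    Both the read and the write use UTF-8, so no encoding has to be
    detected in the file or carried from one step to the other.
    """
    parser = configparser.RawConfigParser()
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    for section, options in settings.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for option, value in options.items():
            parser.set(section, option, value)
    with open(path, "w", encoding="utf-8") as f:
        parser.write(f)


def demo():
    """Rewrite a config with non-ASCII metadata and return the new text."""
    fd, path = tempfile.mkstemp(suffix=".cfg")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("[metadata]\nauthor = Álvaro\n")
        edit_config_utf8(path, {"egg_info": {"tag_build": ""}})
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Because the writer never has to guess or preserve a per-file encoding declaration, any command that rewrites the file leaves the non-ASCII metadata intact.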

1 reaction
stanislavlevin commented, Mar 28, 2019

Hi @jaraco, I have a related issue. My setup.cfg contains Unicode symbols and sets an explicit UTF-8 encoding:

# -*- coding: utf-8 -*-
[metadata]
...

When I run tox to test itself under Python 2:

Processing ./.tox/.tmp/package/2/tox-3.8.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/src/tmp/pip-req-build-jUmuTe/setup.py", line 18, in <module>
        package_dir={"": "src"},
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 144, in setup
        _install_setup_requires(attrs)
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 137, in _install_setup_requires
        dist.parse_config_files(ignore_option_errors=True)
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 702, in parse_config_files
        self._parse_config_files(filenames=filenames)
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 599, in _parse_config_files
        (parser.read_file if six.PY3 else parser.readfp)(reader)
      File "/usr/lib64/python2.7/ConfigParser.py", line 324, in readfp
        self._read(fp, filename)
      File "/usr/lib64/python2.7/ConfigParser.py", line 479, in _read
        line = fp.readline()
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib64/python2.7/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 345: ordinal not in range(128)
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /usr/src/tmp/pip-req-build-jUmuTe/

This is because “edit_config” doesn’t pass down the original encoding:

(Pdb) bt
  /usr/src/RPM/BUILD/python-module-tox-3.8.0/setup.py(18)<module>()
-> package_dir={"": "src"},
  /usr/lib/python2.7/site-packages/setuptools/__init__.py(145)setup()
-> return distutils.core.setup(**attrs)
  /usr/lib64/python2.7/distutils/core.py(151)setup()
-> dist.run_commands()
  /usr/lib64/python2.7/distutils/dist.py(953)run_commands()
-> self.run_command(cmd)
  /usr/lib64/python2.7/distutils/dist.py(972)run_command()
-> cmd_obj.run()
  /usr/lib/python2.7/site-packages/setuptools/command/sdist.py(54)run()
-> self.make_distribution()
  /usr/lib/python2.7/site-packages/setuptools/command/sdist.py(78)make_distribution()
-> orig.sdist.make_distribution(self)
  /usr/lib64/python2.7/distutils/command/sdist.py(456)make_distribution()
-> self.make_release_tree(base_dir, self.filelist.files)
  /usr/lib/python2.7/site-packages/setuptools/command/sdist.py(168)make_release_tree()
-> self.get_finalized_command('egg_info').save_version_info(dest)
  /usr/lib/python2.7/site-packages/setuptools/command/egg_info.py(191)save_version_info()
-> edit_config(filename, dict(egg_info=egg_info))
> /usr/lib/python2.7/site-packages/setuptools/command/setopt.py(74)edit_config()
-> opts.write(f)
(Pdb)

The output is something like:

[metadata]
name = tox

Output of locale:

LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
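
The locale matters because text files opened without an explicit encoding are decoded with a locale-dependent default. The sketch below assumes Python 3 (the traceback above is from Python 2.7, where the details differ), and the helper names `default_text_encoding` and `roundtrip` are made up for this example. It reports the encoding that `open()` would fall back to and shows that an explicit `encoding="utf-8"` round-trips non-ASCII text regardless of that default.

```python
import locale
import os
import tempfile


def default_text_encoding():
    """Return the encoding open() uses when no encoding argument is given.

    The value depends on the environment (locale settings, UTF-8 mode),
    so it can differ between a developer machine and a build chroot.
    """
    return locale.getpreferredencoding(False)


def roundtrip(text):
    """Write and read text with an explicit UTF-8 encoding."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("default encoding:", default_text_encoding())
    print("round trip ok:", roundtrip("naïve café") == "naïve café")
```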

Top Results From Across the Web

What's the right way to use Unicode metadata in setup.py?
I was writing a setup.py for a Python package using setuptools and wanted to include a non-ASCII character...

UTF-8: The Secret of Character Encoding - HTML Purifier
8-bit encodings are extensions to ASCII that add a potpourri of useful, non-standard characters like é and æ. They can only add 127...

PR#33: Set encoding for setup.cfg - python-daemon - Pagure.io
As of version 40.9.0, Setuptools ignores encoding declarations and simply requires the text to be encoded as UTF-8.

utf8: Unicode Text Processing
Input, validate, normalize, encode, format, and display. Details Functions for manipulating and printing UTF-8 encoded text:

UTF-8 and Unicode FAQ for Unix/Linux
With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that were designed entirely around ...