Make chardet/charset_normalizer optional?

With a routine version bump of requirements, I noticed chardet had been switched out for charset_normalizer (which I had never heard of before) in #5797, apparently due to LGPL license concerns.

I agree with @sigmavirus24’s comment https://github.com/psf/requests/pull/5797#issuecomment-875158955 that it’s strange for something as central in the Python ecosystem as requests (45k stars, 8k forks, many contributors at the time of writing) to take on such a relatively unknown and unproven library (132 stars, 5 forks, 2 contributors) as a hard dependency.

The release notes say you could use pip install "requests[use_chardet_on_py3]" to use chardet instead of charset_normalizer, but with that extra set both libraries get installed.

I would imagine many users don’t actually need the charset detection features in Requests; could we open a discussion on making both chardet and charset_normalizer optional, à la requests[chardet] or requests[charset_normalizer]?

AFAICS, the only place where chardet is actually used in requests is Response.apparent_encoding, which is used by Response.text when there is no determined encoding.

Maybe apparent_encoding could

  1. as a built-in first attempt, try decoding the content as UTF-8 (which would likely succeed in many cases)
  2. if neither chardet nor charset_normalizer is installed, warn the user (“No encoding detection library is installed. Falling back to XXXX. Please see YYYY for instructions” or somesuch) and return e.g. ascii
  3. otherwise, use whichever of the two libraries is installed, as per usual
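To make the proposal concrete, here is a rough sketch of the three steps as a standalone function. It is not the actual requests implementation: the names guess_encoding and _load_detector, the load_detector parameter, and the warning text are all hypothetical and exist only for illustration. The one thing it relies on from the real libraries is that both chardet and charset_normalizer expose a detect(bytes) function returning a dict with an "encoding" key.

```python
import importlib
import warnings


def _load_detector():
    """Return the first importable detection module, or None (hypothetical helper)."""
    for name in ("chardet", "charset_normalizer"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def guess_encoding(content, load_detector=_load_detector, fallback="ascii"):
    """Sketch of the proposed fallback order; not part of requests."""
    # Step 1: built-in first attempt, no third-party library needed.
    try:
        content.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    detector = load_detector()

    # Step 2: no detection library installed, so warn and fall back.
    if detector is None:
        warnings.warn(
            "No encoding detection library is installed. "
            f"Falling back to {fallback}."
        )
        return fallback

    # Step 3: defer to whichever library was found.
    # detect() may report None as the encoding, hence the `or fallback`.
    return detector.detect(content)["encoding"] or fallback
```

With this ordering, content that is valid UTF-8 (which includes plain ASCII) never touches the optional dependency at all, so a detection library would only matter for the remaining cases.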

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 8
  • Comments: 23 (11 by maintainers)

Top GitHub Comments

sigmavirus24 commented, Jul 14, 2021 (6 reactions)

apparent_encoding genuinely just needs to go away. That can’t be done until a major release. Once that happens, we don’t need dependencies on either library.

akx commented, Sep 22, 2021 (3 reactions)

@Gagaro html5lib optionally requires chardet and likely behaves differently if it’s not installed.

https://github.com/html5lib/html5lib-python/blob/f7cab6f019ce94a1ec0192b6ff29aaebaf10b50d/requirements-optional.txt#L7-L9
