Question: Which encoding is used when saving German umlauts to JSON?
Hello,
I’m trying to save data packages, resources and table schemas as JSON using the built-in .to_json() function. The problem is that I have German umlauts (Ä, Ö, Ü, ß) and exponents (e.g. m², m³) in my metadata.
When I open the resulting JSON file in PyCharm, it tries to read it as UTF-8. This results in unrecognised characters and a warning from PyCharm, because the file appears to be encoded in ISO 8859-1 (see screenshot):
This is how the file should look:
Other editors (like Windows Notepad or Notepad++) recognise the encoding correctly.
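A minimal Python sketch (standard library only, independent of frictionless) of why an editor that assumes UTF-8 stumbles over these characters: in ISO 8859-1, and in Windows' cp1252, which agrees with it for every character used here, each umlaut or superscript is a single byte, whereas UTF-8 uses two bytes for each, and the single-byte values are not valid UTF-8 on their own.

```python
# Encode the sample metadata text as ISO 8859-1 (Latin-1), the encoding
# PyCharm suspected, and as UTF-8, the encoding it expected.
text = "öäü ÖÄÜ ß m² m³"

latin1_bytes = text.encode("iso-8859-1")  # cp1252 gives the same bytes here
utf8_bytes = text.encode("utf-8")

# In Latin-1 each of these characters is exactly one byte ...
print(latin1_bytes[:3])  # b'\xf6\xe4\xfc'
# ... while UTF-8 uses two bytes for each of them.
print(utf8_bytes[:6])    # b'\xc3\xb6\xc3\xa4\xc3\xbc'

# Reading the Latin-1 bytes as UTF-8 fails, which is why an editor that
# assumes UTF-8 shows unrecognised characters.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```

Editors such as Notepad++ guess the encoding from the bytes instead of assuming UTF-8, which would explain why they display the same file correctly.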
My question is: when does frictionless use UTF-8, and when does it use other encodings? Why does it not always save in UTF-8, or escape Unicode characters, given that the JSON specification (RFC 7159, Section 8.1) specifies UTF-8 as the default encoding?
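For comparison, a sketch of the two spec-friendly outputs the question mentions, using only Python's standard json module rather than frictionless's .to_json(). It assumes (the issue itself does not confirm this) that the mismatch comes from opening the output file without an explicit encoding, in which case Python falls back to the platform default, often cp1252 on Windows. write_utf8_json is a hypothetical helper.

```python
import json
import os
import tempfile

descriptor = {
    "name": "name-of-package",
    "description": "öäü ÖÄÜ ß m² m³ and some other text",
}

def write_utf8_json(data, path, escape_non_ascii=False):
    """Hypothetical helper: write data as JSON, always encoding the file as UTF-8.

    escape_non_ascii=True emits \\u00f6-style escapes (pure ASCII output);
    False keeps the characters readable, stored as UTF-8 bytes.
    """
    # Passing encoding explicitly avoids the platform default encoding,
    # which on Windows is often cp1252 rather than UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=escape_non_ascii, indent=2)

tmpdir = tempfile.mkdtemp()
readable_path = os.path.join(tmpdir, "readable.json")
escaped_path = os.path.join(tmpdir, "escaped.json")
write_utf8_json(descriptor, readable_path)
write_utf8_json(descriptor, escaped_path, escape_non_ascii=True)

with open(readable_path, "rb") as f:
    readable_bytes = f.read()
with open(escaped_path, "rb") as f:
    escaped_bytes = f.read()

# Both files decode as UTF-8 and round-trip to the same data.
assert json.loads(readable_bytes.decode("utf-8")) == descriptor
assert json.loads(escaped_bytes.decode("utf-8")) == descriptor
# Only the escaped variant is pure ASCII.
print(escaped_bytes.isascii(), readable_bytes.isascii())  # prints: True False
```

Escaping makes the file safe for any tool regardless of how it guesses the encoding, at the cost of legibility; ensure_ascii=False keeps ö and m² readable but depends on the file really being written as UTF-8.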
Thanks in advance and keep up the good work!
Python code to reproduce this issue:
import frictionless
pack = frictionless.Package()
pack.name = "name-of-package"
pack.description = "öäü ÖÄÜ ß m² m³ and some other text"  # non-ASCII metadata
pack.to_json("datapackage_test.json")  # PyCharm reports this file as ISO 8859-1, not UTF-8
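To confirm what the reproduction script actually wrote, a rough check of the file's raw bytes can help. detect_json_file_encoding below is a hypothetical helper; it can only tell ASCII, valid UTF-8 and not-UTF-8 apart, and it cannot prove which legacy encoding was used. The demo writes two stand-in files instead of datapackage_test.json so it runs on its own.

```python
import os
import tempfile

def detect_json_file_encoding(path):
    """Hypothetical helper: classify a file's bytes as 'ascii', 'utf-8' or 'not utf-8'.

    Heuristic only: bytes that fail UTF-8 decoding are often Latin-1/cp1252
    text, but that cannot be proven from the bytes alone.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"

# Demo with two stand-in files (hypothetical names) instead of datapackage_test.json.
text = '{"description": "öäü ÖÄÜ ß m² m³"}'
tmpdir = tempfile.mkdtemp()
results = {}
for enc in ("utf-8", "cp1252"):
    path = os.path.join(tmpdir, f"sample-{enc}.json")
    with open(path, "w", encoding=enc) as f:
        f.write(text)
    results[enc] = detect_json_file_encoding(path)

print(results)  # {'utf-8': 'utf-8', 'cp1252': 'not utf-8'}
```

Calling detect_json_file_encoding("datapackage_test.json") after running the script above would return "not utf-8" if the file had been written as ISO 8859-1 or cp1252.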
Top GitHub Comments
Great! Thanks for the quick analysis.
I’m going to fix it this week. If you’re interested, feel free to open a PR adding a test (failing before the fix, passing after it).
Thanks for fixing this issue, it works like a charm 👍 And by the way: your release rate is impressive, keep going!