Error message when comparing same string in NFC and NFD forms is not clear on Python 3
If I compare the same string in different Unicode normal forms, the error message doesn’t show any difference between them when using Python 3. For example, if I have a test like

    def test_nfc_nfd():
        nfc = 'hyv\xe4'
        nfd = 'hyva\u0308'
        assert nfc == nfd

the result I get is

    > assert nfc == nfd
    E AssertionError: assert 'hyvä' == 'hyvä'
    E - hyvä
    E + hyvä

which isn’t very informative.
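To see why the assertion fails even though both sides print identically, it helps to list the code points of each string. The following is a minimal sketch using only the standard library's `unicodedata` module; the helper name `describe` is hypothetical and not part of pytest:

```python
import unicodedata


def describe(s: str) -> list:
    """Return 'U+XXXX NAME' for each code point in s (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


nfc = 'hyv\xe4'     # 'ä' as a single precomposed code point (NFC)
nfd = 'hyva\u0308'  # 'a' followed by COMBINING DIAERESIS (NFD)

print(len(nfc), len(nfd))  # 4 5
print(describe(nfc)[-1])   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
print(describe(nfd)[-2:])  # ['U+0061 LATIN SMALL LETTER A', 'U+0308 COMBINING DIAERESIS']
```

The two strings have different lengths and different final code points, which is exactly the information the assertion message hides.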
The problem is caused by repr('hyva\u0308') returning 'hyva\u0308' with the combining character left unescaped, so it is rendered as 'hyvä'. I submitted an issue about repr() escaping combining characters (i.e. turning the result into 'hyva\\u0308'), but it was closed as invalid.
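In Python 3, repr() escapes only characters that `str.isprintable()` considers non-printable, and combining marks such as U+0308 count as printable, so the NFD string's repr keeps the raw combining character. ascii(), by contrast, escapes every non-ASCII code point. A short sketch of the difference, which also uses `unicodedata.normalize()` to confirm that the two strings are canonically equivalent:

```python
import unicodedata

nfc = 'hyv\xe4'
nfd = 'hyva\u0308'

# repr() leaves printable non-ASCII characters (combining marks included)
# unescaped, so both reprs render as 'hyvä' in a terminal.
print(repr(nfc), repr(nfd))

# ascii() escapes every non-ASCII code point, exposing the difference.
print(ascii(nfc))  # 'hyv\xe4'
print(ascii(nfd))  # 'hyva\u0308'

# The strings compare equal once both are brought to the same normal form.
print(unicodedata.normalize('NFC', nfd) == nfc)  # True
print(unicodedata.normalize('NFD', nfc) == nfd)  # True
```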
In pytest it would be possible to use ascii() instead of repr(), but that would make all non-ASCII strings unreadable, which would be worse. I guess the best solution would be to use ascii() only when the strings look the same, but I have no idea how to detect that.
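One possible heuristic for "look the same" is to check whether the strings differ but become equal after Unicode normalization, and only in that case fall back to ascii(). The sketch below is a hypothetical helper, not pytest's actual assertion-rewriting code; the name `readable_reprs` and the choice of NFC are assumptions. It will not catch every look-alike pair: homoglyphs from different scripts (for example Latin 'a' and Cyrillic 'а') are not normalization-equivalent.

```python
import unicodedata


def readable_reprs(left: str, right: str) -> tuple:
    """Hypothetical helper: pick reprs for an assertion message.

    Uses repr() normally, so non-ASCII text stays readable, but switches
    to ascii() when the strings differ only in normalization form,
    because their repr() output would render identically.
    """
    if left != right and (
        unicodedata.normalize('NFC', left) == unicodedata.normalize('NFC', right)
    ):
        return ascii(left), ascii(right)
    return repr(left), repr(right)


print(*readable_reprs('hyv\xe4', 'hyva\u0308'))  # 'hyv\xe4' 'hyva\u0308'
print(*readable_reprs('hyv\xe4', 'paha'))        # 'hyvä' 'paha'
```

Ordinary non-ASCII strings keep their readable repr(); only canonically equivalent pairs get escaped.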
Issue Analytics
- Created 5 years ago
- Comments: 8 (5 by maintainers)
Absolutely!
Hi @Zac-HD, I talked with you about this issue as part of the PyCon mentored sprint. Is it okay if I take over this issue?