Error message when comparing same string in NFC and NFD forms is not clear on Python 3

If I compare the same string in two different Unicode normal forms, the error message on Python 3 doesn’t show any difference between them. For example, if I have a test like

def test_nfc_nfd():
    nfc = 'hyv\xe4'
    nfd = 'hyva\u0308'
    assert nfc == nfd

the result I got is

>       assert nfc == nfd
E       AssertionError: assert 'hyvä' == 'hyvä'
E         - hyvä
E         + hyvä

which isn’t very informative.

The problem is caused by repr('hyva\u0308') returning 'hyva\u0308', which is rendered as 'hyvä'. I submitted an issue about repr() escaping combining characters (i.e. turning the result into 'hyva\\u0308'), but it was closed as invalid.
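To make the repr() behavior concrete, here is a small check using only built-ins and the two strings from the test above. The variable names are only for illustration.

```python
nfc = 'hyv\xe4'       # precomposed U+00E4 (NFC)
nfd = 'hyva\u0308'    # 'a' followed by U+0308 COMBINING DIAERESIS (NFD)

# The strings differ as sequences of code points, even though they render alike.
assert nfc != nfd
assert len(nfc) == 4 and len(nfd) == 5

# repr() leaves printable non-ASCII characters alone, combining marks included,
# so the output still contains the raw combining character.
repr_nfd = repr(nfd)

# ascii() escapes every non-ASCII character, which makes the difference visible.
ascii_nfc = ascii(nfc)
ascii_nfd = ascii(nfd)
print(ascii_nfc, ascii_nfd)
```

The printed ascii() forms show an escaped \xe4 in one string and an escaped \u0308 after a plain "a" in the other, which is exactly the information missing from the assertion message.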

In pytest it would be possible to use ascii() instead of repr(), but that would make all non-ASCII strings unreadable, which would be worse. I guess the best solution would be to use ascii() only when the strings look the same, but I have no idea how to detect that.
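One possible heuristic for "the strings look the same" is to check whether two unequal strings become equal after Unicode normalization. The sketch below is only an illustration of that idea, not how pytest works: looks_the_same and format_for_diff are hypothetical helper names, and the check covers canonical equivalence only, so it would not catch look-alike characters from different scripts or invisible characters.

```python
import unicodedata


def looks_the_same(left, right):
    """Hypothetical heuristic: unequal strings that are equal after NFC normalization."""
    if left == right:
        return False
    return unicodedata.normalize('NFC', left) == unicodedata.normalize('NFC', right)


def format_for_diff(left, right):
    """Hypothetical helper: use ascii() only when repr() would hide the difference."""
    if looks_the_same(left, right):
        return ascii(left), ascii(right)
    return repr(left), repr(right)


# NFC vs NFD: falls back to ascii(), so the escapes expose the difference.
print(format_for_diff('hyv\xe4', 'hyva\u0308'))

# Visibly different strings: repr() is kept, so non-ASCII text stays readable.
print(format_for_diff('hyv\xe4', 'hyva'))
```

Using NFC here is an arbitrary choice; NFD would detect the same canonically equivalent pairs.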

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

Zac-HD commented, Oct 6, 2022 (1 reaction)

Absolutely!

dnstone commented, May 16, 2021 (1 reaction)

Hi @Zac-HD, I talked with you about this issue as part of the PyCon mentored sprint. Is it okay if I take over this issue?
