
Error message when comparing the same string in NFC and NFD forms is not clear on Python 3


If I compare the same string in different Unicode normal forms, the error message on Python 3 doesn't show any difference between them. For example, given a test like

def test_nfc_nfd():
    nfc = 'hyv\xe4'      # 'ä' as one precomposed code point (U+00E4)
    nfd = 'hyva\u0308'   # 'a' followed by COMBINING DIAERESIS (U+0308)
    assert nfc == nfd

the result I got is

>       assert nfc == nfd
E       AssertionError: assert 'hyvä' == 'hyvä'
E         - hyvä
E         + hyvä

which isn’t very informative.

The problem is caused by repr('hyva\u0308') returning 'hyva\u0308', which renders as 'hyvä'. I submitted an issue about making repr() escape combining characters (i.e. turning the result into 'hyva\\u0308'), but it was closed as invalid.
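A quick standard-library check of the behaviour described above: repr() leaves the combining character unescaped, so both strings print identically, while ascii() escapes every non-ASCII code point and exposes the difference.

```python
import unicodedata

nfc = 'hyv\xe4'       # precomposed 'ä'
nfd = 'hyva\u0308'    # 'a' + COMBINING DIAERESIS

# repr() keeps printable non-ASCII characters as-is, so both render as 'hyvä'
print(repr(nfc), repr(nfd))
# ascii() escapes every non-ASCII code point
print(ascii(nfc), ascii(nfd))  # 'hyv\xe4' 'hyva\u0308'
# The strings really differ: different lengths and code points
print(len(nfc), len(nfd))  # 4 5
print([unicodedata.name(ch) for ch in nfd[3:]])  # ['LATIN SMALL LETTER A', 'COMBINING DIAERESIS']
```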

In pytest it would be possible to use ascii() instead of repr(), but that would make all non-ASCII strings unreadable, which would be worse. I guess the best solution would be to use ascii() only when the strings look the same, but I have no idea how to detect that.
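One possible way to detect "look the same", sketched under an assumption: treat two unequal strings as visually identical when they compare equal after NFC normalization with unicodedata.normalize(). The helpers looks_the_same() and format_for_diff() are hypothetical and are not part of pytest. NFC only catches canonical equivalence, so other look-alikes (for example homoglyphs from different scripts) would still slip through.

```python
import unicodedata
from typing import Tuple


def looks_the_same(left: str, right: str) -> bool:
    """Hypothetical heuristic (not pytest's implementation): two unequal
    strings 'look the same' if they become equal after NFC normalization."""
    return left != right and (
        unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)
    )


def format_for_diff(left: str, right: str) -> Tuple[str, str]:
    """Hypothetical helper: fall back to ascii() only when repr() would
    hide the difference; otherwise keep the readable repr()."""
    if looks_the_same(left, right):
        return ascii(left), ascii(right)
    return repr(left), repr(right)


print(*format_for_diff('hyv\xe4', 'hyva\u0308'))  # 'hyv\xe4' 'hyva\u0308'
print(*format_for_diff('hyv\xe4', 'hyv\xe4!'))    # 'hyvä' 'hyvä!'
```

This keeps ordinary non-ASCII strings readable and only switches to escaped output in the confusing case from the issue.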

Issue Analytics

Created 5 years ago
Comments: 8 (5 by maintainers)

Top GitHub Comments

Zac-HD commented, Oct 6, 2022 (1 reaction)

Absolutely!

dnstone commented, May 16, 2021 (1 reaction)

Hi @Zac-HD, I talked with you about this issue as part of the PyCon mentored sprint. Is it okay if I take over this issue?
