Check 049: Limit to official whitespace characters?

Observed behaviour

Check 049 treats these Unicode values as whitespace:

  WHITESPACE_CHARACTERS = [
      0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680,
      0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008,
      0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0x180E, 0x200B,
      0x2060, 0xFEFF
  ]

Expected behaviour

The Unicode 11.0 property list (PropList.txt) assigns the White_Space property to these code points:

0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE

The difference isn’t large, but maybe we should only check for official whitespace characters? Can’t seem to access that information from Python’s unicodedata module though 😕
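To make the difference concrete, here is a minimal sketch that parses the PropList.txt excerpt quoted above and compares it with the check's list. The helper name `parse_property` and the regular expression are illustrative assumptions, not part of the check or of fontTools; only the data is taken from this issue.

```python
import re

# Excerpt of PropList.txt (Unicode 11.0), copied from the issue text above.
PROPLIST_EXCERPT = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
"""

# The list the check currently uses, copied from "Observed behaviour".
WHITESPACE_CHARACTERS = [
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008,
    0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0x180E, 0x200B,
    0x2060, 0xFEFF
]

# Matches "XXXX ; Prop" or "XXXX..YYYY ; Prop" at the start of a line.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_property(text, prop):
    """Return the set of code points that carry `prop` in PropList-style text."""
    codepoints = set()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match or match.group(3) != prop:
            continue  # blank line, comment, or a different property
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        codepoints.update(range(first, last + 1))
    return codepoints


official = parse_property(PROPLIST_EXCERPT, "White_Space")
extras = sorted(set(WHITESPACE_CHARACTERS) - official)
missing = sorted(official - set(WHITESPACE_CHARACTERS))

print("in the check but not White_Space:", [f"U+{cp:04X}" for cp in extras])
print("White_Space but not in the check:", [f"U+{cp:04X}" for cp in missing])
```

In a real implementation the excerpt would presumably be replaced by the full PropList.txt for the Unicode version being targeted.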

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
anthrotype commented, Jun 6, 2018

Maybe we can add this to our fontTools.unicodedata module. We can already parse these kinds of files: https://github.com/fonttools/fonttools/blob/b38e2bd8acbebe98980f63a5bb490010e0c22134/MetaTools/buildUCD.py#L64

We use that to build the Scripts.txt, ScriptExtensions.txt, and Blocks.txt data, which have the same format as PropList.txt.

PR? 😉

0 reactions
drj11 commented, Jul 7, 2020

This check is now com.google.fonts/check/whitespace_ink

The list includes whitespace that should have drawings, and also includes non-whitespace that should not have drawings.

Some observations: the “extras” are

  • U+180E MONGOLIAN VOWEL SEPARATOR
  • U+200B ZERO WIDTH SPACE
  • U+2060 WORD JOINER
  • U+FEFF ZERO WIDTH NO-BREAK SPACE (but Byte-Order-Mark in actual use)

These are not whitespace (in the properties sense) but should not have drawings (I’m making an assumption about MONGOLIAN VOWEL SEPARATOR because I don’t know about it). So they are appropriate for this test. It’s entirely possible there are other char codes like this, that are not whitespace, but have no drawing.

OGHAM SPACE MARK is the other way around. It is a whitespace character that is supposed to have a drawing (at least as far as I understand it, I don’t know much about the script).

This test is really about glyphs with no drawings, which may or may not be whitespace.

In conclusion:

  • Some non-whitespace glyphs should not have drawings (and I think we could test for that, as we do now);
  • Some (one!) whitespace glyphs should have drawings. So we should remove OGHAM SPACE MARK from this list.
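
The conclusion above can be sketched as a set expression. This is only an illustration of the proposal, not the actual implementation of com.google.fonts/check/whitespace_ink; the names `EXPECTED_BLANK` and `should_be_blank` are hypothetical, and the code points are the ones listed in this issue.

```python
# Code points with the White_Space property (Unicode 11.0 PropList.txt,
# as quoted in the issue).
WHITE_SPACE = (
    set(range(0x0009, 0x000D + 1))
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200A + 1))
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

# Not White_Space, but assumed here to have no drawing (the "extras").
NON_WHITESPACE_BLANK = {0x180E, 0x200B, 0x2060, 0xFEFF}

# White_Space, but expected to have a drawing.
WHITESPACE_WITH_INK = {0x1680}  # OGHAM SPACE MARK

# Hypothetical replacement for the check's list: glyphs that should be empty.
EXPECTED_BLANK = (WHITE_SPACE | NON_WHITESPACE_BLANK) - WHITESPACE_WITH_INK


def should_be_blank(codepoint):
    """Return True if a glyph mapped to `codepoint` is expected to have no ink."""
    return codepoint in EXPECTED_BLANK


print(should_be_blank(0x200B))  # ZERO WIDTH SPACE
print(should_be_blank(0x1680))  # OGHAM SPACE MARK
```

As the comment notes, other code points may also belong in the non-whitespace group, so the second set should be treated as a starting point.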