Check 049: Limit to official whitespace characters?
Observed behaviour
Check 049 treats these Unicode values as whitespace:
WHITESPACE_CHARACTERS = [
0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680,
0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008,
0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0x180E, 0x200B,
0x2060, 0xFEFF
]
Expected behaviour
The Unicode 11.0 property list (PropList.txt) assigns the White_Space property to these code points:
0009..000D ; White_Space # Cc [5] <control-0009>..<control-000D>
0020 ; White_Space # Zs SPACE
0085 ; White_Space # Cc <control-0085>
00A0 ; White_Space # Zs NO-BREAK SPACE
1680 ; White_Space # Zs OGHAM SPACE MARK
2000..200A ; White_Space # Zs [11] EN QUAD..HAIR SPACE
2028 ; White_Space # Zl LINE SEPARATOR
2029 ; White_Space # Zp PARAGRAPH SEPARATOR
202F ; White_Space # Zs NARROW NO-BREAK SPACE
205F ; White_Space # Zs MEDIUM MATHEMATICAL SPACE
3000 ; White_Space # Zs IDEOGRAPHIC SPACE
The difference isn’t large, but maybe we should only check for official whitespace characters? I can’t seem to access that information from Python’s unicodedata module, though 😕
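To make the difference concrete, here is a minimal sketch that compares the two lists quoted above. The ranges are copied by hand from the PropList.txt excerpt; the names `WHITE_SPACE_RANGES` and `OFFICIAL_WHITE_SPACE` are only for illustration and are not part of the check's code.

```python
# Code points the check currently treats as whitespace (copied from the issue).
WHITESPACE_CHARACTERS = {
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008,
    0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0x180E, 0x200B,
    0x2060, 0xFEFF,
}

# Inclusive White_Space ranges, copied by hand from the Unicode 11.0 excerpt.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x2000, 0x200A), (0x2028, 0x2028), (0x2029, 0x2029),
    (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]

# Expand the ranges into a flat set of code points.
OFFICIAL_WHITE_SPACE = {
    cp for start, end in WHITE_SPACE_RANGES for cp in range(start, end + 1)
}

# In the check's list but without the White_Space property.
extras = sorted(WHITESPACE_CHARACTERS - OFFICIAL_WHITE_SPACE)
# Have the White_Space property but are absent from the check's list.
missing = sorted(OFFICIAL_WHITE_SPACE - WHITESPACE_CHARACTERS)

print("extras: ", [f"U+{cp:04X}" for cp in extras])
print("missing:", [f"U+{cp:04X}" for cp in missing])
```

With the lists as quoted, the check's list is a superset of the official one: four extra code points and none missing.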
Top GitHub Comments
Maybe we can add this to our fontTools.unicodedata module. We can already parse this kind of file: https://github.com/fonttools/fonttools/blob/b38e2bd8acbebe98980f63a5bb490010e0c22134/MetaTools/buildUCD.py#L64
We use that to build the Scripts.txt, ScriptExtensions.txt, and Blocks.txt data, which have the same format as PropList.txt.
PR? 😉
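As a rough idea of what parsing a PropList.txt-style file involves, here is a small standalone sketch. It is not the fontTools buildUCD code; `parse_property` and `SAMPLE` are hypothetical names, and the sample text is just the excerpt from the issue above.

```python
import re

# The White_Space lines quoted in the issue, in PropList.txt format.
SAMPLE = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
"""

# "XXXX ; Name" or "XXXX..YYYY ; Name", with hexadecimal code points.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_property(text, prop):
    """Return the set of code points that carry `prop` in PropList-style text."""
    code_points = set()
    for line in text.splitlines():
        # Drop the trailing "# ..." comment and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None or match.group(3) != prop:
            continue
        start = int(match.group(1), 16)
        # A single code point is a range of length one.
        end = int(match.group(2) or match.group(1), 16)
        code_points.update(range(start, end + 1))
    return code_points


white_space = parse_property(SAMPLE, "White_Space")
print(len(white_space), "code points with White_Space")
```

In a real build step the text would come from the Unicode Character Database download rather than an inline string; that part is left out here.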
This check is now com.google.fonts/check/whitespace_ink.
The list includes whitespace that should have drawings, and also includes non-whitespace that should not have drawings.
Some observations: The “extras” are
- U+180E MONGOLIAN VOWEL SEPARATOR
- U+200B ZERO WIDTH SPACE
- U+2060 WORD JOINER
- U+FEFF ZERO WIDTH NO-BREAK SPACE (but Byte-Order-Mark in actual use)
These are not whitespace (in the properties sense) but should not have drawings (I’m making an assumption about MONGOLIAN VOWEL SEPARATOR because I don’t know about it). So they are appropriate for this test. It’s entirely possible there are other char codes like this, that are not whitespace, but have no drawing.
OGHAM SPACE MARK is the other way around. It is a whitespace character that is supposed to have a drawing (at least as far as I understand it, I don’t know much about the script).
This test is really about glyphs with no drawings, which may or may not be whitespace.
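One way to read these observations, sketched in Python: start from the White_Space set, add the non-whitespace code points that should be blank, and take out whitespace that is expected to have a drawing. This is only an interpretation of the comment, not the check's actual implementation; the set names and `expects_no_ink` are made up for illustration, and the lists may well be incomplete.

```python
# Code points with the White_Space property (Unicode 11.0 excerpt in the issue).
WHITE_SPACE = (
    set(range(0x0009, 0x000D + 1))
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200A + 1))
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

# Not White_Space, but assumed to have no drawing (the "extras" above).
INKLESS_NON_WHITESPACE = {0x180E, 0x200B, 0x2060, 0xFEFF}

# White_Space, but assumed to have a drawing (OGHAM SPACE MARK).
INKED_WHITESPACE = {0x1680}

# Code points whose glyphs would be expected to be empty under this reading.
EXPECT_NO_INK = (WHITE_SPACE | INKLESS_NON_WHITESPACE) - INKED_WHITESPACE


def expects_no_ink(code_point):
    """Return True if a glyph for `code_point` is expected to be blank."""
    return code_point in EXPECT_NO_INK


for cp in (0x0020, 0x1680, 0xFEFF, 0x0041):
    print(f"U+{cp:04X}", expects_no_ink(cp))
```

Keeping the three sets separate makes it easy to add further exceptions in either direction as they are discovered.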
In conclusion: