cmap format 2: single byte of 1-byte character vs first byte of 2-byte characters

I found a difference in subHeaderKeys[] of the cmap format 2 subtable while modifying a font originally generated by makeotf. This subtable is for (legacy) MacJapanese, and was built from the 83pv-RKSJ-H CMap by makeotf. In the original font, subHeaderKeys[0xEF] through subHeaderKeys[0xFC] are all 376, while they are all 0 in the modified font. Please see my gist for details:

https://gist.github.com/mashabow/9eeb8ad5ba055582165f4cc0ccd6abf6

According to the OpenType spec, subHeaderKeys[] values follow this rule:

If subHeaderKeys[0xhh] == 0, 0xhh is the single byte of a 1-byte character code
If subHeaderKeys[0xhh] > 0, 0xhh is the first byte of 2-byte character codes

Since 0xEF to 0xFC are first bytes in MacJapanese, we can say the original font (makeotf) is correct and the modified font (FontTools) is wrong.
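The rule quoted above can be sketched in a few lines of Python. This is only an illustration, not FontTools or makeotf code: the helper names `classify_byte` and `sub_header_index` are hypothetical, and the table built at the bottom is a simplified stand-in that sets only the 0xEF to 0xFC entries to 376, as reported for the original font. The division by 8 relies on the OpenType definition of subHeaderKeys values as the subHeader index multiplied by 8.

```python
def classify_byte(sub_header_keys, byte):
    """Classify a byte value using a 256-entry subHeaderKeys array.

    Returns "single" if the byte is a complete 1-byte character code,
    or "lead" if it is the first byte of 2-byte character codes.
    """
    if len(sub_header_keys) != 256:
        raise ValueError("subHeaderKeys must have exactly 256 entries")
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0x00 to 0xFF")
    return "single" if sub_header_keys[byte] == 0 else "lead"


def sub_header_index(key):
    """Convert a subHeaderKeys value to a subHeader index (value = index * 8)."""
    return key // 8


# Simplified stand-in for the original font's array (assumption: every other
# entry is left at 0 here, which is not true of a real MacJapanese subtable).
keys = [0] * 256
for b in range(0xEF, 0xFD):
    keys[b] = 376

print(classify_byte(keys, 0xEF))  # first byte of 2-byte codes
print(classify_byte(keys, 0x41))  # single byte
print(sub_header_index(376))      # subHeader that 0xEF to 0xFC share
```

With all of these entries reset to 0, as in the modified font, the same check would report 0xEF to 0xFC as single bytes.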

Currently, FontTools and the TTX file format don't hold the subHeaderKeys[] data. So when there are no glyph mappings for either the 1-byte code 0xhh or any 2-byte code 0xhh??, it is impossible to tell whether 0xhh is a single byte or a first byte.
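A minimal sketch of why that information cannot be recovered, assuming the only input is a code-to-glyph-name mapping like the one a TTX dump carries. The function `lead_bytes_from_mapping` and the sample codes and glyph names are hypothetical and are not taken from FontTools or from the font in question.

```python
def lead_bytes_from_mapping(cmap):
    """Infer which byte values act as first bytes, using only mapped codes.

    cmap maps integer character codes to glyph names. A code above 0xFF
    is treated as a 2-byte code whose high byte is a first byte.
    """
    return {code >> 8 for code in cmap if code > 0xFF}


# Hypothetical mapping: one 1-byte code and two 2-byte codes,
# with nothing mapped under the first bytes 0xEF to 0xFC.
sample = {
    0x41: "cid00034",
    0x8140: "cid00633",
    0x8141: "cid00634",
}

inferred = lead_bytes_from_mapping(sample)
print(sorted(hex(b) for b in inferred))
print(0xEF in inferred)  # an unmapped first byte leaves no trace in the mapping
```

Any builder that works from the mapping alone has no entry from which to learn that 0xEF is a first byte, so writing 0 for it is the natural outcome unless the first-byte ranges are supplied separately.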

Issue Analytics

State:
Created 6 years ago
Comments:7 (5 by maintainers)

Top GitHub Comments

readroberts commented, Nov 29, 2017

I don't think there is an issue here. I do see that when starting with a cmap format 2 table made by makeotf (LogoCutStd-Bold.otf), dumping it with ttx, and then recompiling with ttx, the subHeaderKeys array comes out differently. However, cmap format 2 allows different choices about how to break up the subarrays in the glyphIndexArray that are still functionally equivalent. Different segment and subarray ordering results in different values in the subHeaderKeys array. Using both ttx and 'spot', I see that the charcode-to-glyph mapping is identical for both fonts. I might have argued for changing fonttools to use the same segment and subarray ordering as makeotf for consistency, but the cmap subtable produced by fonttools is smaller than that produced by makeotf, so I would vote for changing makeotf.

As a separate issue, one of mashabow's concerns is that it is not possible to tell a one-byte value from the start of a two-byte value. This would be a problem if the ttx compiler had to decode the glyph encoding from a charstring that encoded multiple glyphs, but the encoding elements of a ttx cmap format 2 table give the entire code point for a single glyph. 'map code="0xed40" name="cid00002"' is an entry for a two-byte code; 'map code="0xed" name="cid000001"' is an entry for a one-byte code. There is no ambiguity.

readroberts commented, Nov 29, 2017

Happy to help. Sorry about not noticing that quoted XML elements disappear in the posts - I fixed this, and now the last two sentences should make sense.
