cmap format 2: single byte of 1-byte character vs first byte of 2-byte characters

I found a difference in subHeaderKeys[] of a cmap subtable in format 2 while modifying a font originally generated by makeotf. This subtable is for (legacy) MacJapanese and was built from the 83pv-RKSJ-H CMap by makeotf. In the original font, subHeaderKeys[0xEF] through subHeaderKeys[0xFC] are all 376, while they are all 0 in the modified font. Please see my gist for details:

According to the OpenType spec, subHeaderKeys[] values follow this rule:

  • If subHeaderKeys[0xhh] == 0, 0xhh is the single byte of a 1-byte character code
  • If subHeaderKeys[0xhh] > 0, 0xhh is the first byte of 2-byte character codes

Since 0xEF to 0xFC are first bytes in MacJapanese, we can say the original font (makeotf) is correct and the modified font (FontTools) is wrong.
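The rule above can be sketched in Python. This is only an illustration, not FontTools code: `classify_lead_bytes` is a hypothetical helper, and the sample array merely reproduces the values reported in this issue (376 for 0xEF to 0xFC), with every other entry set to 0 as a placeholder. In the OpenType spec each subHeaderKeys value is a subHeader index multiplied by 8, so 376 refers to subHeader 47.

```python
def classify_lead_bytes(sub_header_keys):
    """Split byte values 0x00-0xFF into single bytes and first bytes.

    sub_header_keys: the 256 integers stored in a cmap format 2 subtable.
    Returns (single_bytes, first_bytes), where first_bytes maps each
    first byte to its subHeader index (key value divided by 8).
    """
    if len(sub_header_keys) != 256:
        raise ValueError("subHeaderKeys must have exactly 256 entries")
    single_bytes = [b for b, key in enumerate(sub_header_keys) if key == 0]
    first_bytes = {b: key // 8 for b, key in enumerate(sub_header_keys) if key > 0}
    return single_bytes, first_bytes


# Illustrative data only: 376 for 0xEF..0xFC as in the makeotf font, 0 elsewhere.
keys = [0] * 256
for b in range(0xEF, 0xFC + 1):
    keys[b] = 376

single_bytes, first_bytes = classify_lead_bytes(keys)
print(hex(0xEF), "-> subHeader", first_bytes[0xEF])  # 0xef -> subHeader 47
```

With all of these entries set to 0 instead, as in the modified font, the same helper would report 0xEF to 0xFC as single bytes.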

Currently, neither FontTools nor the TTX file format stores the subHeaderKeys[] data. So when there are no glyph mappings for the 1-byte code 0xhh or for the 2-byte codes 0xhh??, it is impossible to tell whether 0xhh is a single byte or a first byte.
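A small sketch of that ambiguity, assuming the only information available is a code-to-glyph-name mapping. The function `infer_byte_roles`, the sample codes, and the glyph names are all made up for illustration and are not taken from the fonts in this issue.

```python
def infer_byte_roles(mapping):
    """Guess each byte's role from a {char_code: glyph_name} mapping alone.

    Assumes a well-formed mapping in which no byte is used both as a
    1-byte code and as the first byte of 2-byte codes.
    """
    roles = {}
    for code in mapping:
        if code > 0xFF:
            roles[code >> 8] = "first byte"
        else:
            roles[code] = "single byte"
    # Bytes that never appear in any mapped code cannot be classified.
    return {b: roles.get(b, "unknown") for b in range(256)}


# Hypothetical mapping: one 1-byte code and one 2-byte code.
mapping = {0x41: "cid00034", 0x8140: "cid00633"}
roles = infer_byte_roles(mapping)
print(roles[0x41], "|", roles[0x81], "|", roles[0xEF])
```

Here 0xEF has no mapped codes at all, so its role stays "unknown" unless it is recorded somewhere else.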

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
readroberts commented, Nov 29, 2017

I don't think there is an issue here. I do see that if you start with a cmap format 2 table made by makeotf (LogoCutStd-Bold.otf), dump it with ttx, and then recompile it with ttx, the subHeaderKeys array comes out differently. However, cmap format 2 allows different choices about how to break up the subarrays in the glyphIndexArray that are still functionally equivalent. Different segment and subarray ordering results in different values in the subHeaderKeys array. Using both ttx and ‘spot’, I see that the charcode-to-glyph mapping is identical for both fonts.

I might have argued for changing fonttools to use the same segment and subarray ordering as makeotf for consistency, but the cmap subtable produced by fonttools is smaller than that produced by makeotf, so I would vote for changing makeotf.

As a separate issue, one of mashabow's concerns is that it is not possible to tell a one-byte value from the start of a two-byte value. This would be a problem if the ttx compiler had to decode the glyph encoding from a charstring that encoded multiple glyphs, but the encoding elements of a ttx cmap format 2 table give the entire code point for a single glyph. 'map code="0xed40" name="cid00002"' is an entry for a two-byte code, and 'map code="0xed" name="cid000001"' is an entry for a one-byte code. There is no ambiguity.
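One way two different subHeaderKeys arrays can still give the same charcode-to-glyph mapping is sketched below. This is a deliberately simplified model, not the actual contents of either font: each subheader is reduced to a `(first_code, glyph_ids)` pair (the real format stores entryCount, idDelta and idRangeOffset), and `lookup`, `make_keys`, and all glyph IDs are hypothetical. Layout A points 0xEF at an empty subheader, which is only an assumption about what a nonzero key for an unmapped first byte might reference; layout B leaves the key at 0.

```python
def lookup(sub_header_keys, sub_headers, code):
    """Return the glyph ID for a 1- or 2-byte code, or 0 if unmapped."""
    if code > 0xFF:
        high, low = code >> 8, code & 0xFF
        index = sub_header_keys[high] // 8
        if index == 0:
            return 0  # high byte is not marked as a first byte
    else:
        if sub_header_keys[code] != 0:
            return 0  # byte is reserved as a first byte, not a 1-byte code
        index, low = 0, code  # subheader 0 handles 1-byte codes
    first_code, glyph_ids = sub_headers[index]
    offset = low - first_code
    return glyph_ids[offset] if 0 <= offset < len(glyph_ids) else 0


def make_keys(assignments):
    """Build a 256-entry subHeaderKeys array from {byte: subheader index}."""
    keys = [0] * 256
    for byte, sub_header_index in assignments.items():
        keys[byte] = sub_header_index * 8
    return keys


# Made-up subheaders: 1-byte codes, one populated range, one empty range.
sub_headers = [(0x41, [34]), (0x40, [633, 634]), (0x00, [])]
keys_a = make_keys({0x81: 1, 0xEF: 2})  # 0xEF marked as a first byte
keys_b = make_keys({0x81: 1})           # 0xEF left at 0

same = all(
    lookup(keys_a, sub_headers, c) == lookup(keys_b, sub_headers, c)
    for c in range(0x10000)
)
print(keys_a[0xEF], keys_b[0xEF], same)  # 16 0 True
```

Under this model the two layouts agree for every code, mapped or not, even though subHeaderKeys[0xEF] differs. What the model does not capture is how many bytes a decoder would consume when reading a byte stream, which is the case the original report is concerned with.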

0 reactions
readroberts commented, Nov 29, 2017

Happy to help. Sorry about not noticing that quoted XML elements disappear in the posts - I fixed this, and now the last two sentences should make sense.
