generated font file contains erroneous cmap entries resulting in blank ascii characters

I’m using nanoemoji to build a COLR0 font from Twitter’s Twemoji. The generated font file contains cmap entries that map the raw ASCII digits to blank glyphs. This (probably) isn’t an issue with fontTools, as I wrote a script using fontTools to fix it. My guess is that, for the number emoji, nanoemoji sees that the first Unicode codepoint is a digit and adds a cmap entry for it. The number emoji are the ASCII digits followed by U+20E3.

import glob
import subprocess

from fontTools.ttLib import TTFont

# this code generates the font file from twemoji
proc = subprocess.run(["nanoemoji",
                       "--family", "Twemoji Color Emoji",
                       "--color_format", "cff_colr_0",
                       "--version_major", "1",
                       "--version_minor", "1",
                       "--output_file", "TwemojiCOLR0.otf",
                       # only works on linux due to length limit
                       *(glob.glob("./twemoji/assets/svg/*"))
                       ])

# this code fixes it: drop every cmap entry for a codepoint below 100
# (decimal), which covers the ASCII digits as well as '#' and '*'
if proc.returncode == 0:
    print("Fixing cmap...")
    font = TTFont("build/TwemojiCOLR0.otf")
    for table in font["cmap"].tables:
        # copy the keys first so entries can be deleted while looping
        for codepoint in list(table.cmap):
            if codepoint < 100:
                del table.cmap[codepoint]

    font.save("TwemojiCOLR0.otf")
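
For reference, here is a minimal sketch of what a keycap emoji looks like at the codepoint level. It assumes the fully-qualified form (base character, then U+FE0F, then U+20E3); the post above mentions only U+20E3. The names `KEYCAP_BASES`, `keycap_sequence` and `describe` are made up for this illustration and are not part of nanoemoji or fontTools.

```python
import unicodedata

# Base characters that can start a keycap emoji sequence.
KEYCAP_BASES = "#*0123456789"


def keycap_sequence(base):
    """Build a keycap sequence: base + VARIATION SELECTOR-16 + COMBINING ENCLOSING KEYCAP."""
    return base + "\uFE0F" + "\u20E3"


def describe(sequence):
    """List each codepoint in the sequence together with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


for line in describe(keycap_sequence("1")):
    print(line)

# Every base character is below 100 (decimal), so a `k < 100` filter removes all of them.
print(all(ord(ch) < 100 for ch in KEYCAP_BASES))
```

The first codepoint of each sequence is a plain ASCII character, so a font that renders these emoji itself would be expected to map that character in its cmap.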

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
rsheeter commented, Sep 23, 2022

“I actually don’t know what best way would be to make this work with font fallback chains.”

In looking at the Android chain the solution appears to be putting it quite late in the chain plus, roughly, matching on the whole grapheme.

I believe that means there’s nothing to change in nanoemoji; please reopen if I misunderstood. I’m glad @anthrotype is here, I’d about convinced myself we had a bug.

1 reaction
anthrotype commented, Sep 23, 2022

Ok, I just made two test Twemoji fonts: one the same as yours, with all codepoints < 100 removed from the cmap table after the build, and another with the cmap untouched, thus containing explicit mappings for the ASCII numbers. You can find them in this zip file, which also includes an HTML file that compares both fonts side by side using a string that includes the keycap emoji, i.e. #️⃣*️⃣0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣

Twemoji-COLRv0.zip

You can see in this screenshot below that the top line, with the original font built with nanoemoji, displays Twemoji’s keycap glyphs correctly, whereas the bottom line, with the font stripped of ASCII number cmappings, falls back to the system’s emoji (in my ChromeOS case this is NotoColorEmoji) because the browser (Chrome) can’t compose the keycap sequences using the Twemoji-COLRv0-no-colors.otf font, and thus resorts to the fallback system font:

[Image: Screenshot 2022-09-23 19 23 03]

So stripping those entries from the cmap makes it impossible to compose those sequences. I think this is intended behavior, and we don’t want to change it. I actually don’t know what the best way would be to make this work with font fallback chains.
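
One way to check a font for this, sketched under assumptions: the helper below takes a plain codepoint-to-glyph-name dictionary and reports which keycap base characters have no mapping. The name `missing_keycap_bases` and the glyph names are hypothetical, and the two dictionaries are hand-made stand-ins for the two test fonts, not data read from them.

```python
# Base characters that can start a keycap emoji sequence.
KEYCAP_BASES = "#*0123456789"


def missing_keycap_bases(cmap):
    """Return the keycap base characters that have no entry in a codepoint -> glyph-name mapping."""
    return [ch for ch in KEYCAP_BASES if ord(ch) not in cmap]


# Hypothetical cmap of a font built without post-processing (glyph names are invented).
untouched = {ord(ch): f"glyph_{ord(ch):04X}" for ch in KEYCAP_BASES}
untouched[0x20E3] = "glyph_20E3"    # COMBINING ENCLOSING KEYCAP
untouched[0x1F600] = "glyph_1F600"  # an unrelated emoji

# The same mapping after removing every codepoint below 100, as in the workaround script.
stripped = {cp: name for cp, name in untouched.items() if cp >= 100}

print(missing_keycap_bases(untouched))  # nothing missing
print(missing_keycap_bases(stripped))   # all twelve base characters are gone
```

With fontTools, the dictionary for a real font could come from `TTFont(path).getBestCmap()`, which returns a codepoint-to-glyph-name mapping for the font's preferred Unicode subtable.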
