Generated font file contains erroneous cmap entries resulting in blank ASCII characters

I’m using nanoemoji to build a COLR0 font from Twitter’s Twemoji. The generated font file contains cmap entries that map the raw ASCII digits to blank glyphs. This (probably) isn’t an issue with fontTools, as I wrote a script using fontTools to fix it. My guess is that for the number emoji, nanoemoji sees that the first Unicode codepoint is a digit and adds a cmap entry for it. The number emoji are the ASCII digits followed by U+20E3.
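For reference, the keycap emoji described above can be spelled out by codepoint. The following is a minimal sketch using only the Python standard library. The names `KEYCAP_BASES`, `keycap_sequence`, and `codepoints` are illustrative and not part of nanoemoji or fontTools, and the sketch assumes the fully-qualified form, which places the variation selector U+FE0F between the base character and U+20E3.

```python
from __future__ import annotations

# Illustrative helper names; nothing here comes from nanoemoji or fontTools.
KEYCAP_BASES = "#*0123456789"          # base characters that have keycap emoji
VARIATION_SELECTOR_16 = "\uFE0F"       # requests emoji presentation
COMBINING_ENCLOSING_KEYCAP = "\u20E3"  # draws the keycap around the base


def keycap_sequence(base: str) -> str:
    """Return the fully-qualified keycap emoji sequence for one base character."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"{base!r} has no keycap emoji")
    return base + VARIATION_SELECTOR_16 + COMBINING_ENCLOSING_KEYCAP


def codepoints(text: str) -> list[str]:
    """Format each character of text as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    for base in KEYCAP_BASES:
        print(base, " ".join(codepoints(keycap_sequence(base))))
```

Every base character sits below codepoint 100 (the highest, `9`, is U+0039, decimal 57), so a filter on `k < 100` also removes `#` and `*`, not only the digits.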

import glob
import subprocess

from fontTools.ttLib import TTFont

# this code generates the font file from twemoji
proc = subprocess.run(["nanoemoji",
                       "--family", "Twemoji Color Emoji",
                       "--color_format", "cff_colr_0",
                       "--version_major", "1",
                       "--version_minor", "1",
                       "--output_file", "TwemojiCOLR0.otf",
                       # only works on linux due to length limit
                       *(glob.glob("./twemoji/assets/svg/*"))
                       ])

# this code fixes it
if proc.returncode == 0:
    print("Fixing cmap...")
    font = TTFont("build/TwemojiCOLR0.otf")
    for table in font["cmap"].tables:
        # Copy the keys first so entries can be deleted while iterating.
        for codepoint in list(table.cmap):
            if codepoint < 100:
                del table.cmap[codepoint]

    font.save("TwemojiCOLR0.otf")
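Before deleting anything, it can help to list which entries a `< 100` filter would actually hit. This is a minimal sketch, assuming the mapping is a plain `dict` of codepoint to glyph name, which is the shape fontTools exposes as `.cmap` on each subtable and returns from `TTFont.getBestCmap()`. The function name `entries_below` and the glyph names in the sample data are hypothetical.

```python
def entries_below(cmap, limit=100):
    """Return (codepoint, character, glyph name) for every entry below limit."""
    return sorted(
        (codepoint, chr(codepoint), glyph_name)
        for codepoint, glyph_name in cmap.items()
        if codepoint < limit
    )


# Hypothetical sample data standing in for a real font's cmap.
sample = {
    0x23: "glyph_a",     # '#'
    0x31: "glyph_b",     # '1'
    0x20E3: "glyph_c",   # combining enclosing keycap
    0x1F600: "glyph_d",  # an emoji outside the ASCII range
}

if __name__ == "__main__":
    for codepoint, char, glyph_name in entries_below(sample):
        print(f"U+{codepoint:04X} {char!r} -> {glyph_name}")

    # With fontTools installed, a real font could be inspected like this:
    #   from fontTools.ttLib import TTFont
    #   entries_below(TTFont("build/TwemojiCOLR0.otf").getBestCmap())
```

Only the two ASCII entries are reported for the sample; the keycap mark and the emoji are above the limit and would be kept.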

Top GitHub Comments

rsheeter commented, Sep 23, 2022

“I actually don’t know what the best way would be to make this work with font fallback chains.”

In looking at the Android chain the solution appears to be putting it quite late in the chain plus, roughly, matching on the whole grapheme.

I believe that means there’s nothing to change in nanoemoji, please reopen if I misunderstood. I’m glad @anthrotype is here, I’d about convinced myself we had a bug.

anthrotype commented, Sep 23, 2022

Ok, I just made two test Twemoji fonts: one the same as yours, with all codepoints < 100 removed from the cmap table after the build, and another with the cmap untouched, thus containing explicit mappings for the ASCII numbers. You can find them in this zip file, which also includes an HTML file that compares both fonts side by side with a string that includes the KeyCaps emoji, i.e. #️⃣*️⃣0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣

Twemoji-COLRv0.zip

You can see in this screenshot below that the top line, with the original font built with nanoemoji, displays Twemoji’s keycap glyphs correctly, whereas the bottom line, with the font stripped of ASCII number cmappings, falls back to the system’s emoji (in my ChromeOS case this is NotoColorEmoji) because the browser (Chrome) can’t compose the keycap sequences using the Twemoji-COLRv0-no-colors.otf font, and thus resorts to the fallback system font:

[Screenshot 2022-09-23 19 23 03: the two test fonts rendered side by side]

So your stripping those entries from the cmap makes it impossible to compose those sequences. I think this is intended behavior and we don’t want to change that. I actually don’t know what the best way would be to make this work with font fallback chains.
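The composition problem can be illustrated with a simple coverage check. This is a deliberately simplified model, not how Chrome or any particular shaper implements fallback: it assumes a font can only be used for a sequence when every character in it has a cmap entry, and it skips U+FE0F on the assumption that variation selectors are handled separately. The function name `missing_codepoints` and the sample sets are hypothetical.

```python
VARIATION_SELECTOR_16 = 0xFE0F


def missing_codepoints(mapped, text):
    """Return the codepoints of text that are absent from mapped.

    mapped is any container of integer codepoints, for example the keys of
    a cmap dict. U+FE0F is skipped (a simplifying assumption).
    """
    return [
        codepoint
        for codepoint in map(ord, text)
        if codepoint != VARIATION_SELECTOR_16 and codepoint not in mapped
    ]


keycap_one = "1\uFE0F\u20E3"

# Hypothetical coverage: the untouched font maps both the digit and U+20E3.
untouched = {0x31, 0x20E3}
# The same coverage after removing every codepoint below 100.
stripped = {codepoint for codepoint in untouched if codepoint >= 100}

if __name__ == "__main__":
    print("untouched:", missing_codepoints(untouched, keycap_one))
    print("stripped: ", missing_codepoints(stripped, keycap_one))
```

Under this model the stripped font is missing U+0031, the first character of the sequence, which matches the observation that the browser falls back to another font for the keycaps.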
