Text: font management
Font management is the process of detecting and selecting fonts for rendering text. I found this is too big a topic to completely fit in #305. Here I’ll try to summarize some things I learned and tried, and propose a plan.
Purpose
The dream goal is that users can use the gfx API to provide any text, and it will be rendered as expected, whether this is Latin, Cyrillic, Chinese or Egyptian Hieroglyphs. Being able to do this is within reach since the Noto font exists. This is an open font, commissioned by Google, that attempts to cover all scripts known to humankind.
One big obstacle is that shipping all Noto fonts would take significant amounts of memory. Therefore, we need to find a good subset that we make “Just Work”, and think about how to ship fonts for the more special and heavy scripts.
What I hope to achieve is that things just work for most users, and that for more special scripts, PyGfx shows a warning that a suitable font is not available, and instructions for how to obtain one. E.g. “Some characters in your text require NotoSansJp, install this font using …”.
A bit more on Noto
The Noto font is a family with over 100 fonts, organized mainly by script. The main font includes Latin, Greek, Cyrillic, punctuation, etc. And then there is one font for Arabic, one for Traditional Chinese, one for Egyptian Hieroglyphs, etc.
There are projects that bundle these fonts in a smaller set of font files, e.g. https://github.com/satbyy/go-noto-universal. This is not without consequences, e.g. the line height can become larger because the font file contains a font with larger ascenders. This, and the fact that you still have multiple font files to manage, makes me lean towards using the original Noto fonts.
The go-noto project mentioned above does define some categories, based on geographic region, and also based on whether a script is considered ancient (only for historic use). This can help us make reasonable subsets.
Default font and variations (serif, mono, bold, italic)
Many Noto fonts are available in both sans and serif. Quite a few are only available in sans. A handful are only available in serif. The idea is to define a set of fonts using sans, and serif if sans is not available.
Most of this issue focuses on this default font to which we can always fall back. It would make sense, though, to also include e.g. NotoSansMono and NotoSerif.
Other variations on a font are weight (bold) and italic. Each such variation is another font file. E.g. to include bold and italic you’d need regular, bold, italic and bold-italic. Fortunately, we can render pretty good approximations of these variations (thanks to SDF). This means we can ship only the regular fonts. Users who want the real thing can provide the necessary font themselves. So when a bold font is requested, the font manager should try to find a matching bold font, and if that is not available, tell the shader to apply the approximation.
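The lookup described above could be sketched roughly as follows. This is only an illustration: `FontFile`, `FontChoice`, `select_font` and the `approximate_*` flags are hypothetical names, not part of the PyGfx API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontFile:
    """A registered font file (hypothetical record for this sketch)."""
    family: str
    bold: bool = False
    italic: bool = False


@dataclass(frozen=True)
class FontChoice:
    font: FontFile
    approximate_bold: bool    # the shader should fake the weight
    approximate_italic: bool  # the shader should fake the slant


def select_font(fonts, family, bold=False, italic=False):
    """Pick the closest available file for a family and requested style."""
    candidates = [f for f in fonts if f.family == family]
    if not candidates:
        raise LookupError(f"no font registered for family {family!r}")

    def mismatch(f):
        # Number of style attributes that differ from the request.
        return (f.bold != bold) + (f.italic != italic)

    best = min(candidates, key=mismatch)
    return FontChoice(
        font=best,
        approximate_bold=bold and not best.bold,
        approximate_italic=italic and not best.italic,
    )
```

With only the regular NotoSans registered, `select_font(fonts, "Noto Sans", bold=True)` returns the regular file with `approximate_bold=True`, so the shader knows it has to apply the SDF-based approximation.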
Insights into the required memory
(The color emoji font is exceptionally large, so I left it out. The monochrome emoji font is included.)
- All Noto fonts: 47.5 MB.
- Compressed with gzip this becomes 34.0 MB.
- Compressed with lzma this becomes 28.2 MB.
Leaving out ancient fonts:
- Raw: 39.8 MB.
- Zipped (gzip): 30.2 MB
- Zipped (lzma): 25.0 MB
Leaving out CJK (Chinese, Japanese, Korean) and ancient fonts:
- Raw: 10.5 MB
- Zipped (gzip): 5.1 MB
- Zipped (lzma): 4.2 MB
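Numbers like the ones above can be gathered with a few lines of standard-library Python. The sketch below compresses each file separately and sums the results; this is an assumption about the method, so the totals may differ somewhat from the figures listed here. The function name `font_sizes` is made up for this example.

```python
import gzip
import lzma
from pathlib import Path


def font_sizes(font_dir, pattern="*.ttf"):
    """Return the total raw, gzip and lzma sizes (in bytes) of matching files."""
    totals = {"raw": 0, "gzip": 0, "lzma": 0}
    for path in sorted(Path(font_dir).glob(pattern)):
        data = path.read_bytes()
        totals["raw"] += len(data)
        totals["gzip"] += len(gzip.compress(data))
        totals["lzma"] += len(lzma.compress(data))
    return totals
```

Calling `font_sizes("path/to/fonts")` on a directory with the downloaded font files gives a dict with the three totals, which can then be compared per subset.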
On this wiki page the population per script is listed. If I score scripts based on this and the size of the font, I can come up with a somewhat motivated “lite subset”:
- NotoSans, the main font (latin, greek, cyrillic) - 556 KB
- NotoSansArabic - 177 KB
- NotoSansDevanagari - 191 KB
There are also some special fonts, some of which could be included in a lite subset:
- NotoEmoji - 879 KB
- NotoSansMath - 587 KB
- NotoSansSymbols - 203 KB
- NotoSansSymbols2 - 657 KB
- NotoMusic - 81 KB
And there is:
- NotoSansMono - 343 KB
- NotoSerif - 376 KB
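To get a feel for what the different combinations add up to, the sizes listed above can simply be summed. The grouping into three dicts is only for this example.

```python
# Sizes in KB, copied from the lists above.
LITE = {"NotoSans": 556, "NotoSansArabic": 177, "NotoSansDevanagari": 191}
SPECIAL = {
    "NotoEmoji": 879,
    "NotoSansMath": 587,
    "NotoSansSymbols": 203,
    "NotoSansSymbols2": 657,
    "NotoMusic": 81,
}
VARIANTS = {"NotoSansMono": 343, "NotoSerif": 376}


def total_kb(*groups):
    """Sum the sizes of all fonts in the given groups."""
    return sum(sum(group.values()) for group in groups)


print(total_kb(LITE))                     # 924
print(total_kb(LITE, SPECIAL))            # 3331
print(total_kb(LITE, SPECIAL, VARIANTS))  # 4050
```

So the lite subset alone stays under 1 MB uncompressed, and even with all the special fonts plus mono and serif it is about 4 MB.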
Technical stuff
I have a local script that uses the Google Fonts API to get a list of all available fonts. From there it selects the Noto fonts and then the fonts that make up our default set. All the font files can then be downloaded. They can also be categorized as mentioned above.
Using freetype it is easy to check what Unicode code points are provided by a font. We can use this to create an index. This way, when we want to render text, we can check for each character from what font file it must be loaded. In #305 I demonstrate how we can render Latin, Arabic and emoji in one text object.
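As a rough sketch of the codepoint check, assuming the freetype-py binding and its `Face.get_chars()` generator (which yields `(charcode, glyph_index)` pairs for the active charmap). The helper names are made up for this example and are not taken from #305.

```python
def codepoints_in_font(path):
    """Return the set of Unicode code points that a font file provides."""
    # freetype-py is imported lazily so that the helper below also
    # works on machines where it is not installed.
    import freetype

    face = freetype.Face(str(path))
    return {charcode for charcode, _glyph_index in face.get_chars()}


def unsupported_chars(text, codepoints):
    """Return the characters in text that are not in the code point set."""
    return sorted({ch for ch in text if ord(ch) not in codepoints})
```

For example, `unsupported_chars(text, codepoints_in_font("NotoSans-Regular.ttf"))` lists the characters for which another font file is needed (the file name here is just an example).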
The idea is to create an index for the default (a.k.a. fallback) font: it maps a codepoint to a font. Further, for each font we keep a set (or set-like data structure) that can be queried to check whether a codepoint is in a certain font. That way, a user can provide one or more preferred font families, and the font manager can select the fonts that support the characters, falling back to our NotoSans if needed. The basics of this are implemented in #305.
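A minimal sketch of such an index is shown below. It is not the implementation from #305; the class and method names are invented for illustration, and plain Python sets stand in for whatever set-like structure ends up being used.

```python
class FontIndex:
    """Maps code points to fonts, with per-font code point sets."""

    def __init__(self):
        self._codepoints = {}  # font name -> frozenset of code points
        self._fallback = {}    # code point -> name of a default font

    def add_font(self, name, codepoints, default=False):
        cps = frozenset(codepoints)
        self._codepoints[name] = cps
        if default:
            for cp in cps:
                # The first default font registered for a code point wins.
                self._fallback.setdefault(cp, name)

    def select(self, char, preferred=()):
        """Return the font for one character, or None if nothing covers it."""
        cp = ord(char)
        for name in preferred:
            if cp in self._codepoints.get(name, ()):
                return name
        return self._fallback.get(cp)

    def split_runs(self, text, preferred=()):
        """Split text into (font, substring) runs of consecutive characters."""
        runs = []
        for ch in text:
            font = self.select(ch, preferred)
            if runs and runs[-1][0] == font:
                runs[-1] = (font, runs[-1][1] + ch)
            else:
                runs.append((font, ch))
        return runs
```

A `None` result from `select` is the point where the font manager could warn that no suitable font is available.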
We can also scan the system fonts (not currently implemented), and users can register their own font.
The plan
Ship a subset by default. Either with PyGfx, or a new package specific to fonts. This could be the “lite” subset, or perhaps all fonts excluding CJK and ancient fonts. Size matters, not only for our end-users, but also because it would be installed e.g. on each CI job that uses pygfx.
For the other fonts, I am not sure yet. I came up with a few options.
Plan A
Create a package for CJK fonts and another for ancient fonts. People can just pip-install them.
If you have multiple Python environments, you need to install the package multiple times. This can add up if you’re Chinese and studying Egyptian hieroglyphs 😉 We could reduce this effect a bit by shipping the fonts in an lzma-compressed zipfile, and unpacking them in the appdata directory (shared between environments).
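The pack and unpack steps could look something like this, using only the standard library. How the shared appdata directory is located is left to the caller, and the function names are hypothetical.

```python
import zipfile
from pathlib import Path


def pack_fonts(font_paths, archive_path):
    """Write the font files into an lzma-compressed zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_LZMA) as zf:
        for path in font_paths:
            zf.write(path, arcname=Path(path).name)


def unpack_fonts(archive_path, target_dir):
    """Extract fonts into a shared directory, skipping files already there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to extract
            # Use only the file name, so entries cannot escape target_dir.
            dest = target / Path(name).name
            if not dest.exists():
                dest.write_bytes(zf.read(name))
            extracted.append(dest)
    return extracted
```

Because existing files are skipped, a second environment that installs the same package would find the fonts already unpacked.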
Plan B
The package that supplies the basic fonts could have a command to download more fonts, e.g. python -m fontmanager download NotoSansJp. Or, in a GUI application, the application could support installing more fonts with the click of a button. Auto-downloading is probably a bad idea (firewalls and all that).
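The command-line side of this could be sketched with argparse as below. The `fontmanager` package does not exist yet, so everything here is hypothetical; the actual download step is passed in as a `fetch` callable, since where the files would come from is not decided.

```python
import argparse


def build_parser():
    """Parser for a hypothetical `python -m fontmanager` command."""
    parser = argparse.ArgumentParser(prog="fontmanager")
    sub = parser.add_subparsers(dest="command", required=True)
    download = sub.add_parser("download", help="download additional fonts")
    download.add_argument("fonts", nargs="+", help="font names, e.g. NotoSansJp")
    return parser


def main(argv=None, fetch=None):
    """Parse the arguments and call fetch(name) for each requested font."""
    args = build_parser().parse_args(argv)
    if fetch is None:
        # No download backend is defined in this sketch.
        raise SystemExit("no download backend configured")
    for name in args.fonts:
        fetch(name)
    return 0
```

A `__main__.py` in the package would then call `main()` with a real `fetch` function. The download only ever happens on an explicit user action, in line with the remark about auto-downloading.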
Plan C
We could add a page to the docs that has links to all the font files of interest. We can even put an anchor at each link, so the warning message can produce a link to exactly the file that needs to be downloaded. Once the ttf/otf file is downloaded, on most OS’s you can just open it and click “install”.
This needs good support for detecting system fonts, and also detecting when they need to be re-indexed.
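The warning with a direct link could be produced along these lines. The URL is a placeholder and the anchor scheme (the lowercased font name) is an assumption; neither refers to an existing docs page.

```python
# Placeholder URL for this sketch, not a real PyGfx docs page.
DOCS_URL = "https://example.org/fonts.html"


def missing_font_message(font_name, docs_url=DOCS_URL):
    """Build a warning that links to the anchor of the missing font."""
    anchor = font_name.lower()  # assumed anchor scheme
    return (
        f"Some characters in your text require {font_name}. "
        f"Download it from {docs_url}#{anchor} and install it on your system."
    )
```

The resulting string could be passed to `warnings.warn` or to a logger, whichever the font manager ends up using.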
Plans B and C don’t exclude each other. Actually, neither does plan A.
Top GitHub Comments
Things are working as intended now! An illustrated story:
👎 😢
👍 😃
Closed by #305