Making fontmake faster
We want to drastically speed up fontmake: https://github.com/googlei18n/fontmake/issues/367
Other than switching booleanOperations to something vector-based and fast (https://github.com/typemytype/booleanOperations/issues/40), the other big offenders are basically 1. MutatorMath / fontMath, and 2. defcon and UFO reading/writing in general. For example, when James replaced defcon use with tiny objects, he saw about a 20% speedup:
https://github.com/googlei18n/ufo2ft/commit/2e65007df653515762ec95888f9c29b5137dec28
Cosimo also reports that using the `lxml` module instead of the standard library's `xml` can significantly speed up UFO reading.
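As a rough illustration of that swap, the snippet below parses a minimal .glif-style document with `lxml.etree` when available, falling back to the standard library's `xml.etree.ElementTree`. The XML here is a hand-written toy example, not a real UFO glyph file, and this is only a sketch of the substitution, not ufoLib's actual reading code:

```python
# Parse a minimal .glif-like (UFO glyph) document, preferring lxml when installed.
try:
    from lxml import etree  # fast C-based parser, if available
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same basic API

# Toy example document, not a real UFO file.
GLIF = b"""<?xml version="1.0" encoding="UTF-8"?>
<glyph name="A" format="2">
  <advance width="500"/>
  <outline>
    <contour>
      <point x="10" y="20" type="line"/>
    </contour>
  </outline>
</glyph>
"""

glyph = etree.fromstring(GLIF)
name = glyph.get("name")
width = float(glyph.find("advance").get("width"))
points = [(float(p.get("x")), float(p.get("y"))) for p in glyph.iter("point")]
print(name, width, points)
```

Because `lxml.etree` mirrors the ElementTree API for this kind of read-only traversal, the calling code stays identical and only the import changes.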
Since https://github.com/unified-font-object/ufoLib is essentially a reference implementation, I'd like to propose keeping it that way, with validators and converters, but adding a trimmed-down version in fontTools.ufoLib that is optimized for speed. I suggest the API for this be strictly a subset of the upstream ufoLib's, so that users can choose one or the other based on their needs.
I'd also like to add `fs` module support to it, as we need that for build-system experimentation. Which brings us to the topic of the UFO4 branch: https://github.com/unified-font-object/ufoLib/tree/ufo4 In general, I suggest we avoid these big revolutionary major-version bumps that take forever to roll out and cause dependency hell, and instead work on an evolutionary UFO that changes slowly over time in the master branch.
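The appeal of an `fs`-style filesystem abstraction is that the same UFO-reading code can target a plain directory, a zip archive, or an in-memory filesystem. The sketch below uses only the standard library to illustrate the idea; the function name and file layout are hypothetical, and this is not the actual `fs` (PyFilesystem2) API:

```python
import os
import tempfile
import zipfile

def read_text(root, relpath):
    """Read a text file from either a plain directory or a .zip archive."""
    if os.path.isdir(root):
        with open(os.path.join(root, relpath), encoding="utf-8") as f:
            return f.read()
    with zipfile.ZipFile(root) as zf:
        return zf.read(relpath).decode("utf-8")

# Demo: the same call reads "metainfo.plist" from a directory-based UFO
# and from a zipped copy of it (both created here just for illustration).
tmp = tempfile.mkdtemp()
ufo_dir = os.path.join(tmp, "Test.ufo")
os.makedirs(ufo_dir)
with open(os.path.join(ufo_dir, "metainfo.plist"), "w", encoding="utf-8") as f:
    f.write("<plist/>")

ufo_zip = os.path.join(tmp, "Test.ufoz")
with zipfile.ZipFile(ufo_zip, "w") as zf:
    zf.write(os.path.join(ufo_dir, "metainfo.plist"), "metainfo.plist")

print(read_text(ufo_dir, "metainfo.plist") == read_text(ufo_zip, "metainfo.plist"))
```

A real `fs`-backed reader would take a filesystem object instead of a path, so callers could pass `OSFS`, `ZipFS`, or `MemoryFS` interchangeably.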
Please discuss.
@typesupply @typemytype @anthrotype @justvanrossum @LettError @jenskutilek @madig @brawer
Issue Analytics
- Created: 6 years ago
- Comments: 251 (136 by maintainers)
Top GitHub Comments
Great discussion. I have input on various aspects. Different goals justify different actions at this point. I’ll try to group them logically:
The main reason fontmake is slow is this: it was developed with “correctness first” in mind, and with limited resources. In my view it was going to be rewritten at some point anyway.
Based on Simon’s flamegraph in https://github.com/fonttools/fonttools/issues/1095#issuecomment-797755327, below I identify low-hanging fruit that can be sped up with relative ease in the existing fontmake, while addressing the “to Python or not Python” and “Babelfont’s 5x faster” questions implicitly. In a subsequent comment I’ll share some ideas that I had back then for what a rewrite might look like.
For the sake of discussion, I'd like to name three different use-cases for fontmake:
The rest of this comment is mostly involved with speeding up narrow builds, as in “make me a variable font”.
Low-hanging fruit
For reference, here's the flame-graph (see the linked comment for the image):
I think we can make it 2x faster by trimming fat from the following:
- The middle of the graph (the wide column with `_writeTable` etc.) suggests that individual master binaries are generated; indeed, two columns to its right, the mention of “expand” suggests that the binaries are then being loaded again. That's unnecessary: we just need to build a Python TTFont, with no need to compile it before passing it to varLib to make a varfont. Removing that unnecessary save (and fixing up any breakage from it) should save at least 1 babelfont of work.
- Another ~1 babelfont of time is spent in `iup_delta_optimize`. There are multiple lessons here. `iup_delta_optimize` is simple dynamic programming; its runtime is well understood, and it would probably be over 1000x faster if done in C. It has two parts: simple cases are handled first (entire contour shifted, etc.; no actual interpolation involved), then the full algorithm. The full algorithm's gains are extremely tiny: maybe a dozen bytes in a mid-sized font. So if it's very slow and the gains are very small, maybe it shouldn't be enabled by default. If we had optimization levels like compilers do, this would be enabled by `-O3` or `-Os`/`-Oz`, but not by `-O2`, which would be the recommended default.
- There's no point trying to speed up `gvar` generation using fancy facilities; `gvar` is already generated very optimally, since entire `GlyphCoordinates` objects are passed to VariationModel. Now, `GlyphCoordinates` uses an `array.array` for storage (this was done when I removed NumPy), so its `__add__`/`__mul__` etc. are implemented in Python. They are still fast, but making them happen in C would give us a significant speedup in `gvar` generation. Try Cythonizing them? Ideally, maybe share this code / work with `fontTools.misc.vector`.
- Currently `OTTableWriter` keeps a list of “items” that are concatenated at the end. The items are typically `bytes()` objects that are 2 or 4 bytes long; that's insane overhead, using a Python object to store 2 bytes. Now I'm thinking that switching `OTTableWriter` to using a `bytearray` internally as a bytes-builder makes more sense. The offsets and other calculated ints need more bookkeeping to know their position inside the table, but that can be done easily. I have a feeling this could make GSUB/GPOS table compilation many times faster.
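To make the `bytearray`-as-bytes-builder idea concrete, here is a toy sketch (hypothetical class and method names; not `OTTableWriter`'s actual code): values are packed into one growing `bytearray` instead of many tiny `bytes` objects, and an offset whose target isn't known yet is reserved by position and backfilled later:

```python
import struct

class TinyWriter:
    """Toy bytes-builder: one bytearray instead of a list of tiny bytes objects."""
    def __init__(self):
        self.data = bytearray()

    def write_ushort(self, value):
        self.data += struct.pack(">H", value)

    def reserve_offset(self):
        # Remember where the 2-byte offset lives; backfill it later.
        pos = len(self.data)
        self.data += b"\x00\x00"
        return pos

    def fill_offset(self, pos, value):
        struct.pack_into(">H", self.data, pos, value)

w = TinyWriter()
w.write_ushort(1)                 # e.g. a format field
off = w.reserve_offset()          # offset to a subtable, not yet known
w.write_ushort(2)                 # more fixed-size data
w.fill_offset(off, len(w.data))   # the subtable starts here (offset = 6)
w.write_ushort(0xBEEF)            # the "subtable" payload
print(w.data.hex())               # -> 000100060002beef
```

The position returned by `reserve_offset` is exactly the extra bookkeeping mentioned above: instead of a Python object per 2-byte item, each pending offset is just an integer index into the buffer.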
As a tangent: the GSUB/GPOS compiler in fonttools is REALLY slow if overflows happen. I started working on fixing that in the 99proof branch back in 2018:
https://github.com/fonttools/fonttools/tree/99proof
I talked about it extensively at Robothon 2018, in the so-called “99 Proof Small Batch Distillery” talk:
https://vimeo.com/330981972
But never finished it.
Later in 2018 / 2019, I wrote a design document for the HarfBuzz Subsetter, building on top of those ideas but suggesting a different approach, called “Faster Horse Freezer”:
https://goo.gl/bHvnTn
but I never implemented that. Fortunately, @garretrieger implemented most of the reordering ideas from 99proof into an `hb-repacker` module, which we are landing in HarfBuzz today: https://github.com/harfbuzz/harfbuzz/pull/2857
The code is very isolated from the rest of the HarfBuzz subsetter. It basically takes a graph of objects and links, and tries to produce an ordering that wouldn't overflow. It shouldn't be too much work to pass the `OTTableWriter` graph to C and call the repacker on it…

Incidentally, I tried an experiment wrapping the Rust norad library in Python. It's quite a speedup.
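As a toy illustration of the repacker idea described above (the table names, sizes, and link structure here are hypothetical, and this is not hb-repacker's actual API): take a graph of tables and parent→child links, order it parents-first so every link points forward, lay the tables out, and check that each offset fits in 16 bits:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical table graph: node -> (size in bytes, children it links to).
tables = {
    "GPOS":       (10, ["LookupList"]),
    "LookupList": (6,  ["Lookup1", "Lookup2"]),
    "Lookup1":    (40, []),
    "Lookup2":    (40, []),
}

# graphlib yields dependencies (here: children) first, so reverse to get
# parents-first; that makes all offsets forward, as OpenType requires.
order = list(TopologicalSorter(
    {node: children for node, (_, children) in tables.items()}
).static_order())
order.reverse()

# Lay tables out in that order and verify each parent->child offset
# (measured from the start of the parent table) fits in an unsigned 16-bit int.
pos, positions = 0, {}
for node in order:
    positions[node] = pos
    pos += tables[node][0]

for node, (_, children) in tables.items():
    for child in children:
        offset = positions[child] - positions[node]
        assert 0 < offset <= 0xFFFF, f"offset overflow on {node}->{child}"
print(order, positions)
```

A real repacker does much more than this (it searches among valid orderings, and can split or duplicate subtables when no ordering fits), but the core data model is the same: a graph of sized objects plus links, and a layout that keeps every link representable.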