Don't load `hmtx` table until it's actually needed
The hmtx table loads all glyph `advanceWidth` and `leftSideBearing` values, which is fine for small fonts but for large fonts this puts a lot of data in memory.
Instead of loading all data during `hmtx.parseHmtxTable`, called from `opentype.js:parseBuffer`, it would be much nicer to only load in a glyph’s values the first time those values are actually needed. The glyph prototype could be assigned default `advanceWidth` and `leftSideBearing` values with getters/setters, such that when they are requested, they then consult HMTX and bind the values onto the glyph instance, which then shadows the prototype values:
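Object.defineProperty(Glyph.prototype, 'advanceWidth', {
  configurable: true,
  // First read: setMetricsFromHMTX() binds the metrics onto this glyph as own properties, so the re-read below no longer reaches this getter.
  get: function() { this.setMetricsFromHMTX(); return this.advanceWidth; },
  // Define an own data property; a plain `this.advanceWidth = v` here would re-enter this setter forever.
  set: function(v) { Object.defineProperty(this, 'advanceWidth', { value: v, writable: true, enumerable: true, configurable: true }); }
});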
Before any values are bound, the prototype will be hit, but once the instance has these values bound, it won’t hit the prototype anymore. A demonstrator of the concept: http://jsbin.com/zopigokeho/edit?js
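For readers who can't open the jsbin, here is a minimal, self-contained sketch of the same idea. `fakeHmtx` is a made-up stand-in for the real table lookup, and `setMetricsFromHMTX` is the hypothetical method named above, not an existing opentype.js API:

```javascript
// Hypothetical stand-in for hmtx: a real parser would read these values
// from the font buffer on demand instead of computing them.
const fakeHmtx = {
  lookups: 0,
  metricsFor(index) {
    this.lookups += 1;
    return { advanceWidth: 500 + index, leftSideBearing: 10 + index };
  }
};

const METRIC_NAMES = ['advanceWidth', 'leftSideBearing'];

function Glyph(index) {
  this.index = index;
}

// Binds the metrics onto the instance as own data properties, skipping
// any metric the caller has already set explicitly.
Glyph.prototype.setMetricsFromHMTX = function () {
  const m = fakeHmtx.metricsFor(this.index);
  for (const name of METRIC_NAMES) {
    if (Object.prototype.hasOwnProperty.call(this, name)) continue;
    Object.defineProperty(this, name, {
      value: m[name], writable: true, enumerable: true, configurable: true
    });
  }
};

// Prototype-level accessors: only reached while the instance has no own value.
for (const name of METRIC_NAMES) {
  Object.defineProperty(Glyph.prototype, name, {
    configurable: true,
    get() {
      this.setMetricsFromHMTX();
      return this[name]; // now resolves to the own property
    },
    set(v) {
      Object.defineProperty(this, name, {
        value: v, writable: true, enumerable: true, configurable: true
      });
    }
  });
}

const g = new Glyph(3);
const keysBefore = Object.keys(g);   // no metrics bound yet
const width = g.advanceWidth;        // first read triggers the lookup
const bearing = g.leftSideBearing;   // served from the instance, no second lookup
```

Glyphs that are never asked for their metrics never carry them, which is the whole point for fonts with tens of thousands of glyphs.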
Issue Analytics
- Created 8 years ago
- Reactions:1
- Comments:8 (5 by maintainers)
Top GitHub Comments
Actually, glyph rendering definitely should not require parsing the full font file (no native font engine does this, either). Loading entire fonts into memory blows up memory use instantly for large fonts, like the CJK fonts that implement 10,000 to 30,000 glyphs (which is pretty much any CJK font), and it even has noticeably detrimental effects for fonts like Times New Roman (3400 glyphs), Arial (3500 glyphs), or open source fonts like FreeSans (5300 glyphs) or DejaVu (6000 glyphs).
The absolute last thing a font parser needs to do there is blindly load everything into memory, regardless of whether any of the glyphs and associated metadata are going to end up being used. Instead, it should have a fast traversal route for the font’s byte layout (which the spec is optimized for) to look up the data necessary for shaping code point sequences, and only cache things as they are being used, so that you have the lowest possible memory footprint, with the fastest possible shaping.
Uniscribe, DirectWrite, Freetype2, Harfbuzz, etc. all make sure to do this, because running through a font’s byte layout even “from disk” rather than memory mapped is fast (in many cases literally just following values as relative pointers), and makes it possible to do shaping with fonts that don’t even fit in memory.
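As a rough sketch of what "following values as relative pointers" could look like for hmtx specifically: the reader below pulls a single glyph's metrics straight out of the font bytes and caches only what was asked for. `createHmtxReader` is a made-up name, and it assumes the caller already knows the hmtx table offset and the `numberOfHMetrics` value from hhea. The layout follows the OpenType spec: `numberOfHMetrics` pairs of (uint16 advanceWidth, int16 lsb), then bare int16 left side bearings for the remaining glyphs, which reuse the last advance width.

```javascript
// Lazy hmtx reader: nothing is parsed until a glyph's metrics are requested.
// `view` is a DataView over the font bytes (DataView reads big-endian by
// default, which matches the font format).
function createHmtxReader(view, hmtxOffset, numberOfHMetrics) {
  const cache = new Map(); // only glyphs that were actually requested
  return function getMetrics(glyphIndex) {
    let m = cache.get(glyphIndex);
    if (m) return m;
    if (glyphIndex < numberOfHMetrics) {
      // Full record: 4 bytes per glyph.
      const p = hmtxOffset + glyphIndex * 4;
      m = {
        advanceWidth: view.getUint16(p),
        leftSideBearing: view.getInt16(p + 2)
      };
    } else {
      // Trailing glyphs share the last advance width and store only an lsb.
      const lastRecord = hmtxOffset + (numberOfHMetrics - 1) * 4;
      const p = hmtxOffset + numberOfHMetrics * 4 +
        (glyphIndex - numberOfHMetrics) * 2;
      m = {
        advanceWidth: view.getUint16(lastRecord),
        leftSideBearing: view.getInt16(p)
      };
    }
    cache.set(glyphIndex, m);
    return m;
  };
}

// Tiny synthetic table: two full records plus one extra left side bearing,
// placed at byte offset 4 of the buffer.
const demoView = new DataView(new ArrayBuffer(14));
demoView.setUint16(4, 600); demoView.setInt16(6, 50);
demoView.setUint16(8, 700); demoView.setInt16(10, -20);
demoView.setInt16(12, 30);

const getMetrics = createHmtxReader(demoView, 4, 2);
const second = getMetrics(1); // read from its own 4-byte record
const third = getMetrics(2);  // reuses the last advance width
```

The cache here is unbounded for simplicity; for rarely used glyphs it may be cheaper to skip caching altogether and re-read the four bytes.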
JavaScript doesn’t have the luxury of running “from file”, so it has to create a memory map, and usually needs a prototype that operates on those maps, but creating a fully parsed object in memory uses up way more memory than is desirable. On mobile devices, but also desktop browsers and even servers where “more memory” costs “more money”, keeping memory use down is incredibly important: you don’t want a tab that uses 200MB just for a single font for which most glyphs aren’t actually ever going to be accessed, or are accessed so infrequently that caching them makes no sense.
Practical example: I couldn’t use opentype.js to generate some "multiple typeface images of Japanese ideographs" using 9 different CJK fonts, exactly because it was loading everything into memory instead of only selectively loading what was getting used. Instead of using maybe twice the size of the font file in memory, the object representation created by JavaScript used 2GB+ of memory (after the glyph lookup/caching PR, that’s gone down to about 320MB. Still a lot, but more manageable by far).
Actually, I think things are a bit more complex than just the two cases. If I’m only interested in drawing properly kerned text, there’s a lot of stuff that gets loaded in memory that I don’t care to retain. I can imagine a “low-memory mode” wherein we’re just trying to “carve” through the values as fast as possible, like cutting through the jungle with a machete. (e.g. in `head` parse just `unitsPerEm` and `indexToLocFormat`; in `maxp` parse just `numGlyphs`, …) This was an approach I saw in pdf.js, and I really like it. Also, Freetype does something similar where it just leaves values in the buffer unparsed and looks them up when needed (e.g. cmap). I’m not sure if all of these approaches can be consolidated in one library, but I think it would be interesting to have a “font info” and “full” mode, at least.
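To make the “font info” idea a bit more concrete, here is a hedged sketch of such a pass. `findTable` and `readFontInfo` are made-up names, not opentype.js or pdf.js functions, and the sketch assumes a plain sfnt buffer (no WOFF wrapper, no TrueType collection). The byte offsets come from the OpenType table layouts: `unitsPerEm` sits at byte 18 and `indexToLocFormat` at byte 50 of `head`, and `numGlyphs` at byte 4 of `maxp`.

```javascript
// Walk the sfnt table directory: a 12-byte header followed by 16-byte
// records of (tag, checksum, offset, length). Returns the table's offset.
function findTable(view, tag) {
  const numTables = view.getUint16(4);
  for (let i = 0; i < numTables; i++) {
    const rec = 12 + i * 16;
    const recTag = String.fromCharCode(
      view.getUint8(rec), view.getUint8(rec + 1),
      view.getUint8(rec + 2), view.getUint8(rec + 3));
    if (recTag === tag) return view.getUint32(rec + 8);
  }
  throw new Error('missing table: ' + tag);
}

// "Font info" mode: read three fields and leave everything else unparsed.
function readFontInfo(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const head = findTable(view, 'head');
  const maxp = findTable(view, 'maxp');
  return {
    unitsPerEm: view.getUint16(head + 18),
    indexToLocFormat: view.getInt16(head + 50),
    numGlyphs: view.getUint16(maxp + 4)
  };
}
```

Calling `readFontInfo(buffer)` on an `ArrayBuffer` holding a font file touches a few dozen bytes regardless of how many glyphs the font has; a “full” mode could start from the same directory walk and parse further tables only on request.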