Warn about late, or missing, <meta charset>
If present, <meta charset=...> must occur within the first 1024 bytes of the HTML document per the HTML Standard: https://html.spec.whatwg.org/multipage/semantics.html#charset

Ideally, <meta charset> is the very first element within the <head>. This has been a best practice for a long time, e.g. it is recommended by HTML5 Boilerplate: https://github.com/h5bp/html5-boilerplate/blob/master/dist/doc/html.md#the-order-of-the-title-and-meta-tags

To guide developers towards adopting this best practice, Lighthouse could show a warning when <meta charset> is not the first element within <head> (document.head.firstElementChild).
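As an illustrative sketch only (this is not Lighthouse's actual audit code, and the function name is invented), the two conditions above could be approximated with a simplified scan of the raw HTML; a regex-based check is a rough stand-in for real HTML parsing:

```javascript
// Hypothetical sketch of the proposed check. The regexes are a
// simplification: real parsing would also honor
// <meta http-equiv="content-type" ...> and attribute quoting variants.
function checkMetaCharset(html) {
  // Per the HTML Standard, the declaration must appear within the
  // first 1024 BYTES of the document (not characters).
  const first1024 = Buffer.from(html, 'utf8').subarray(0, 1024).toString('utf8');
  const within1024 = /<meta[^>]*charset\s*=/i.test(first1024);
  // Best practice: <meta charset> is the very first element in <head>.
  const firstInHead = /<head[^>]*>\s*<meta[^>]*charset\s*=/i.test(html);
  return { within1024, firstInHead };
}
```

A page failing `within1024` is the case the HTML Standard forbids outright; a page failing only `firstInHead` merely misses the best practice.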
Relevant links:
- Twitter thread: https://twitter.com/hsivonen/status/1198618391042560000
- Chromium DevTools issue: https://bugs.chromium.org/p/chromium/issues/detail?id=1028041
Issue Analytics
- Created 4 years ago
- Reactions: 4
- Comments: 12 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Above the HTTP layer, it is as though the user pressed the reload button midway through the page load. All work done until then is lost: the parser stops, the DOM and layout are torn down, and things start over. I don’t know if or how the interaction with the HTTP cache differs from the case of the user pressing the reload button.
Starting over is so self-evidently a performance problem that I haven’t measured how bad it is exactly.
A realistic case to measure would be taking a product page for a Lego set on lego.com and measuring loading it in Firefox via a proxy as-is (as of today triggering a realistic late-<meta charset> reload), and then having the proxy add charset=utf-8 to the Content-Type header of the root resource (easier than making the proxy actually move the <meta charset>) and measuring again (without the reload).

For completeness, Firefox (as of 73) has three kinds of character encoding-related reloads that are implicitly triggered on non-file: URLs by non-conforming content (as opposed to user action):

1. The HTTP layer doesn’t declare a charset, there is no BOM, but there is a <meta charset> beyond the first 1024 bytes. The reload triggers once the HTML5 tree builder algorithm has processed the <meta charset> on the parser thread.
2. The HTTP layer doesn’t declare a charset, there is no BOM, and there either isn’t a <meta charset> or, before one, there is either an ISO-2022-JP escape sequence or a pair of bytes that is invalid in Shift_JIS or decodes as half-width katakana in Shift_JIS, and there haven’t been prior bytes that would be invalid as EUC-JP or that would decode to half-width katakana in EUC-JP. The reload is triggered when the deciding byte is processed by the parser thread prior to tokenization.
3. The HTTP layer doesn’t declare a charset, there is no BOM, there is no <meta charset>, and at EOF the encoding guess made by looking at the TLD and the whole byte stream differs from the encoding guess made by looking at the TLD and the first 1024 bytes. The reload is triggered when the parser thread encounters the EOF from the network. (Note that if UTF-8 is detected, the TLD-affiliated encoding is used instead for intentional misdecoding, for the same reason why Chrome doesn’t detect UTF-8: to avoid Web authors starting to depend on this stuff. Hence, https://mathiasbynens.be/demo/missing-meta-charset decodes as windows-1252, since .be is a windows-1252-affiliated TLD.)

Which is to say that pages really should be specifying their encoding, and do so within the first 1024 bytes.
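The three cases above can be summarized as a priority-ordered decision. The sketch below is only a mental model with invented field names, not Firefox's implementation; all branches assume the HTTP layer declared no charset and there is no BOM:

```javascript
// Hedged summary of the three Firefox reload triggers described above.
// Field names are illustrative, not real Gecko state.
function firefoxReloadTrigger(page) {
  if (page.metaCharsetOffset !== null && page.metaCharsetOffset > 1024) {
    return 'late-meta';          // case 1: <meta charset> beyond 1024 bytes
  }
  if (page.japaneseDetectorSwitched) {
    return 'japanese-detection'; // case 2: Shift_JIS/EUC-JP/ISO-2022-JP disambiguation
  }
  if (page.eofGuess !== page.first1024Guess) {
    return 'eof-detection';      // case 3: whole-stream guess differs at EOF
  }
  return null;                   // no encoding-related reload
}
```

The ordering mirrors when each trigger fires during parsing: a late <meta charset> is acted on as soon as the tree builder reaches it, while the EOF-based guess can only differ once the whole stream has arrived.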
I meant I’d like to see data on how this optimization affects metrics. The main concern is that we don’t want to suggest low-wattage changes, and I’d like to be able to point towards something that says “this can increase first paint by x ms in these conditions”. Maybe I missed something like that in the links provided (on mobile, can’t check right now).

Also, if we want this to be in the performance category as an opportunity, we need to understand the performance implications in order to simulate / come up with an estimated savings. Otherwise it’d have to be a diagnostic (no estimation given).