Warn about late, or missing, <meta charset>
If present, <meta charset=...> must occur within the first 1024 bytes of the HTML document per the HTML Standard: https://html.spec.whatwg.org/multipage/semantics.html#charset

Ideally, <meta charset> is the very first element within the <head>. This has been a best practice for a long time, e.g. it is recommended by HTML5 Boilerplate: https://github.com/h5bp/html5-boilerplate/blob/master/dist/doc/html.md#the-order-of-the-title-and-meta-tags

To guide developers towards adopting this best practice, Lighthouse could show a warning when <meta charset> is not the first element within <head> (document.head.firstElementChild).
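As an illustrative sketch only (this is not Lighthouse's actual audit code, and the function name is invented), the two conditions above could be approximated with a simplified scan of the raw HTML; a regex-based check is a rough stand-in for real HTML parsing:

```javascript
// Hypothetical sketch of the proposed check. The regexes are a
// simplification: real parsing would also honor
// <meta http-equiv="content-type" ...> and attribute quoting variants.
function checkMetaCharset(html) {
  // Per the HTML Standard, the declaration must appear within the
  // first 1024 BYTES of the document (not characters).
  const first1024 = Buffer.from(html, 'utf8').subarray(0, 1024).toString('utf8');
  const within1024 = /<meta[^>]*charset\s*=/i.test(first1024);
  // Best practice: <meta charset> is the very first element in <head>.
  const firstInHead = /<head[^>]*>\s*<meta[^>]*charset\s*=/i.test(html);
  return { within1024, firstInHead };
}
```

A page failing `within1024` is the case the HTML Standard forbids outright; a page failing only `firstInHead` merely misses the best practice.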
Relevant links:
- Twitter thread: https://twitter.com/hsivonen/status/1198618391042560000
- Chromium DevTools issue: https://bugs.chromium.org/p/chromium/issues/detail?id=1028041
Issue Analytics
- Created 4 years ago
- Reactions: 4
- Comments: 12 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Above the HTTP layer, it is as though the user pressed the reload button midway through the page load. All work done until then is lost: the parser stops, the DOM and layout are torn down, and things start over. I don’t know if or how the interaction with the HTTP cache differs from the case of the user pressing the reload button.
Starting over is so self-evidently a performance problem that I haven’t measured how bad it is exactly.
A realistic case to measure would be taking a product page for a Lego set on lego.com and measuring loading it in Firefox via a proxy as-is (as of today triggering a realistic late-<meta charset> reload), and then having the proxy add charset=utf-8 to the Content-Type header of the root resource (easier than making the proxy actually move the <meta charset>) and measuring again (without the reload).

For completeness, Firefox (as of 73) has three kinds of character encoding-related reloads that are implicitly triggered on non-file: URLs by non-conforming content (as opposed to user action):

1. The HTTP layer doesn’t declare a charset, there is no BOM, but there is a <meta charset> beyond the first 1024 bytes. The reload triggers once the HTML5 tree builder algorithm has processed the <meta charset> on the parser thread.
2. The HTTP layer doesn’t declare a charset, there is no BOM, and there either isn’t a <meta charset> or, before one, there is either an ISO-2022-JP escape sequence or a pair of bytes that is invalid in Shift_JIS or decodes as half-width katakana in Shift_JIS, and there haven’t been prior bytes that would be invalid as EUC-JP or that would decode to half-width katakana in EUC-JP. The reload is triggered when the deciding byte is processed by the parser thread prior to tokenization.
3. The HTTP layer doesn’t declare a charset, there is no BOM, there is no <meta charset>, and at EOF the encoding guess made by looking at the TLD and the whole byte stream differs from the encoding guess made by looking at the TLD and the first 1024 bytes. The reload is triggered when the parser thread encounters the EOF from the network. (Note that if UTF-8 is detected, the TLD-affiliated encoding is used instead for intentional misdecoding, for the same reason why Chrome doesn’t detect UTF-8: to avoid Web authors starting to depend on this stuff. Hence, https://mathiasbynens.be/demo/missing-meta-charset decodes as windows-1252, since .be is a windows-1252-affiliated TLD.)

Which is to say that pages really should be specifying their encoding, and do so within the first 1024 bytes.
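The three cases above can be summarized as a priority-ordered decision. The sketch below is only a mental model with invented field names, not Firefox's implementation; all branches assume the HTTP layer declared no charset and there is no BOM:

```javascript
// Hedged summary of the three Firefox reload triggers described above.
// Field names are illustrative, not real Gecko state.
function firefoxReloadTrigger(page) {
  if (page.metaCharsetOffset !== null && page.metaCharsetOffset > 1024) {
    return 'late-meta';          // case 1: <meta charset> beyond 1024 bytes
  }
  if (page.japaneseDetectorSwitched) {
    return 'japanese-detection'; // case 2: Shift_JIS/EUC-JP/ISO-2022-JP disambiguation
  }
  if (page.eofGuess !== page.first1024Guess) {
    return 'eof-detection';      // case 3: whole-stream guess differs at EOF
  }
  return null;                   // no encoding-related reload
}
```

The ordering mirrors when each trigger fires during parsing: a late <meta charset> is acted on as soon as the tree builder reaches it, while the EOF-based guess can only differ once the whole stream has arrived.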
I meant I’d like to see data on how this optimization affects metrics. The main concern is that we don’t want to suggest low-wattage changes, and I’d like to be able to point towards something that says “this can increase first paint by x ms in these conditions”. Maybe I missed something like that in the links provided (on mobile, can’t check right now).

Also, if we want this to be in the performance category as an opportunity, we need to understand the performance implications in order to simulate / come up with an estimated savings. Otherwise it’d have to be a diagnostic (no estimation given).