Warn about late, or missing, <meta charset>

If present, <meta charset=...> must occur within the first 1024 bytes of the HTML document per the HTML Standard: https://html.spec.whatwg.org/multipage/semantics.html#charset
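
As a rough illustration of the 1024-byte rule, here is a minimal Python sketch. It assumes a simplified regular expression stands in for the spec's actual prescan algorithm, and the names `META_CHARSET_RE`, `meta_charset_end_offset`, and `charset_within_limit` are hypothetical helpers, not part of Lighthouse or any browser.

```python
import re

# Simplified stand-in for the HTML Standard's prescan (an assumption):
# matches <meta charset=...> as well as the older http-equiv form, where
# "charset=" appears inside the content attribute.
META_CHARSET_RE = re.compile(rb"<meta\s[^>]*?charset\s*=", re.IGNORECASE)

PRESCAN_LIMIT = 1024  # bytes, per the rule quoted above


def meta_charset_end_offset(html: bytes):
    """Return the byte offset just past the first charset <meta> tag, or None."""
    match = META_CHARSET_RE.search(html)
    if match is None:
        return None
    close = html.find(b">", match.end())
    return None if close == -1 else close + 1


def charset_within_limit(html: bytes) -> bool:
    """True if the whole charset <meta> tag ends within the first 1024 bytes."""
    end = meta_charset_end_offset(html)
    return end is not None and end <= PRESCAN_LIMIT
```

The check works on raw bytes rather than decoded text, because the limit is defined in bytes and the encoding is not yet known at that point.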

Ideally, <meta charset> is the very first element within the <head>. This has been a best practice for a long time, e.g. recommended by HTML5 Boilerplate: https://github.com/h5bp/html5-boilerplate/blob/master/dist/doc/html.md#the-order-of-the-title-and-meta-tags

To guide developers towards adopting this best practice, Lighthouse could show a warning when <meta charset> is not the first element within <head> (document.head.firstElementChild).
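
Outside a browser, the `document.head.firstElementChild` check could be approximated as below. This is only a sketch using Python's standard `html.parser`; it assumes the document has an explicit `<head>` tag (HTML allows it to be implied, which this code does not handle), and `HeadFirstChild` and `charset_is_first_in_head` are hypothetical names.

```python
from html.parser import HTMLParser


class HeadFirstChild(HTMLParser):
    """Records the first element that starts inside an explicit <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.first = None  # (tag, attrs dict) of the first element in <head>

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and self.first is None:
            self.first = (tag, dict(attrs))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def charset_is_first_in_head(html: str) -> bool:
    """True if the first element inside <head> is a <meta> with a charset attribute."""
    parser = HeadFirstChild()
    parser.feed(html)
    parser.close()
    if parser.first is None:
        return False
    tag, attrs = parser.first
    return tag == "meta" and "charset" in attrs
```

Comments and whitespace before the `<meta>` are ignored here, just as they are by `firstElementChild`, so this check alone does not guarantee the tag falls within the first 1024 bytes.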

Relevant links:

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

7 reactions
hsivonen commented, Jan 17, 2020

Above the HTTP layer, it is as though the user pressed the reload button midway through loading the page. All work done until then is lost: the parser stops, the DOM and layout are torn down, and things start over. I don’t know if or how the interaction with the HTTP cache differs from the case of the user pressing the reload button.

Starting over is so self-evidently a performance problem that I haven’t measured how bad it is exactly.

A realistic case to measure would be a product page for a Lego set on lego.com. First, measure loading it in Firefox via a proxy as-is (as of today, this triggers a realistic late-<meta charset> reload). Then have the proxy add charset=utf-8 to the Content-Type header of the root resource (easier than making the proxy actually move the <meta charset>) and measure again, this time without the reload.
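
The header rewrite such a proxy would perform could look roughly like this. It is a sketch only: `with_utf8_charset` is a hypothetical helper operating on the header value as a string, and wiring it into an actual proxy is left out.

```python
def with_utf8_charset(content_type: str) -> str:
    """Append charset=utf-8 unless the header value already has a charset parameter."""
    # Split "type/subtype; param=value; ..." into its parts.
    parts = [p.strip() for p in content_type.split(";")]
    has_charset = any(p.lower().startswith("charset=") for p in parts[1:])
    if has_charset:
        return content_type  # leave an existing declaration untouched
    # Drop any trailing ";" before appending the new parameter.
    return content_type.strip().rstrip(";").rstrip() + "; charset=utf-8"
```

For example, `with_utf8_charset("text/html")` gives `"text/html; charset=utf-8"`, while a value that already declares a charset is returned unchanged.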

For completeness, Firefox (as of 73) has three kinds of character encoding-related reloads that are implicitly triggered on non-file: URLs by non-conforming content (as opposed to user action):

  1. There is no HTTP-layer charset, there is no BOM, but there is a <meta charset> beyond the first 1024 bytes. The reload triggers once the HTML5 tree builder algorithm has processed the <meta charset> on the parser thread.
  2. We’re on a .jp domain, there is no HTTP-layer charset, there is no BOM, and there either isn’t a <meta charset> or before one there is either an ISO-2022-JP escape sequence or a pair of bytes that is invalid in Shift_JIS or decodes as half-width katakana in Shift_JIS, and there haven’t been prior bytes that would be invalid as EUC-JP or that would decode to half-width katakana in EUC-JP. The reload is triggered when the deciding byte is processed by the parser thread prior to tokenization.
  3. We’re on any TLD other than .jp, .in, or .lk, there’s no HTTP-layer charset, there is no BOM, there is no <meta charset> and at EOF the encoding guess made by looking at the TLD and the whole byte stream differs from the encoding guess made by looking at the TLD and the first 1024 bytes. The reload is triggered when the parser thread encounters the EOF from network. (Note that if UTF-8 is detected, the TLD-affiliated encoding is used instead for intentional misdecoding for the same reason why Chrome doesn’t detect UTF-8: To avoid Web authors starting to depend on this stuff. Hence, https://mathiasbynens.be/demo/missing-meta-charset decodes as windows-1252, since .be is a windows-1252-affiliated TLD.)

Which is to say that pages really should specify their encoding, and do so within the first 1024 bytes.
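
The preconditions shared by the three cases above (HTTP-layer charset, BOM, `<meta charset>` position) can be sketched as a small classifier. This does not model Firefox's TLD-based guessing or Japanese encoding detection at all; `encoding_declaration` and its return labels are hypothetical, and the `<meta>` matching is a simplified regular expression rather than the real tree builder.

```python
import re

# Byte order marks for UTF-8, UTF-16LE and UTF-16BE.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")
# Simplified match for a charset-declaring <meta> tag (an assumption).
META_RE = re.compile(rb"<meta\s[^>]*?charset\s*=", re.IGNORECASE)


def encoding_declaration(content_type, body: bytes) -> str:
    """Classify where a page declares its encoding, checked in priority order."""
    if content_type and "charset=" in content_type.lower():
        return "http"  # HTTP-layer charset: none of the reload cases apply
    if body.startswith(BOMS):
        return "bom"  # BOM present: none of the reload cases apply
    match = META_RE.search(body)
    if match is None:
        return "none"  # no declaration: cases 2 and 3 may come into play
    close = body.find(b">", match.end())
    if close != -1 and close + 1 <= 1024:
        return "meta"  # declared in time
    return "late-meta"  # the precondition for case 1
```

Only the `"http"`, `"bom"`, and `"meta"` results correspond to pages that follow the advice above.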

1 reaction
connorjclark commented, Jan 16, 2020

I meant I’d like to see data on how this optimization affects metrics. The main concern is that we don’t want to suggest low-wattage changes, and I’d like to be able to point towards something that says “this can increase first paint by x ms in these conditions”. Maybe I missed something like that in the links provided (on mobile, can’t check right now).

Also, if we want this to be in the performance category as an opportunity, we need to understand the performance implications in order to simulate / come up with an estimated savings. Otherwise it’d have to be a diagnostic (no estimation given).
