XML Declaration ignored by DOMParser


Basic info:

  • Node.js version: v10.15.3
  • jsdom version: 15.1.1

Minimal reproduction case

const { JSDOM } = require("jsdom");
const { XMLSerializer } = require("w3c-xmlserializer");

const inputXml = `<?xml version="1.0" encoding="ASCII"?><test>hello</test>`;
const options = { contentType: "application/xml" };
const dom = new JSDOM(inputXml, options);
const XMLSerializer_ctor = XMLSerializer.interface;
const serializer = new XMLSerializer_ctor();
const outputXml = serializer.serializeToString(dom.window.document);

console.log("inputXml:");
console.log(inputXml);
console.log("outputXml:");
console.log(outputXml); // Expected this to match inputXml.
  • I realize this includes w3c-xmlserializer, but I don’t see any other way to demonstrate the full process without it, since the serialization is not done by jsdom itself.
  • I had initially logged a bug about this in the saxes project, but they seemed to think DOMParser needs to retrieve the XML Declaration details from xmlDecl. (saxes issue #16)
  • It sounds like I’m re-stating #415, but the fix for #415 did not actually address the original problem as described. It allowed Processing Instructions to be parsed, but not the actual XML Declaration, which was what the bug was about. Crucially, saxes does not emit the onprocessinginstruction event for the XML Declaration, only for other Processing Instructions.
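
Until the declaration survives the round trip, one possible workaround is to read it from the input string and prepend it to the serialized output. The sketch below uses only plain string handling, not jsdom or saxes. parseXmlDeclaration and restoreXmlDeclaration are hypothetical helper names, and the serialized value is an assumed stand-in for what the serializer returns when the declaration is dropped. It does not handle a leading byte order mark.

```javascript
// Hypothetical helper: read the pseudo-attributes of a leading XML declaration.
// Returns null when the string does not start with a declaration.
function parseXmlDeclaration(xml) {
  const match = /^<\?xml\s+([^?]*)\?>/.exec(xml);
  if (!match) return null;

  const decl = { version: undefined, encoding: undefined, standalone: undefined };
  const attr = /(version|encoding|standalone)\s*=\s*(["'])(.*?)\2/g;
  let m;
  while ((m = attr.exec(match[1])) !== null) {
    decl[m[1]] = m[3];
  }
  // `text` is the declaration exactly as it appeared in the input.
  return { text: match[0], ...decl };
}

// Hypothetical helper: copy the declaration from the input onto the output,
// unless the output already has one or the input never had one.
function restoreXmlDeclaration(inputXml, outputXml) {
  const decl = parseXmlDeclaration(inputXml);
  if (decl === null || parseXmlDeclaration(outputXml) !== null) {
    return outputXml;
  }
  return decl.text + outputXml;
}

const inputXml = `<?xml version="1.0" encoding="ASCII"?><test>hello</test>`;
// Assumed stand-in for serializer output that lacks the declaration.
const serialized = "<test>hello</test>";

console.log(parseXmlDeclaration(inputXml));
console.log(restoreXmlDeclaration(inputXml, serialized));
```

Copying the declaration verbatim is only appropriate if the output is later written in the encoding it names.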

How does similar code behave in browsers?

Example in jsbin

  • Firefox produces an XML Declaration, but the encoding is changed to "UTF-8".
  • Chrome produces an XML Declaration that matches the original.
  • IE does not produce an XML Declaration.
  • Edge does not produce an XML Declaration.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
Sebmaster commented, Jun 21, 2019

> I would like to ensure that the encoding that I specify is correct.

I’ve been thinking about this. When you serialise the doc, you just get back a JS string, which (I think) means it should be encoding agnostic.

I think what you write into the XML declaration depends on how you write the file. If you use fs.writeFile and specify the string without an encoding, the file will always end up with utf-8 encoding.

However, it seems like jsdom always does the HTML encoding sniffing, even for XML docs. I’m not sure if that’s intended or if there’s a bug lurking there, but that could lead to double decodes 🤷‍♂ Definitely room to test that better there.
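
The point about the declaration depending on how the file is written can be sketched as follows: choose the encoding at write time and generate the declaration from that choice. This is a minimal sketch using only Node's fs, os and path modules. writeXmlFile, XML_LABELS and the temporary file name are hypothetical, the body is assumed to carry no declaration of its own, and only two encodings are mapped.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical mapping from Node encoding names to XML encoding labels.
// Only two entries are included to keep the sketch small.
const XML_LABELS = { utf8: "UTF-8", latin1: "ISO-8859-1" };

// Hypothetical helper: write `body` (serialized XML without a declaration)
// with a declaration that names the encoding actually used for the bytes.
function writeXmlFile(filePath, body, encoding = "utf8") {
  const label = XML_LABELS[encoding];
  if (label === undefined) {
    throw new Error(`Unsupported encoding: ${encoding}`);
  }
  const declaration = `<?xml version="1.0" encoding="${label}"?>`;
  fs.writeFileSync(filePath, declaration + body, encoding);
}

// "\u00e9" is one byte in latin1 and two bytes in utf8.
const body = "<test>h\u00e9llo</test>";
const file = path.join(os.tmpdir(), "declaration-sketch.xml");

writeXmlFile(file, body, "latin1");
console.log(fs.readFileSync(file).length);

writeXmlFile(file, body, "utf8");
console.log(fs.readFileSync(file).length);
```

The same string produces files of different byte lengths, and in each case the declaration agrees with the bytes on disk.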

1 reaction
domenic commented, Jun 21, 2019

We have discussions about this in the context of HTML parsing/serialization all the time. In short, serialization and parsing are not meant to preserve the original form of the document. They only preserve the original information (i.e., the abstract stuff that survives into the parsed form). (And sometimes, not even that; see the warning and examples below the algorithm at https://html.spec.whatwg.org/#serialising-html-fragments). See https://github.com/inikulin/parse5/issues/261#issuecomment-401389295 for more.

> As it currently stands, if I wanted to do the latter option myself, would it be safe to use dom.window.document.inputEncoding to detect the encoding that is in use?

I’m not sure, as I’m not sure what definition of “safety” you’re using. But see https://dom.spec.whatwg.org/#dom-document-inputencoding for the definition of inputEncoding.
