XML Declaration ignored by DOMParser
Basic info:
- Node.js version: v10.15.3
- jsdom version: 15.1.1
Minimal reproduction case
const { JSDOM } = require("jsdom");
const { XMLSerializer } = require("w3c-xmlserializer");
const inputXml = `<?xml version="1.0" encoding="ASCII"?><test>hello</test>`;
const options = { contentType: "application/xml" };
const dom = new JSDOM(inputXml, options);
const XMLSerializer_ctor = XMLSerializer.interface;
const serializer = new XMLSerializer_ctor();
const outputXml = serializer.serializeToString(dom.window.document);
console.log("inputXml:");
console.log(inputXml);
console.log("outputXml:");
console.log(outputXml); // Expected this to match inputXml.
- I realize this includes w3c-xmlserializer, but I don’t see any other way to demonstrate the full process without it, since the serialization is not done by jsdom itself.
- I had initially logged a bug about this in the saxes project, but they seemed to think DOMParser needs to retrieve the XML Declaration details from xmlDecl (saxes issue #16).
- It sounds like I’m re-stating #415, but that fix did not actually address the original problem as described. It allowed Processing Instructions to be parsed, but not the actual XML Declaration, which was what the bug was about. Crucially, saxes does not emit the onprocessinginstruction event for the XML Declaration, only for other Processing Instructions.
How does similar code behave in browsers?
- Firefox produces an XML Declaration, but the encoding is changed to "UTF-8".
- Chrome produces an XML Declaration that matches the original.
- IE does not produce an XML Declaration.
- Edge does not produce an XML Declaration.
Issue Analytics
- Created 4 years ago
- Comments: 11 (5 by maintainers)
I’ve been thinking about this. When you serialise the doc, you just get back a JS string, which (I think) means it should be encoding agnostic.
I think what you write into the XML declaration depends on how you write the file. If you use fs.writeFile and pass the string without specifying an encoding, the file will always end up UTF-8 encoded.
However, it seems like jsdom always does the HTML encoding sniffing, even for XML docs. I’m not sure if that’s intended or if there’s a bug lurking there, but that could lead to double decodes 🤷♂ Definitely room to test that better there.
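The fs.writeFile point can be checked directly: when the data is a string and no encoding option is given, Node writes it as UTF-8, whatever the XML declaration inside the string claims. The sketch below uses writeFileSync (same default as writeFile) to write the issue's ASCII-declared document, with one non-ASCII character added, to a temporary file and then inspects the bytes. The file name and the "é" character are arbitrary choices for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// A document whose declaration claims ASCII but whose content is not ASCII.
const xml = `<?xml version="1.0" encoding="ASCII"?><test>héllo</test>`;
const file = path.join(os.tmpdir(), "xml-decl-demo.xml");

// No encoding argument: Node's default for string data is "utf8".
fs.writeFileSync(file, xml);

const bytes = fs.readFileSync(file);
// "é" (U+00E9) is stored as the two UTF-8 bytes 0xC3 0xA9, so the file
// is one byte longer than the string's length in UTF-16 code units.
console.log(xml.length, bytes.length);
console.log(bytes.includes(Buffer.from([0xc3, 0xa9])));

fs.unlinkSync(file);
```

The declaration text is just characters in the string; keeping it truthful is up to whoever chooses the output encoding when writing the file.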
We have discussions about this in the context of HTML parsing/serialization all the time. In short, serialization/parsing are not meant to preserve the original form of the document. They only preserve the original information (i.e., the abstract stuff that survives into the parsed form). (And sometimes, not even that; see the warning and examples below the algorithm at https://html.spec.whatwg.org/#serialising-html-fragments). See https://github.com/inikulin/parse5/issues/261#issuecomment-401389295 for more.
I’m not sure, as I’m not sure what definition of “safety” you’re using. But see https://dom.spec.whatwg.org/#dom-document-inputencoding for the definition of inputEncoding.