Make parsing XML from a file robustly easier, with a New-Xml cmdlet and the ability to cast file paths to [xml]
Summary of the new feature/enhancement
- Implement a New-Xml cmdlet that robustly parses XML from a given file and returns an [xml] instance.
# WISHFUL THINKING:
# Should be the equivalent of:
# & { $xmlDom = [xml]::new(); $xmlDom.Load((Convert-Path file.xml)); return $xmlDom }
New-Xml file.xml # -Path parameter implied.
Like Select-Xml, a -Content parameter should support an XML string as input.
- Complementarily, allow casts to [xml] to accept a single Get-ChildItem / Get-Item output object in order to parse from a file (which would complement the existing ability to cast a string; e.g., [xml] '<xml><foo/></xml>')
# WISHFUL THINKING:
# Should be the equivalent of the above.
[xml] (Get-Item file.xml)
Note: We could even accept path strings, given that it’s trivial to distinguish an XML string (which, aside from optional leading whitespace, has to start with <) from a file path.
Rationale:
Reading XML files into an [xml] DOM (System.Xml.XmlDocument) is a common use case, and the idiom frequently seen is:
$xmlDom = [xml] (Get-Content -Raw file.xml)
While concise and convenient, especially compared to the robust alternative, this approach is not robust: Get-Content decodes the file before the XML parser sees it, so the character encoding named in the file’s own XML declaration is not respected - see this Stack Overflow answer for background.
As an aside: Select-Xml currently has the same problem - see #14404.
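To make the encoding problem above concrete, the following sketch writes a small ISO-8859-1 file whose XML declaration names that encoding, then parses it both ways. The temp-file name and variable names are made up for illustration. Whether the Get-Content variant actually garbles the text depends on the default encoding of the PowerShell edition in use (PowerShell 7 assumes UTF-8 for files without a BOM), so no output is claimed for it here.

```powershell
# Illustrative only: the file name and variable names are assumptions.
$path   = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.xml'
$latin1 = [Text.Encoding]::GetEncoding('iso-8859-1')
$umlaut = [string] [char] 0xFC   # 'ü', built from its code point so the script's own encoding doesn't matter

# Write the file as ISO-8859-1 and declare that encoding in the XML declaration.
$xmlText = "<?xml version=`"1.0`" encoding=`"iso-8859-1`"?><root>$umlaut</root>"
[IO.File]::WriteAllText($path, $xmlText, $latin1)

# Idiom 1: Get-Content decodes the bytes before the XML parser ever sees the declaration.
$viaCast = [xml] (Get-Content -Raw -LiteralPath $path)

# Idiom 2: XmlDocument.Load() reads the file itself and honors the declared encoding.
$viaLoad = [xml]::new()
$viaLoad.Load($path)   # $path is already a full path, so Convert-Path isn't needed here

$viaLoadText = $viaLoad.DocumentElement.InnerText
$viaCastText = $viaCast.DocumentElement.InnerText
"Load():      $viaLoadText"
"Get-Content: $viaCastText"
```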
Currently, getting robust behavior requires direct use of .NET APIs, which is cumbersome and far from obvious:
# Robustly parse file 'file.xml' into an [xml] DOM
$xmlDom = [xml]::new(); $xmlDom.Load((Convert-Path file.xml))
The proposed new features above would provide a PowerShell-idiomatic alternative that is both robust and convenient.
Quick-and-dirty New-Xml prototype (for simplicity, a single parameter is used, and whether an XML string or a file path is given is derived from the specific value passed):
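function New-Xml {
  <#
  .SYNOPSIS
  Constructs an XML DOM ([xml]) from an XML file or text.
  #>
  param([string] $PathOrText)
  Set-StrictMode -Version 1; $ErrorActionPreference = 'Stop'
  $doc = [xml]::new()
  # A leading '<' is taken to mean XML text; anything else is treated as a file path.
  # Load() is given a full path, because .NET's working directory can differ from PowerShell's.
  if ($PathOrText[0] -eq '<') { $doc.LoadXml($PathOrText) }
  else { $doc.Load((Convert-Path $PathOrText)) }
  return $doc
}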
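The prototype above folds both kinds of input into one parameter. A variant closer to the proposed cmdlet shape, with separate -Path and -Content parameters as in Select-Xml, might look like the following sketch. The parameter-set names and the choice to treat -Path literally (no wildcard expansion) are assumptions made for brevity, not part of the proposal.

```powershell
function New-Xml {
  # Sketch only: parameter-set names are hypothetical.
  [CmdletBinding(DefaultParameterSetName = 'Path')]
  [OutputType([xml])]
  param(
    # File to parse; positional, so that `New-Xml file.xml` implies -Path.
    [Parameter(Mandatory, Position = 0, ParameterSetName = 'Path')]
    [string] $Path,

    # XML text to parse instead of a file.
    [Parameter(Mandatory, ParameterSetName = 'Content')]
    [string] $Content
  )

  $doc = [xml]::new()
  if ($PSCmdlet.ParameterSetName -eq 'Content') {
    $doc.LoadXml($Content)
  }
  else {
    # Simplification: the path is taken literally. Convert-Path resolves it to a
    # full file-system path, and Load() then honors the file's declared encoding.
    $doc.Load((Convert-Path -LiteralPath $Path))
  }
  $doc
}
```

With this in place, `New-Xml file.xml` parses a file and `New-Xml -Content '<xml><foo/></xml>'` parses a string.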
Issue Analytics
- State:
- Created 3 years ago
- Reactions:3
- Comments:11 (6 by maintainers)
Top GitHub Comments
The sad reality is that the [xml](Get-Content ...) pattern is ubiquitous and there is no way of fighting it; it has simply been wrong for way too long.
Given that, one of the options would be to implement encoding sniffing for XML files in Get-Content, i.e. recreate the same mechanism that XmlDocument uses, at least as long as the user has not expressed a preference using the -Encoding parameter.
It would make things transparently correct for anyone copying code off of the Internet and who’s not deep enough into the details of how XML implements file encodings. It’s not a clean solution, but a very pragmatic one. It also would help people who naively (or for performance reasons) process XML data line-wise as plain text.
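As a rough illustration of what such encoding sniffing could involve, here is a minimal sketch that checks for a byte-order mark and otherwise looks for an encoding name in the XML declaration. The function name Get-XmlDeclaredEncoding is hypothetical, and the logic is deliberately simplified (no UTF-32 handling, no detection of BOM-less UTF-16); it is not the mechanism XmlDocument actually uses.

```powershell
function Get-XmlDeclaredEncoding {
  # Hypothetical helper: returns an encoding *name* for the given XML file.
  param([Parameter(Mandatory)] [string] $Path)

  # Read only the first bytes; the XML declaration, if any, must come first.
  $stream = [IO.File]::OpenRead((Convert-Path -LiteralPath $Path))
  try {
    $buffer = [byte[]]::new(256)
    $count  = $stream.Read($buffer, 0, $buffer.Length)
  }
  finally { $stream.Dispose() }

  # 1. A byte-order mark takes precedence.
  if ($count -ge 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF) { return 'utf-8' }
  if ($count -ge 2 -and $buffer[0] -eq 0xFF -and $buffer[1] -eq 0xFE) { return 'utf-16' }    # little-endian
  if ($count -ge 2 -and $buffer[0] -eq 0xFE -and $buffer[1] -eq 0xFF) { return 'utf-16BE' }  # big-endian

  # 2. Otherwise decode the prefix as ASCII and look for encoding="..." in the declaration.
  $head = [Text.Encoding]::ASCII.GetString($buffer, 0, $count)
  if ($head -match '^<\?xml[^>]*\bencoding\s*=\s*["'']([A-Za-z][A-Za-z0-9._-]*)["'']') {
    return $Matches[1]
  }

  # 3. No BOM and no declared encoding: XML's default is UTF-8.
  return 'utf-8'
}

# Possible use (file.xml is a placeholder):
# $name = Get-XmlDeclaredEncoding file.xml
# $text = [IO.File]::ReadAllText((Convert-Path file.xml), [Text.Encoding]::GetEncoding($name))
```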
All the XML cmdlets have the same problem as #14404, not only Select-Xml. Any XML document that is opened through the same internal function described over there suffers from this. Luckily that bug could be fixed without breaking existing interfaces (or scripts).