Make parsing XML from a file robustly easier, with a New-Xml cmdlet and the ability to cast file paths to [xml]

Summary of the new feature/enhancement

  • Implement a New-Xml cmdlet that robustly parses XML from a given file and returns an [xml] instance.
# WISHFUL THINKING: 
# Should be the equivalent of:
#       & { $xmlDom = [xml]::new(); $xmlDom.Load((Convert-Path file.xml)); return $xmlDom }
New-Xml file.xml   # -Path parameter implied.

Like Select-Xml, a -Content parameter should support an XML string as input.

  • Complementarily, allow casts to [xml] to accept a single Get-ChildItem / Get-Item output object in order to parse from a file (which would complement the existing ability to cast a string; e.g., [xml] '<xml><foo/></xml>')
# WISHFUL THINKING: 
# Should be the equivalent of the above.
[xml] (Get-Item file.xml)

Note: We could even accept path strings, given that it’s trivial to distinguish an XML string (which has to start with <) from a file path.


Rationale:

Reading XML files into an [xml] DOM (System.Xml.XmlDocument) is a common use case, and the idiom frequently seen is:

$xmlDom = [xml] (Get-Content -Raw file.xml)

While this idiom is concise and convenient, especially compared to the robust alternative, it is not robust: Get-Content decodes the file without regard to the character-encoding information that is part of the file itself, in its XML declaration; see this Stack Overflow answer for background. As an aside: Select-Xml currently has the same problem; see #14404.
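
To make the problem concrete, here is a minimal sketch that compares the two approaches on a throwaway file. The file name encoding-demo.xml and its content are assumptions made up for illustration. What the Get-Content route returns depends on the PowerShell edition and its default encoding, so the sketch only checks the Load() result.

```powershell
# Create a demo file whose XML declaration says ISO-8859-1 and whose bytes match that.
# (File name and content are hypothetical, for illustration only.)
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.xml'
$expected = "$([char] 0xFC)ber"   # 'u-umlaut' + 'ber', built from a code point
$xmlText = '<?xml version="1.0" encoding="ISO-8859-1"?><doc>' + $expected + '</doc>'
[IO.File]::WriteAllBytes($path, [Text.Encoding]::GetEncoding('ISO-8859-1').GetBytes($xmlText))

# Common idiom: Get-Content decodes the bytes using its own default encoding,
# without looking at the XML declaration; the result may be garbled.
$viaGetContent = ([xml] (Get-Content -Raw -LiteralPath $path)).doc

# Robust approach: XmlDocument.Load() reads the bytes itself and honors the declaration.
$dom = [xml]::new()
$dom.Load($path)   # $path is already a full file-system path
$viaLoad = $dom.doc

# Show both results side by side.
[pscustomobject] @{
  ViaGetContent = $viaGetContent
  ViaLoad       = $viaLoad
  LoadIsCorrect = $viaLoad -ceq $expected
}

Remove-Item -LiteralPath $path
```

In PowerShell (Core), where Get-Content assumes UTF-8 in the absence of a BOM, the first result should contain a replacement character in place of the umlaut; in Windows PowerShell it may happen to come out right if the system's ANSI code page matches the file's encoding.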

Getting robust behavior currently requires direct use of .NET APIs, which is cumbersome and far from obvious:

# Robustly parse file 'file.xml' into an [xml] DOM
$xmlDom = [xml]::new(); $xmlDom.Load((Convert-Path file.xml))

The proposed new features above would provide a PowerShell-idiomatic alternative that is both robust and convenient.


Quick-and-dirty New-Xml prototype (for simplicity, a single parameter is used, and whether an XML string or a file path is given is derived from the specific value passed):

function New-Xml {
  <#
  .SYNOPSIS
  Constructs an XML DOM ([xml]) from an XML file or text.
  #>
  param([string] $PathOrText)
  Set-StrictMode -Version 1; $ErrorActionPreference = 'Stop'
  $doc = [xml]::new()
  if ($PathOrText[0] -eq '<') { $doc.LoadXml($PathOrText) }
  else                        { $doc.Load((Convert-Path $PathOrText)) } 
  return $doc 
}
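
The prototype above folds both inputs into one parameter. A variant closer to the proposal, with separate -Path and -Content parameters, might look like the following sketch. The function name New-XmlSketch and the parameter-set layout are assumptions for illustration, not an agreed design.

```powershell
function New-XmlSketch {
  <#
  .SYNOPSIS
  Hypothetical sketch: constructs an XML DOM ([xml]) from a file (-Path) or a string (-Content).
  #>
  [CmdletBinding(DefaultParameterSetName = 'Path')]
  [OutputType([xml])]
  param(
    [Parameter(Mandatory, Position = 0, ParameterSetName = 'Path')]
    [string] $Path,

    [Parameter(Mandatory, ParameterSetName = 'Content')]
    [string] $Content
  )
  $doc = [xml]::new()
  if ($PSCmdlet.ParameterSetName -eq 'Content') {
    # Parse the given XML text directly.
    $doc.LoadXml($Content)
  }
  else {
    # Resolve to a full file-system path first, because .NET's working
    # directory usually differs from PowerShell's current location.
    $doc.Load((Convert-Path -LiteralPath $Path -ErrorAction Stop))
  }
  return $doc
}

# Usage with an XML string; a file would be passed as: New-XmlSketch file.xml
$doc = New-XmlSketch -Content '<root><item>1</item></root>'
$doc.root.item
```

Separate parameter sets avoid having to guess the kind of input from its first character, at the cost of a slightly longer call for XML strings.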

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

Tomalak commented, Dec 28, 2020 (2 reactions)

The sad reality is that the [xml](Get-Content ...) pattern is ubiquitous and there is no way of fighting it; it has simply been wrong for way too long.

Given that, one of the options would be to implement encoding sniffing for XML files in Get-Content, i.e. recreate the same mechanism that XmlDocument uses, at least as long as the user has not expressed a preference using the -Encoding parameter.

It would make things transparently correct for anyone copying code off of the Internet and who’s not deep enough into the details of how XML implements file encodings. It’s not a clean solution, but a very pragmatic one. It also would help people who naively (or for performance reasons) process XML data line-wise as plain text.
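
As an illustration of what such sniffing involves (this is not part of the comment above, and not how Get-Content works today), the following sketch lets an XML reader determine a file's encoding and then reads the file as plain text with it. The file name sniff-demo.xml and the variable names are assumptions for illustration.

```powershell
# Create a demo file whose XML declaration says ISO-8859-1 and whose bytes match that.
# (File name and content are hypothetical, for illustration only.)
$path = Join-Path ([IO.Path]::GetTempPath()) 'sniff-demo.xml'
$expected = "$([char] 0xFC)ber"   # 'u-umlaut' + 'ber', built from a code point
$xmlText = '<?xml version="1.0" encoding="ISO-8859-1"?><doc>' + $expected + '</doc>'
[IO.File]::WriteAllBytes($path, [Text.Encoding]::GetEncoding('ISO-8859-1').GetBytes($xmlText))

# Let the XML reader work out the encoding from a BOM or the XML declaration.
$reader = [System.Xml.XmlTextReader]::new($path)
try {
  $null = $reader.MoveToContent()   # advance past the declaration to the root element
  $encoding = $reader.Encoding      # encoding the reader settled on
}
finally {
  $reader.Close()
}

# Read the file as plain text using the detected encoding.
$text = [IO.File]::ReadAllText($path, $encoding)
$text.Contains($expected)

Remove-Item -LiteralPath $path
```

A real implementation inside Get-Content would additionally have to decide when a file counts as XML in the first place, which this sketch does not attempt.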

Tomalak commented, Dec 28, 2020 (2 reactions)

All the XML cmdlets have the same problem as #14404, not only Select-Xml. Any XML document that is opened through the same internal function described over there suffers from this. Luckily that bug could be fixed without breaking existing interfaces (or scripts).
