Flatten XML Parser?
XML that makes heavy use of attributes rather than elements results in JSON that is hard to work with using Ansible / JMESPath. For example, nmap has XML output (but not JSON):
nmap -oX - -p 443 galaxy.ansible.com | xmllint --pretty 1 -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE nmaprun>
<?xml-stylesheet href="file:///usr/bin/../share/nmap/nmap.xsl" type="text/xsl"?>
<!-- Nmap 7.92 scan initiated Wed Oct 26 11:51:38 2022 as: nmap -oX - -p 443 galaxy.ansible.com -->
<nmaprun scanner="nmap" args="nmap -oX - -p 443 galaxy.ansible.com" start="1666781498" startstr="Wed Oct 26 11:51:38 2022" version="7.92" xmloutputversion="1.05">
<scaninfo type="connect" protocol="tcp" numservices="1" services="443"/>
<verbose level="0"/>
<debugging level="0"/>
<hosthint>
<status state="up" reason="unknown-response" reason_ttl="0"/>
<address addr="172.67.68.251" addrtype="ipv4"/>
<hostnames>
<hostname name="galaxy.ansible.com" type="user"/>
</hostnames>
</hosthint>
<host starttime="1666781498" endtime="1666781498">
<status state="up" reason="syn-ack" reason_ttl="0"/>
<address addr="172.67.68.251" addrtype="ipv4"/>
<hostnames>
<hostname name="galaxy.ansible.com" type="user"/>
<hostname name="galaxy.ansible.com" type="PTR"/>
</hostnames>
<ports>
<port protocol="tcp" portid="443">
<state state="open" reason="syn-ack" reason_ttl="0"/>
<service name="https" method="table" conf="3"/>
</port>
</ports>
<times srtt="12260" rttvar="9678" to="100000"/>
</host>
<runstats>
<finished time="1666781498" timestr="Wed Oct 26 11:51:38 2022" summary="Nmap done at Wed Oct 26 11:51:38 2022; 1 IP address (1 host up) scanned in 0.10 seconds" elapsed="0.10" exit="success"/>
<hosts up="1" down="0" total="1"/>
</runstats>
</nmaprun>
Convert this into JSON / YAML and the results are not great…
nmap -oX - -p 443 galaxy.ansible.com | xmllint --pretty 1 - | jc --xml -py
---
nmaprun:
'@scanner': nmap
'@args': nmap -oX - -p 443 galaxy.ansible.com
'@start': '1666781628'
'@startstr': Wed Oct 26 11:53:48 2022
'@version': '7.92'
'@xmloutputversion': '1.05'
scaninfo:
'@type': connect
'@protocol': tcp
'@numservices': '1'
'@services': '443'
verbose:
'@level': '0'
debugging:
'@level': '0'
hosthint:
status:
'@state': up
'@reason': unknown-response
'@reason_ttl': '0'
address:
'@addr': 172.67.68.251
'@addrtype': ipv4
hostnames:
hostname:
'@name': galaxy.ansible.com
'@type': user
host:
'@starttime': '1666781628'
'@endtime': '1666781628'
status:
'@state': up
'@reason': syn-ack
'@reason_ttl': '0'
address:
'@addr': 172.67.68.251
'@addrtype': ipv4
hostnames:
hostname:
- '@name': galaxy.ansible.com
'@type': user
- '@name': galaxy.ansible.com
'@type': PTR
ports:
port:
'@protocol': tcp
'@portid': '443'
state:
'@state': open
'@reason': syn-ack
'@reason_ttl': '0'
service:
'@name': https
'@method': table
'@conf': '3'
times:
'@srtt': '13479'
'@rttvar': '11398'
'@to': '100000'
runstats:
finished:
'@time': '1666781628'
'@timestr': Wed Oct 26 11:53:48 2022
'@summary': Nmap done at Wed Oct 26 11:53:48 2022; 1 IP address (1 host up)
scanned in 0.10 seconds
'@elapsed': '0.10'
'@exit': success
hosts:
'@up': '1'
'@down': '0'
'@total': '1'
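Part of what makes these keys awkward is the `@` prefix: in JMESPath, `@` refers to the current node, so a key such as `@name` has to be written as a quoted identifier. For comparison with the XSLT approach, the prefix could also be removed after parsing. The following is a minimal Python sketch, assuming the parsed output is available as nested dicts and lists; `strip_attr_prefix` is a hypothetical helper for illustration, not part of jc.

```python
def strip_attr_prefix(node, prefix="@"):
    """Return a copy of node with the prefix removed from dictionary keys.

    Hypothetical helper: keys that do not start with the prefix are kept
    unchanged. If an attribute and a child element share a name, the one
    that appears later in the mapping silently wins.
    """
    if isinstance(node, dict):
        return {
            (key[len(prefix):] if key.startswith(prefix) else key):
                strip_attr_prefix(value, prefix)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_attr_prefix(item, prefix) for item in node]
    return node


# A fragment shaped like the "port" entry in the output above.
parsed = {
    "port": {
        "@protocol": "tcp",
        "@portid": "443",
        "state": {"@state": "open", "@reason": "syn-ack"},
    }
}
print(strip_attr_prefix(parsed))
```

This keeps the XML untouched and only renames keys, at the cost of possible name collisions between attributes and child elements.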
However, if the XML is flattened using XSLT first:
nmap -oX - -p 443 galaxy.ansible.com | xmllint --pretty 1 - > galaxy.ansible.com.xml
xsltproc attributes2elements.xslt galaxy.ansible.com.xml
<?xml version="1.0"?>
<?xml-stylesheet href="file:///usr/bin/../share/nmap/nmap.xsl" type="text/xsl"?><!-- Nmap 7.92 scan initiated Wed Oct 26 12:01:56 2022 as: nmap -oX - -p 443 galaxy.ansible.com -->
<nmaprun><scanner>nmap</scanner><args>nmap -oX - -p 443 galaxy.ansible.com</args><start>1666782116</start><startstr>Wed Oct 26 12:01:56 2022</startstr><version>7.92</version><xmloutputversion>1.05</xmloutputversion>
<scaninfo><type>connect</type><protocol>tcp</protocol><numservices>1</numservices><services>443</services></scaninfo>
<verbose><level>0</level></verbose>
<debugging><level>0</level></debugging>
<hosthint>
<status><state>up</state><reason>unknown-response</reason><reason_ttl>0</reason_ttl></status>
<address><addr>172.67.68.251</addr><addrtype>ipv4</addrtype></address>
<hostnames>
<hostname><name>galaxy.ansible.com</name><type>user</type></hostname>
</hostnames>
</hosthint>
<host><starttime>1666782116</starttime><endtime>1666782116</endtime>
<status><state>up</state><reason>syn-ack</reason><reason_ttl>0</reason_ttl></status>
<address><addr>172.67.68.251</addr><addrtype>ipv4</addrtype></address>
<hostnames>
<hostname><name>galaxy.ansible.com</name><type>user</type></hostname>
<hostname><name>galaxy.ansible.com</name><type>PTR</type></hostname>
</hostnames>
<ports>
<port><protocol>tcp</protocol><portid>443</portid>
<state><state>open</state><reason>syn-ack</reason><reason_ttl>0</reason_ttl></state>
<service><name>https</name><method>table</method><conf>3</conf></service>
</port>
</ports>
<times><srtt>10773</srtt><rttvar>8291</rttvar><to>100000</to></times>
</host>
<runstats>
<finished><time>1666782116</time><timestr>Wed Oct 26 12:01:56 2022</timestr><summary>Nmap done at Wed Oct 26 12:01:56 2022; 1 IP address (1 host up) scanned in 0.10 seconds</summary><elapsed>0.10</elapsed><exit>success</exit></finished>
<hosts><up>1</up><down>0</down><total>1</total></hosts>
</runstats>
</nmaprun>
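The contents of `attributes2elements.xslt` are not shown here. As a rough illustration of the same transformation, the sketch below moves every attribute into a child element using Python's standard `xml.etree.ElementTree`. It is an assumption about what the stylesheet does, based only on the output above, and `attributes_to_elements` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Turn each attribute of elem (and its descendants) into a child element."""
    # Recurse into the existing children first, so that the children
    # created from attributes below are not processed again.
    for child in list(elem):
        attributes_to_elements(child)
    # Insert the attribute-derived children ahead of the existing ones,
    # keeping the original attribute order.
    for index, (name, value) in enumerate(list(elem.attrib.items())):
        new_child = ET.Element(name)
        new_child.text = value
        elem.insert(index, new_child)
    elem.attrib.clear()
    return elem


# A fragment shaped like the <port> element in the nmap output.
sample = (
    '<port protocol="tcp" portid="443">'
    '<state state="open" reason="syn-ack"/>'
    '</port>'
)
root = attributes_to_elements(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

As with the XSLT output, an attribute that shares its name with the parent or a sibling element (for example `state` inside `<state>`) simply becomes another element with that name.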
You then have something that is nicer to work with:
xsltproc attributes2elements.xslt galaxy.ansible.com.xml | jc --xml -py
---
nmaprun:
scanner: nmap
args: nmap -oX - -p 443 galaxy.ansible.com
start: '1666782116'
startstr: Wed Oct 26 12:01:56 2022
version: '7.92'
xmloutputversion: '1.05'
scaninfo:
type: connect
protocol: tcp
numservices: '1'
services: '443'
verbose:
level: '0'
debugging:
level: '0'
hosthint:
status:
state: up
reason: unknown-response
reason_ttl: '0'
address:
addr: 172.67.68.251
addrtype: ipv4
hostnames:
hostname:
name: galaxy.ansible.com
type: user
host:
starttime: '1666782116'
endtime: '1666782116'
status:
state: up
reason: syn-ack
reason_ttl: '0'
address:
addr: 172.67.68.251
addrtype: ipv4
hostnames:
hostname:
- name: galaxy.ansible.com
type: user
- name: galaxy.ansible.com
type: PTR
ports:
port:
protocol: tcp
portid: '443'
state:
state: open
reason: syn-ack
reason_ttl: '0'
service:
name: https
method: table
conf: '3'
times:
srtt: '10773'
rttvar: '8291'
to: '100000'
runstats:
finished:
time: '1666782116'
timestr: Wed Oct 26 12:01:56 2022
summary: Nmap done at Wed Oct 26 12:01:56 2022; 1 IP address (1 host up) scanned
in 0.10 seconds
elapsed: '0.10'
exit: success
hosts:
up: '1'
down: '0'
total: '1'
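One detail in the output above still needs care when querying: `hostname` is a single mapping under `hosthint` but a list under `host`, because the second element occurs twice. A small Python sketch of normalizing both shapes follows; `as_list` is a hypothetical helper and the data is copied from the output above.

```python
def as_list(value):
    """Wrap a single item in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# "hostname" as it appears under hosthint (one entry) and host (two entries).
hosthint = {
    "hostnames": {"hostname": {"name": "galaxy.ansible.com", "type": "user"}}
}
host = {
    "hostnames": {
        "hostname": [
            {"name": "galaxy.ansible.com", "type": "user"},
            {"name": "galaxy.ansible.com", "type": "PTR"},
        ]
    }
}

for record in (hosthint, host):
    types = [h["type"] for h in as_list(record["hostnames"]["hostname"])]
    print(types)
```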
So I was wondering if a --xml-flatten parser that first used XSLT to flatten the XML might be something that could be considered?
Top GitHub Comments
Released in jc v1.22.2.
I have updated the xml parser in the dev branch with the -r behavior.
https://github.com/kellyjonbrazil/jc/blob/dev/jc/parsers/xml.py