Flatten XML Parser?

XML that relies heavily on attributes rather than elements produces JSON that is hard to work with in Ansible / JMESPath. For example, nmap has XML output (but not JSON):

nmap -oX - -p 443 galaxy.ansible.com | xmllint --pretty 1 -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE nmaprun>
<?xml-stylesheet href="file:///usr/bin/../share/nmap/nmap.xsl" type="text/xsl"?>
<!-- Nmap 7.92 scan initiated Wed Oct 26 11:51:38 2022 as: nmap -oX - -p 443 galaxy.ansible.com -->
<nmaprun scanner="nmap" args="nmap -oX - -p 443 galaxy.ansible.com" start="1666781498" startstr="Wed Oct 26 11:51:38 2022" version="7.92" xmloutputversion="1.05">
  <scaninfo type="connect" protocol="tcp" numservices="1" services="443"/>
  <verbose level="0"/>
  <debugging level="0"/>
  <hosthint>
    <status state="up" reason="unknown-response" reason_ttl="0"/>
    <address addr="172.67.68.251" addrtype="ipv4"/>
    <hostnames>
      <hostname name="galaxy.ansible.com" type="user"/>
    </hostnames>
  </hosthint>
  <host starttime="1666781498" endtime="1666781498">
    <status state="up" reason="syn-ack" reason_ttl="0"/>
    <address addr="172.67.68.251" addrtype="ipv4"/>
    <hostnames>
      <hostname name="galaxy.ansible.com" type="user"/>
      <hostname name="galaxy.ansible.com" type="PTR"/>
    </hostnames>
    <ports>
      <port protocol="tcp" portid="443">
        <state state="open" reason="syn-ack" reason_ttl="0"/>
        <service name="https" method="table" conf="3"/>
      </port>
    </ports>
    <times srtt="12260" rttvar="9678" to="100000"/>
  </host>
  <runstats>
    <finished time="1666781498" timestr="Wed Oct 26 11:51:38 2022" summary="Nmap done at Wed Oct 26 11:51:38 2022; 1 IP address (1 host up) scanned in 0.10 seconds" elapsed="0.10" exit="success"/>
    <hosts up="1" down="0" total="1"/>
  </runstats>
</nmaprun>

Converting this into JSON / YAML gives results that are not great, because every attribute becomes a key prefixed with '@':

nmap -oX - -p 443 galaxy.ansible.com | xmllint --pretty 1 - | jc --xml -py
---
nmaprun:
  '@scanner': nmap
  '@args': nmap -oX - -p 443 galaxy.ansible.com
  '@start': '1666781628'
  '@startstr': Wed Oct 26 11:53:48 2022
  '@version': '7.92'
  '@xmloutputversion': '1.05'
  scaninfo:
    '@type': connect
    '@protocol': tcp
    '@numservices': '1'
    '@services': '443'
  verbose:
    '@level': '0'
  debugging:
    '@level': '0'
  hosthint:
    status:
      '@state': up
      '@reason': unknown-response
      '@reason_ttl': '0'
    address:
      '@addr': 172.67.68.251
      '@addrtype': ipv4
    hostnames:
      hostname:
        '@name': galaxy.ansible.com
        '@type': user
  host:
    '@starttime': '1666781628'
    '@endtime': '1666781628'
    status:
      '@state': up
      '@reason': syn-ack
      '@reason_ttl': '0'
    address:
      '@addr': 172.67.68.251
      '@addrtype': ipv4
    hostnames:
      hostname:
      - '@name': galaxy.ansible.com
        '@type': user
      - '@name': galaxy.ansible.com
        '@type': PTR
    ports:
      port:
        '@protocol': tcp
        '@portid': '443'
        state:
          '@state': open
          '@reason': syn-ack
          '@reason_ttl': '0'
        service:
          '@name': https
          '@method': table
          '@conf': '3'
    times:
      '@srtt': '13479'
      '@rttvar': '11398'
      '@to': '100000'
  runstats:
    finished:
      '@time': '1666781628'
      '@timestr': Wed Oct 26 11:53:48 2022
      '@summary': Nmap done at Wed Oct 26 11:53:48 2022; 1 IP address (1 host up)
        scanned in 0.10 seconds
      '@elapsed': '0.10'
      '@exit': success
    hosts:
      '@up': '1'
      '@down': '0'
      '@total': '1'

However, if the XML is flattened using XSLT first:

nmap -oX - -p 443 galaxy.ansible.com | xmllint --pretty 1 - > galaxy.ansible.com.xml
xsltproc attributes2elements.xslt galaxy.ansible.com.xml 
<?xml version="1.0"?>
<?xml-stylesheet href="file:///usr/bin/../share/nmap/nmap.xsl" type="text/xsl"?><!-- Nmap 7.92 scan initiated Wed Oct 26 12:01:56 2022 as: nmap -oX - -p 443 galaxy.ansible.com -->
<nmaprun><scanner>nmap</scanner><args>nmap -oX - -p 443 galaxy.ansible.com</args><start>1666782116</start><startstr>Wed Oct 26 12:01:56 2022</startstr><version>7.92</version><xmloutputversion>1.05</xmloutputversion>
  <scaninfo><type>connect</type><protocol>tcp</protocol><numservices>1</numservices><services>443</services></scaninfo>
  <verbose><level>0</level></verbose>
  <debugging><level>0</level></debugging>
  <hosthint>
    <status><state>up</state><reason>unknown-response</reason><reason_ttl>0</reason_ttl></status>
    <address><addr>172.67.68.251</addr><addrtype>ipv4</addrtype></address>
    <hostnames>
      <hostname><name>galaxy.ansible.com</name><type>user</type></hostname>
    </hostnames>
  </hosthint>
  <host><starttime>1666782116</starttime><endtime>1666782116</endtime>
    <status><state>up</state><reason>syn-ack</reason><reason_ttl>0</reason_ttl></status>
    <address><addr>172.67.68.251</addr><addrtype>ipv4</addrtype></address>
    <hostnames>
      <hostname><name>galaxy.ansible.com</name><type>user</type></hostname>
      <hostname><name>galaxy.ansible.com</name><type>PTR</type></hostname>
    </hostnames>
    <ports>
      <port><protocol>tcp</protocol><portid>443</portid>
        <state><state>open</state><reason>syn-ack</reason><reason_ttl>0</reason_ttl></state>
        <service><name>https</name><method>table</method><conf>3</conf></service>
      </port>
    </ports>
    <times><srtt>10773</srtt><rttvar>8291</rttvar><to>100000</to></times>
  </host>
  <runstats>
    <finished><time>1666782116</time><timestr>Wed Oct 26 12:01:56 2022</timestr><summary>Nmap done at Wed Oct 26 12:01:56 2022; 1 IP address (1 host up) scanned in 0.10 seconds</summary><elapsed>0.10</elapsed><exit>success</exit></finished>
    <hosts><up>1</up><down>0</down><total>1</total></hosts>
  </runstats>
</nmaprun>

You then have something that is nicer to work with:

xsltproc attributes2elements.xslt galaxy.ansible.com.xml | jc --xml -py
---
nmaprun:
  scanner: nmap
  args: nmap -oX - -p 443 galaxy.ansible.com
  start: '1666782116'
  startstr: Wed Oct 26 12:01:56 2022
  version: '7.92'
  xmloutputversion: '1.05'
  scaninfo:
    type: connect
    protocol: tcp
    numservices: '1'
    services: '443'
  verbose:
    level: '0'
  debugging:
    level: '0'
  hosthint:
    status:
      state: up
      reason: unknown-response
      reason_ttl: '0'
    address:
      addr: 172.67.68.251
      addrtype: ipv4
    hostnames:
      hostname:
        name: galaxy.ansible.com
        type: user
  host:
    starttime: '1666782116'
    endtime: '1666782116'
    status:
      state: up
      reason: syn-ack
      reason_ttl: '0'
    address:
      addr: 172.67.68.251
      addrtype: ipv4
    hostnames:
      hostname:
      - name: galaxy.ansible.com
        type: user
      - name: galaxy.ansible.com
        type: PTR
    ports:
      port:
        protocol: tcp
        portid: '443'
        state:
          state: open
          reason: syn-ack
          reason_ttl: '0'
        service:
          name: https
          method: table
          conf: '3'
    times:
      srtt: '10773'
      rttvar: '8291'
      to: '100000'
  runstats:
    finished:
      time: '1666782116'
      timestr: Wed Oct 26 12:01:56 2022
      summary: Nmap done at Wed Oct 26 12:01:56 2022; 1 IP address (1 host up) scanned
        in 0.10 seconds
      elapsed: '0.10'
      exit: success
    hosts:
      up: '1'
      down: '0'
      total: '1'

So I was wondering whether a --xml-flatten parser, one that first uses XSLT to flatten the XML, is something that could be considered?
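
The attributes2elements.xslt stylesheet is not included in the issue. As a rough illustration of the same idea, here is a minimal Python sketch that turns each attribute into a leading child element using only the standard library. The function name attributes_to_elements and the sample snippet (cut down from the nmap output above) are assumptions made for this example; this is not the author's stylesheet and not part of jc.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(element: ET.Element) -> None:
    """Recursively turn every attribute into a leading child element, in place."""
    # Recurse into the original children first, so the elements created
    # below from attributes are never visited themselves.
    for child in list(element):
        attributes_to_elements(child)

    # Insert one new child per attribute ahead of the existing children,
    # keeping the attributes in their original order.
    for position, (name, value) in enumerate(list(element.attrib.items())):
        new_child = ET.Element(name)
        new_child.text = value
        element.insert(position, new_child)
    element.attrib.clear()


# Hypothetical sample: a fragment of the nmap XML shown earlier.
sample = """<port protocol="tcp" portid="443">
  <state state="open" reason="syn-ack" reason_ttl="0"/>
  <service name="https" method="table" conf="3"/>
</port>"""

root = ET.fromstring(sample)
attributes_to_elements(root)
flattened = ET.tostring(root, encoding="unicode")
print(flattened)
```

The printed XML could then be piped into jc --xml in place of the xsltproc output. Whitespace will differ from what xsltproc produces, and an attribute that shares a name with an existing child element ends up as a repeated element, so treat this as a starting point only.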

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

kellyjonbrazil commented, Nov 8, 2022 (1 reaction)

Released in jc v1.22.2.

kellyjonbrazil commented, Nov 2, 2022 (1 reaction)

I have updated the xml parser in the dev branch with the -r behavior.

https://github.com/kellyjonbrazil/jc/blob/dev/jc/parsers/xml.py
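
The comment above does not spell out what the updated behavior is, so the following should not be read as jc's implementation. It is only a sketch of one way to get attribute-free keys without XSLT: post-process the parsed dictionary and strip the '@' prefix. The helper name strip_attribute_prefix and the sample dictionary (modeled on the YAML output earlier in the issue) are assumptions for illustration.

```python
from typing import Any


def strip_attribute_prefix(node: Any, prefix: str = "@") -> Any:
    """Return a copy of a parsed structure with `prefix` removed from dict keys."""
    if isinstance(node, dict):
        return {
            (key[len(prefix):] if key.startswith(prefix) else key):
                strip_attribute_prefix(value, prefix)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_attribute_prefix(item, prefix) for item in node]
    # Scalars (strings, numbers, None) are returned unchanged.
    return node


# Hypothetical sample shaped like the '@'-prefixed output shown in the issue.
parsed = {
    "port": {
        "@protocol": "tcp",
        "@portid": "443",
        "state": {"@state": "open", "@reason": "syn-ack", "@reason_ttl": "0"},
    }
}

cleaned = strip_attribute_prefix(parsed)
print(cleaned)
```

One caveat with this approach: if an element has both an attribute and a child element with the same name at the same level, the later key silently overwrites the earlier one, so a real implementation would need a rule for such collisions.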
