Support for Semi-Structured XML

A semi-structured (mixed-content) element should not be parsed; its content should be kept as is. For example:

<level1>
    <level2>
        <level3>First level3.</level3>
        text outside 1st level3 at the end
    </level2>
    <level2>
        text outside 2nd level3 at the beginning
        <level3>Second level3.</level3>
        text outside 2nd level3 at the end
    </level2>
    <level2>
        text outside 3rd level3 at the beginning
        <level3>Third level3.</level3>
    </level2>
</level1>

will produce (at the corresponding levels):

level2: [
{ #text: "text outside 1st level3 at the end", level3: "First level3." },
{ #text: "text outside 2nd level3 at the beginningtext outside 2nd level3 at the end", level3: "Second level3." },
{ #text: "text outside 3rd level3 at the beginning", level3: "Third level3." },
]

which is not only irreversible (the order of text and elements is lost) but, in the second case, also meaningless (the two text fragments are joined into one). According to the “spec” you say you are adopting, at least the second case should not be parsed.
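To illustrate what "keeping order" would mean here, the following is a minimal sketch that walks a mixed-content element with the standard library's xml.etree.ElementTree and returns text runs and child elements in document order. It is not the API of the library discussed in this issue; the helper name mixed_content and the list-of-pairs output shape are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

XML = "<level2>before <level3>Second level3.</level3> after</level2>"


def mixed_content(elem):
    """Hypothetical helper: list text runs and children in document order."""
    parts = []
    # Text that appears before the first child lives in elem.text.
    if elem.text and elem.text.strip():
        parts.append(("#text", elem.text.strip()))
    for child in elem:
        parts.append((child.tag, child.text or ""))
        # Text that follows a child lives in that child's .tail.
        if child.tail and child.tail.strip():
            parts.append(("#text", child.tail.strip()))
    return parts


parts = mixed_content(ET.fromstring(XML))
print(parts)
```

Because the two text fragments stay separate entries on either side of level3, the original order can be reconstructed, which the joined #text value cannot offer.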

It would also be great if one or more tag names could be passed (as a parameter to the parse function) whose content would not be parsed at all. This could sometimes solve the described issue as well, since the user could specify that the level2 tag should not be parsed and that its content should be kept in the #text property.
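A rough sketch of how such a parameter might behave, written against the standard library's xml.etree.ElementTree rather than the library this issue is about. The function names to_dict and inner_xml and the raw_tags parameter are hypothetical, and the converter deliberately ignores attributes and mixed text outside the raw tags to stay short.

```python
import xml.etree.ElementTree as ET


def inner_xml(elem):
    """Serialize everything inside elem (text and child markup) as a string."""
    # ET.tostring includes each child's tail text by default.
    children = "".join(ET.tostring(child, encoding="unicode") for child in elem)
    return (elem.text or "") + children


def to_dict(elem, raw_tags=frozenset()):
    """Hypothetical converter: tags listed in raw_tags are kept unparsed."""
    if elem.tag in raw_tags:
        return {"#text": inner_xml(elem).strip()}
    if len(elem) == 0:
        return elem.text or ""
    result = {}
    for child in elem:
        value = to_dict(child, raw_tags)
        if child.tag in result:
            # A repeated tag becomes a list of values.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


doc = ET.fromstring("<level1><level2>a <level3>x</level3> b</level2></level1>")
parsed = to_dict(doc, raw_tags={"level2"})
print(parsed)
```

With level2 listed in raw_tags, its content comes back as a single string that still contains the level3 markup in its original position.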

Thank you in advance for your comments.

Issue Analytics

  • State:open
  • Created 9 years ago
  • Reactions:7
  • Comments:12 (2 by maintainers)

Top GitHub Comments

margru commented, Apr 25, 2014 (1 reaction)

AFAIK, there are two options:

xml.sax.saxutils.unescape

which unescapes &amp;, &lt;, and &gt; in a string of data.

Or HTMLParser.HTMLParser.unescape in Python 2 (html.parser.HTMLParser().unescape in Python 3), which is not documented.

Both are part of the Python standard library. The (current) resulting string could be passed to one of these functions to get rid of the entities. I’m not familiar with the internals of your module, though, so I don’t know whether it can really be used the way I suppose.
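A small sketch of the unescaping step, assuming the parser hands back a string with escaped markup (the sample string is made up). It uses xml.sax.saxutils.unescape as named above, plus html.unescape, the documented function available since Python 3.4; the undocumented HTMLParser().unescape method was removed in Python 3.9, so it is not used here.

```python
import html
from xml.sax.saxutils import unescape

# Made-up sample of escaped mixed content.
escaped = "text &lt;level3&gt;Second level3.&lt;/level3&gt; &amp; more"

# saxutils.unescape handles only &amp;, &lt; and &gt; unless extra
# entities are passed in its optional second argument.
plain = unescape(escaped)

# html.unescape also resolves other named and numeric character references.
plain_html = html.unescape(escaped)

print(plain)
print(plain_html)
```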

javadev commented, Dec 3, 2018 (0 reactions)

I agree. With an external, independent JSON parser we get a different element order.
