Allow namespace prefixes other than 'None' in PageXML

Eynollah, e.g., produces PageXML files that use an explicit prefix (xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15").

calamari_ocr/ocr/dataset/datareader/pagexml/reader.py, however, expects the namespace to be declared without a prefix (key ‘None’ in lxml’s nsmap) and throws an error when processing an Eynollah PageXML file.

When I change line 120 of reader.py from

ns = {"ns": root.nsmap[None]} 

to

ns = {'ns' : 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'}

it works. I’m not sure if you can generalize the namespace dictionary to cover both output styles. Maybe xpath’s local-name function (instead of lxml find or findall) is an alternative.
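
One possible way to cover both output styles, as a minimal sketch and not necessarily what the maintainers merged: read the namespace URI from the root element's tag instead of from the prefix map. Parsed tags have the form {uri}localname no matter which prefix the file uses. The sketch uses the standard library's xml.etree.ElementTree so that it runs without extra dependencies; lxml elements use the same tag format. The helper names namespace_of and find_pages and the two sample documents are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

PAGE_URI = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# The same minimal document twice: once with a default namespace,
# once with an explicit "pc" prefix (the style described in this issue).
DEFAULT_NS_XML = f'<PcGts xmlns="{PAGE_URI}"><Page imageFilename="a.png"/></PcGts>'
PREFIXED_XML = (
    f'<pc:PcGts xmlns:pc="{PAGE_URI}"><pc:Page imageFilename="a.png"/></pc:PcGts>'
)


def namespace_of(root):
    """Return the namespace URI of the root element (hypothetical helper).

    Assumes the root element is namespaced; raises ValueError otherwise.
    """
    if not root.tag.startswith("{"):
        raise ValueError("root element has no namespace")
    # "{uri}PcGts" -> "uri"
    return root.tag[1:].split("}", 1)[0]


def find_pages(xml_text):
    """Find Page elements regardless of the prefix used in the file."""
    root = ET.fromstring(xml_text)
    ns = {"ns": namespace_of(root)}
    return root.findall("./ns:Page", namespaces=ns)


for label, text in (("default", DEFAULT_NS_XML), ("prefixed", PREFIXED_XML)):
    pages = find_pages(text)
    print(label, len(pages), pages[0].get("imageFilename"))
```

Both documents yield the same Page element because the lookup depends only on the URI. With lxml, an XPath expression such as //*[local-name()='Page'] is the alternative mentioned above; it ignores the namespace entirely, which is simpler but also matches elements from unrelated namespaces.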

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

ChWick commented, Jun 12, 2021 (1 reaction)

Okay, just wanted to “clarify” the most obvious mistake. Luckily, I receive the same error using your files. I will examine this and hopefully find the reason!

ChWick commented, Jun 20, 2021 (0 reactions)

Thanks for testing, @alexander-winkler. I will close this and merge #259.
