Add validator (Nonconformance to Office Open XML schema)

EDIT: The actionable thing to do here is to add a JavaScript validator that checks against one of the wml.xsd schemas.

==========

Hey there! Great library, I’ve been using it a while and trying to help out a little where I can.

An issue I’ve come across is that it’s really easy to generate a corrupted document, and tricky to pinpoint exactly where and why this happens. It’s not the fault of this library, and to be honest there aren’t any good options for validating these XML documents in JavaScript (I’m working on this; I hope to soon have a validator with specific error messages in pure JS!).

I’ve set up a hacky tool to validate these documents locally on Linux with libxml/xmllint (I’ll share that setup once I write a little wrapper around it), and I’ve noticed that it spits out a ton of errors. One of the most common errors is that, in several places, the spec expects child nodes in a specific sequence in order to conform. Out-of-order nodes mostly work in practice, but I suspect they’ve caused some bugs!
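
A wrapper along those lines might look like the following minimal sketch. It is only an illustration, not the setup described above: it assumes Node.js, that xmllint is on the PATH, and that wml.xsd (together with the schemas it imports) has been extracted from the ECMA-376 download. The function names and file paths are hypothetical.

```javascript
// Hypothetical helper: builds the argument list for xmllint.
// --noout suppresses echoing the parsed document; --schema selects the XSD.
function buildXmllintArgs(schemaPath, xmlPath) {
  return ["--noout", "--schema", schemaPath, xmlPath];
}

// Hypothetical wrapper: runs xmllint and collects whatever it printed to stderr.
// xmllint reports schema errors on stderr and exits non-zero when validation fails.
async function validateWithXmllint(schemaPath, xmlPath) {
  // Dynamic import so the sketch works in both CommonJS and ES module files.
  const { spawnSync } = await import("node:child_process");
  const result = spawnSync("xmllint", buildXmllintArgs(schemaPath, xmlPath), {
    encoding: "utf8",
  });
  if (result.error) {
    throw result.error; // e.g. xmllint is not installed
  }
  const messages = result.stderr
    .split("\n")
    .filter((line) => line.trim() !== "");
  return { valid: result.status === 0, messages };
}
```

Since a .docx file is a zip archive, the XML to check would be word/document.xml from the unzipped package, for example `await validateWithXmllint("schemas/wml.xsd", "unzipped/word/document.xml")` (both paths are placeholders).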

See, for example, the schema for the abstract base type used under w:pPr. Notice the <xsd:sequence>: it means the child elements must appear in this specific order to conform.

  <xsd:complexType name="CT_PPrBase">
    <xsd:sequence>
      <xsd:element name="pStyle" type="CT_String" minOccurs="0"/>
      <xsd:element name="keepNext" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="keepLines" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="pageBreakBefore" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="framePr" type="CT_FramePr" minOccurs="0"/>
      <xsd:element name="widowControl" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="numPr" type="CT_NumPr" minOccurs="0"/>
      <xsd:element name="suppressLineNumbers" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="pBdr" type="CT_PBdr" minOccurs="0"/>
      <xsd:element name="shd" type="CT_Shd" minOccurs="0"/>
      <xsd:element name="tabs" type="CT_Tabs" minOccurs="0"/>
      <xsd:element name="suppressAutoHyphens" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="kinsoku" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="wordWrap" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="overflowPunct" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="topLinePunct" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="autoSpaceDE" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="autoSpaceDN" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="bidi" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="adjustRightInd" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="snapToGrid" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="spacing" type="CT_Spacing" minOccurs="0"/>
      <xsd:element name="ind" type="CT_Ind" minOccurs="0"/>
      <xsd:element name="contextualSpacing" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="mirrorIndents" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="suppressOverlap" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="jc" type="CT_Jc" minOccurs="0"/>
      <xsd:element name="textDirection" type="CT_TextDirection" minOccurs="0"/>
      <xsd:element name="textAlignment" type="CT_TextAlignment" minOccurs="0"/>
      <xsd:element name="textboxTightWrap" type="CT_TextboxTightWrap" minOccurs="0"/>
      <xsd:element name="outlineLvl" type="CT_DecimalNumber" minOccurs="0"/>
      <xsd:element name="divId" type="CT_DecimalNumber" minOccurs="0"/>
      <xsd:element name="cnfStyle" type="CT_Cnf" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>

Sadly, this is not documented anywhere on the officeopenxml.com site, and is only found in the ECMA-376 reference schemas (see, for example, ECMA-376 fifth edition, part one, page 3839, which contains a version of the above element type).

https://www.ecma-international.org/publications-and-standards/standards/ecma-376/
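
To illustrate the ordering rule, here is a minimal sketch in plain JavaScript that checks child element names against the first few entries of the CT_PPrBase sequence shown above. It is not a schema validator and not code from this library: the helper names are made up, the order list is deliberately truncated, and it only covers the case this sequence happens to use, where every element is optional and may appear at most once.

```javascript
// First entries of the CT_PPrBase sequence, in schema order (truncated for brevity).
const PPR_ORDER = [
  "pStyle", "keepNext", "keepLines", "pageBreakBefore",
  "framePr", "widowControl", "numPr",
];

// Hypothetical helper: returns one message per child that is unknown,
// duplicated, or placed after a sibling the schema lists later.
function findOrderViolations(childNames, order = PPR_ORDER) {
  const violations = [];
  let highestIndex = -1;
  let highestName = null;
  for (const name of childNames) {
    const index = order.indexOf(name);
    if (index === -1) {
      violations.push(`${name}: not in the known sequence`);
    } else if (index === highestIndex) {
      violations.push(`${name}: appears more than once`);
    } else if (index < highestIndex) {
      violations.push(`${name}: must come before ${highestName}`);
    } else {
      highestIndex = index;
      highestName = name;
    }
  }
  return violations;
}

// Hypothetical helper: returns a copy with the children in schema order.
// Names missing from `order` have index -1 and therefore sort to the front.
function sortBySchemaOrder(childNames, order = PPR_ORDER) {
  return [...childNames].sort((a, b) => order.indexOf(a) - order.indexOf(b));
}
```

For example, `findOrderViolations(["numPr", "pStyle"])` reports that pStyle must come before numPr, and `sortBySchemaOrder(["numPr", "pStyle", "keepNext"])` puts the three names back into schema order. A generator could apply the same idea when serializing w:pPr children, though a real fix would need the complete sequence from the schema.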

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
devoidfury commented, May 20, 2021

I made quite a bit of progress on this, here: https://github.com/dolanmiu/docx/compare/master...devoidfury:bug/ooxml-conformance-fixes

The main errors I’m getting that I don’t know how to handle:

Invalid mirrorMargins attribute on w:pgMar. EDIT: this has been removed on my branch.

Invalid element w:shdCs (couldn’t find a reference for it anywhere; should it just be deleted? It looks like w:shd does everything here). EDIT: this has been removed on my branch.

w:document has an invalid attribute, mc:Ignorable="w14 w15 wp14"; I couldn’t find a reference or documentation for this attribute anywhere. It is written about here, though, and it’s a commonly used attribute among various XML document types: http://www.wordarticles.com/Articles/Formats/OOXML/OOXML.php

0 reactions
dolanmiu commented, Sep 30, 2021

@devoidfury I am adding it into GitHub Actions.

Thank you for your research into this area.

The checks are based on the same OOXML schemas as in your docx-validator project:

https://github.com/dolanmiu/docx/pull/1202
