
Allow access to Word field codes via instrText


I’d like to use mammoth.js to extract bibliographic metadata from references inserted with the Zotero reference manager. These references are usually encoded as Word fields, and a typical field code looks like this:

<w:r>
    <w:instrText xml:space="preserve">
        ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"7kotdh7ut","properties":{"formattedCitation":"(Beese, Negishi, &amp; Levin, 2009)","plainCitation":"(Beese, Negishi, &amp; Levin,
        2009)"},"citationItems":[{"id":78,"uris":["http://zotero.org/users/1031436/items/4WATGD54"],"uri":["http://zotero.org/users/1031436/items/4WATGD54"],"itemData":{"id":78,"type":"article-journal","title":"Identification of Positive Regulators of the
        Yeast Fps1 Glycerol Channel","container-title":"PLoS Genet","page":"e1000738","volume":"5","issue":"11","source":"PLoS Genet","abstract":"Author Summary\nWhen challenged by changes in extracellular osmolarity, many fungal species regulate their
        intracellular glycerol concentration to modulate their internal osmotic pressure. Maintenance of osmotic homeostasis prevents either cellular collapse under hyper-osmotic stress or cell rupture under hypo-osmotic stress. In baker's yeast, the Fps1
        glycerol channel functions as the main vent for glycerol. Proper regulation of Fps1 is critical to the maintenance of osmotic homeostasis. In this study, we identify a pair of proteins (Rgc1 and Rgc2) that function as positive regulators of Fps1
        activity. Their absence results in hyper-accumulation of glycerol and consequent cell lysis due to impaired Fps1 channel activity. Additionally, we found that these glycerol channel regulators function between the Hog1 (High Osmolarity Glycerol
        response) signaling kinase and Fps1, defining a signaling pathway for control of glycerol efflux. Because members of the Rgc1/2 family are found among pathogenic fungal species, but not in humans, they represent potentially attractive targets for
        antifungal drug development.","URL":"http://dx.doi.org/10.1371/journal.pgen.1000738","DOI":"10.1371/journal.pgen.1000738","journalAbbreviation":"PLoS Genet","author":[{"family":"Beese","given":"Sara
        E."},{"family":"Negishi","given":"Takahiro"},{"family":"Levin","given":"David
        E."}],"issued":{"date-parts":[["2009",11,26]]},"accessed":{"date-parts":[["2011",10,21]]}}}],"schema":"https://github.com/citation-style-language/schema/raw/master/csl-citation.json"}
    </w:instrText>
</w:r>

Example file: zotero-cit.docx
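
For readers who want the metadata itself, here is a minimal sketch of turning the text of such a field code into a JavaScript object. `parseZoteroFieldCode` is a hypothetical helper, not part of mammoth.js, and it assumes the input is the text content of the field (as in the listing above, where long lines are wrapped and indented). The sample string is a shortened version of that listing.

```javascript
// Hypothetical helper (not mammoth.js API): parse the text of a Zotero field code.
const ZOTERO_PREFIX = "ADDIN ZOTERO_ITEM CSL_CITATION";

function parseZoteroFieldCode(fieldCode) {
    // Collapse line breaks and their surrounding indentation into single spaces,
    // so wrapped lines like those in the listing do not break the JSON strings.
    const text = fieldCode.replace(/\s*\n\s*/g, " ").trim();
    if (!text.startsWith(ZOTERO_PREFIX)) {
        return null; // some other kind of field
    }
    const jsonText = text.slice(ZOTERO_PREFIX.length).trim();
    try {
        return JSON.parse(jsonText);
    } catch (error) {
        return null; // truncated or otherwise invalid JSON
    }
}

// Shortened sample based on the field code above, including one wrapped line.
const sample = ' ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"7kotdh7ut","citationItems":[{"id":78,"itemData":{"title":"Identification of Positive Regulators of the\n        Yeast Fps1 Glycerol Channel","DOI":"10.1371/journal.pgen.1000738"}}]}';
const citation = parseZoteroFieldCode(sample);
console.log(citation.citationItems.map((item) => item.itemData.DOI));
```

Returning `null` for anything that is not a Zotero citation keeps the caller simple, since a document can contain other fields as well.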

Is there any way to target instrText? When I try creating a style map for it, I get the following message:

Did not understand this style mapping, so ignored it: instrText => div.csl
Error was at character number 1: Expected element type but got identifier “instrText”

If I could monkey-patch this in a personal copy, that would be fine too, but I couldn’t find anywhere in the code where instrText is explicitly ignored, so I couldn’t work out what to change.

(Follow-up to https://github.com/mwilliamson/mammoth.js/issues/8#issuecomment-250647081)

Issue Analytics

Created 7 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments

rmzelle commented, Oct 14, 2016

I know I’m not really using mammoth.js for its intended purpose, but for the curious:

I solved this for myself (https://github.com/rmzelle/ref-extractor/) by extending the variable “xmlElementReaders” to recognize instrText elements, and exporting the contents of each field to a global array variable: https://github.com/rmzelle/mammoth.js/commit/77bcac57a2f4f5095f7e8ae71419863dbdb3bc26#diff-68b60b2443cf3be90e7e7223aaf3d383

It requires me to use a customized version of mammoth.js, but it seemed the easiest way to allow me to:

a) access the content of instrText elements
b) post-process the content of each instrText element
c) ignore any other content in the Word document
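
For comparison, goals a) and c) can also be sketched without patching mammoth.js at all. This is not the approach in the commit linked above. It assumes you have already read `word/document.xml` out of the .docx (which is a zip archive) into a string. `extractInstrText` and `decodeXmlEntities` are hypothetical helpers, and the regular expression is a stand-in for a real XML parser.

```javascript
// Hypothetical helpers (not mammoth.js API).

// Decode the predefined XML entities and numeric character references in one pass.
function decodeXmlEntities(text) {
    const named = { lt: "<", gt: ">", quot: '"', apos: "'", amp: "&" };
    return text.replace(/&(#x[0-9a-fA-F]+|#\d+|lt|gt|quot|apos|amp);/g, (match, name) => {
        if (name.startsWith("#x")) return String.fromCodePoint(parseInt(name.slice(2), 16));
        if (name.startsWith("#")) return String.fromCodePoint(parseInt(name.slice(1), 10));
        return named[name];
    });
}

// Return the decoded text of every <w:instrText> element, ignoring all other content.
function extractInstrText(documentXml) {
    const pattern = /<w:instrText(?:\s[^>\/]*)?>([\s\S]*?)<\/w:instrText>/g;
    const fieldCodes = [];
    for (const match of documentXml.matchAll(pattern)) {
        fieldCodes.push(decodeXmlEntities(match[1]));
    }
    return fieldCodes;
}

// Tiny stand-in for a real document.xml: one field-code run and one ordinary text run.
const documentXml =
    '<w:r><w:instrText xml:space="preserve"> ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"7kotdh7ut"} </w:instrText></w:r>' +
    '<w:r><w:t>(Beese, Negishi, &amp; Levin, 2009)</w:t></w:r>';
console.log(extractInstrText(documentXml));
```

Word can split a single field code across several runs, so in a real document the pieces belonging to one field may need to be joined before any post-processing.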

mwilliamson commented, Nov 9, 2016

Okey doke, closing the issue.