
Allow access to Word field codes via instrText


I’d like to use mammoth.js to extract bibliographic metadata from references inserted with the Zotero reference manager. These references are usually encoded as fields, and their field code looks like this:

```xml
<w:r>
    <w:instrText xml:space="preserve">
        ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"7kotdh7ut","properties":{"formattedCitation":"(Beese, Negishi, &amp; Levin, 2009)","plainCitation":"(Beese, Negishi, &amp; Levin,
        2009)"},"citationItems":[{"id":78,"uris":["http://zotero.org/users/1031436/items/4WATGD54"],"uri":["http://zotero.org/users/1031436/items/4WATGD54"],"itemData":{"id":78,"type":"article-journal","title":"Identification of Positive Regulators of the
        Yeast Fps1 Glycerol Channel","container-title":"PLoS Genet","page":"e1000738","volume":"5","issue":"11","source":"PLoS Genet","abstract":"Author Summary\nWhen challenged by changes in extracellular osmolarity, many fungal species regulate their
        intracellular glycerol concentration to modulate their internal osmotic pressure. Maintenance of osmotic homeostasis prevents either cellular collapse under hyper-osmotic stress or cell rupture under hypo-osmotic stress. In baker's yeast, the Fps1
        glycerol channel functions as the main vent for glycerol. Proper regulation of Fps1 is critical to the maintenance of osmotic homeostasis. In this study, we identify a pair of proteins (Rgc1 and Rgc2) that function as positive regulators of Fps1
        activity. Their absence results in hyper-accumulation of glycerol and consequent cell lysis due to impaired Fps1 channel activity. Additionally, we found that these glycerol channel regulators function between the Hog1 (High Osmolarity Glycerol
        response) signaling kinase and Fps1, defining a signaling pathway for control of glycerol efflux. Because members of the Rgc1/2 family are found among pathogenic fungal species, but not in humans, they represent potentially attractive targets for
        antifungal drug development.","URL":"http://dx.doi.org/10.1371/journal.pgen.1000738","DOI":"10.1371/journal.pgen.1000738","journalAbbreviation":"PLoS Genet","author":[{"family":"Beese","given":"Sara
        E."},{"family":"Negishi","given":"Takahiro"},{"family":"Levin","given":"David
        E."}],"issued":{"date-parts":[["2009",11,26]]},"accessed":{"date-parts":[["2011",10,21]]}}}],"schema":"https://github.com/citation-style-language/schema/raw/master/csl-citation.json"}
    </w:instrText>
</w:r>
```
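In WordprocessingML, a complex field is stored as a sequence of runs. It starts with a `w:fldChar` of type `begin`, followed by one or more `w:instrText` runs that hold the field code. A `separate` marker comes next, then the displayed result, and finally an `end` marker. A long code like the Zotero citation above can be split over several `instrText` runs, so the pieces have to be concatenated before the JSON can be parsed. The sketch below takes this route without going through mammoth.js at all. It assumes you have already unzipped `word/document.xml` from the .docx into a string. The function names are hypothetical, and the regex matching is a simplification, not a full XML parser.

```javascript
// Sketch (assumption): work on the raw word/document.xml string, not on mammoth's output.
// extractFieldCodes / parseZoteroCitations are hypothetical helper names.

const XML_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode predefined and numeric XML entities in a single pass (avoids double-decoding).
function decodeXml(text) {
  return text.replace(
    /&(?:#x([0-9a-fA-F]+)|#([0-9]+)|(amp|lt|gt|quot|apos));/g,
    (_, hex, dec, name) =>
      hex ? String.fromCodePoint(parseInt(hex, 16))
        : dec ? String.fromCodePoint(parseInt(dec, 10))
        : XML_ENTITIES[name]
  );
}

// Returns the instruction text of every complex field, in the order the fields close.
function extractFieldCodes(documentXml) {
  const token =
    /<w:fldChar\b[^>]*w:fldCharType="(begin|separate|end)"[^>]*\/?>|<w:instrText\b[^>]*>([\s\S]*?)<\/w:instrText>/g;
  const stack = []; // one buffer per open field; fields can nest
  const codes = [];
  let match;
  while ((match = token.exec(documentXml)) !== null) {
    const [, charType, instr] = match;
    if (charType === "begin") {
      stack.push({ text: "", collecting: true });
    } else if (charType === "separate") {
      // After "separate" come the displayed result runs, not the field code.
      if (stack.length) stack[stack.length - 1].collecting = false;
    } else if (charType === "end") {
      const field = stack.pop();
      if (field) codes.push(field.text.trim());
    } else if (instr !== undefined && stack.length) {
      // Concatenate instrText pieces that belong to the same field.
      const field = stack[stack.length - 1];
      if (field.collecting) field.text += decodeXml(instr);
    }
  }
  return codes;
}

const ZOTERO_PREFIX = /^ADDIN ZOTERO_ITEM CSL_CITATION\s+/;

// Keep only Zotero citation fields and parse their CSL JSON payload.
function parseZoteroCitations(documentXml) {
  return extractFieldCodes(documentXml)
    .filter((code) => ZOTERO_PREFIX.test(code))
    .map((code) => JSON.parse(code.replace(ZOTERO_PREFIX, "")));
}
```

Each parsed object has the same shape as the JSON in the field code above, so the CSL metadata (title, DOI, authors) would sit under `citationItems[i].itemData`.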

Example file: zotero-cit.docx

Is there any way to target instrText? When I try creating a style map for it, I get the following message:

Did not understand this style mapping, so ignored it: instrText => div.csl Error was at character number 1: Expected element type but got identifier “instrText”

If I could monkey-patch this in a personal copy, that would be fine too, but I couldn’t find anywhere in the code where instrText is explicitly ignored, and I couldn’t figure out what to change.

(followup of https://github.com/mwilliamson/mammoth.js/issues/8#issuecomment-250647081)

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
rmzelle commented, Oct 14, 2016

I know I’m not really using mammoth.js for its intended purpose, but for the curious:

I solved this for myself (https://github.com/rmzelle/ref-extractor/) by extending the variable “xmlElementReaders” to recognize instrText elements, and exporting the contents of each field to a global array variable: https://github.com/rmzelle/mammoth.js/commit/77bcac57a2f4f5095f7e8ae71419863dbdb3bc26#diff-68b60b2443cf3be90e7e7223aaf3d383

It requires me to use a customized version of mammoth.js, but it seemed the easiest way to allow me to:

a) access the content of instrText elements
b) post-process the content of each instrText element
c) ignore any other content in the Word document
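The fork's actual change to mammoth's internals is not reproduced here. The general idea can still be sketched without mammoth.js: keep a lookup table of element readers in which only `w:instrText` has a reader, and skip every other element while still descending into its children. This sketch is hypothetical. `collectInstrText`, the `postProcess` parameter, and the node shape `{ type: "element", name, children }` / `{ type: "text", value }` are all assumptions, not mammoth's API. The collected codes are also returned instead of being pushed to a global array.

```javascript
// Hypothetical sketch of a reader table keyed by element name.
// NOT mammoth.js's xmlElementReaders; the parsed-tree node shape is assumed:
//   { type: "element", name, children } or { type: "text", value }

function collectInstrText(root, postProcess = (code) => code.trim()) {
  const fieldCodes = []; // returned rather than kept in a global array

  const readers = {
    // (a) access the content of instrText elements
    "w:instrText": (element) => {
      const text = element.children
        .filter((child) => child.type === "text")
        .map((child) => child.value)
        .join("");
      fieldCodes.push(postProcess(text)); // (b) post-process each instrText
    },
  };

  function visit(node) {
    if (node.type !== "element") return; // (c) ignore text outside instrText
    const reader = readers[node.name];
    if (reader) {
      reader(node);
    } else {
      node.children.forEach(visit); // no reader: skip the element, keep walking
    }
  }

  visit(root);
  return fieldCodes;
}
```

Each `w:instrText` element is collected on its own. A field code that Word splits across several runs therefore comes back in pieces, and joining them would mean tracking the surrounding `w:fldChar` begin/separate markers.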

0 reactions
mwilliamson commented, Nov 9, 2016

Okey doke, closing the issue.


Top Results From Across the Web

List of field codes in Word
An alphabetized list of field codes available for mail merge, forms, and other uses in your documents.

3 ways to enter fields in Microsoft Word
How to insert a Word field by using Ctrl + F9 · Position the cursor where you want to insert the field. ·...

feature: add field · Issue #31 · python-openxml/python-docx
Support for inserting/adding Field Codes in a word document. They are a handy feature for report generation type applications (originally ...

How to update fields in MS Word with Python Docx
Include a VBA project that performs the field update in an AutoOpen macro. This, of course, means the document type must be macro-enabled...

instrText (Field Code)
This element specifies that this run contains field codes (§ 2.16.5 ) within a complex field in the document. If this element is...
