Can't get the content in revision mode in table of docx file

As shown above, I want to get the last non-empty row of data in the table. However, because the last row is in the revision (tracked changes) state, its data cannot be read normally with python-docx.

The Python code is as follows:

# coding: utf-8
import os
from docx import Document


def parse_docx(path):
    """Return the last fully populated row of the first table, or None."""
    doc = Document(path)
    table = doc.tables[0]
    # Walk the rows bottom-up so the first complete row found is the last one.
    for index in reversed(range(len(table.rows))):
        # Columns: version, date, modification summary, author.
        fields = [table.cell(index, col).text for col in range(4)]
        if all(fields):
            return path + '\n' + '   '.join(fields) + '\n'
    return None


if __name__ == "__main__":
    folder = os.path.dirname(os.path.abspath(__file__))
    for name in os.listdir(folder):
        if os.path.splitext(name)[1] == '.docx':
            try:
                print(parse_docx(os.path.join(folder, name)))
            except Exception as e:
                print(e)

The output obtained after executing the script is “V0.0.3 2014-01-22 Modify 2 Test3”, which is not the data I expected to get.

Issue Analytics

Created 4 years ago
Comments: 8 (3 by maintainers)

Top GitHub Comments

scanny commented, Dec 3, 2019

Something like this should work:

tbl = table._tbl
# ---move each run inside a `w:ins` element up to be a sibling of the `w:ins`---
for r in tbl.xpath(".//w:ins/w:r"):
    r.getparent().addnext(r)
# ---then get rid of all the (now empty) `w:ins` elements---
for ins in tbl.xpath(".//w:ins"):
    ins.getparent().remove(ins)

I expect there are more elegant ways to do this, but this should do the trick.
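
For anyone who would rather read the text without modifying the document, here is a minimal sketch of the same idea using only the standard library. It relies on the fact that inserted runs keep their text in ordinary `w:t` elements (just nested inside `w:ins`), while deleted runs use `w:delText`, so collecting every `w:t` descendant should give the text as it reads with the changes accepted. The helper name `text_including_insertions` and the sample cell XML are made up for illustration; they are not part of python-docx or of the original file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def text_including_insertions(element):
    """Hypothetical helper: join the text of every w:t descendant.

    Runs wrapped in w:ins are included because their text is still in w:t;
    deleted runs are skipped because their text lives in w:delText instead.
    """
    return "".join(t.text or "" for t in element.iter("{%s}t" % W_NS))


# Invented sample: one table cell holding a plain run, a tracked insertion
# and a tracked deletion. Attribute values are placeholders.
SAMPLE_CELL = """
<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:r><w:t>V0.0</w:t></w:r>
    <w:ins w:id="1" w:author="someone" w:date="2014-01-23T00:00:00Z">
      <w:r><w:t>.4</w:t></w:r>
    </w:ins>
    <w:del w:id="2" w:author="someone" w:date="2014-01-23T00:00:00Z">
      <w:r><w:delText>.3</w:delText></w:r>
    </w:del>
  </w:p>
</w:tc>
"""

cell = ET.fromstring(SAMPLE_CELL)
print(text_including_insertions(cell))  # prints "V0.0.4"
```

With python-docx itself, the same traversal would presumably apply to `cell._tc`, which is an lxml element, but that variant is untested here.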

gagmeng commented, Dec 10, 2019

The issue has been resolved; closing.