Error parsing document with bullet list

I am trying to use this library to gather indentation levels from a Word document. I get the following error when the document includes a bulleted list:

Traceback (most recent call last):
  File "/home/src/main.py", line 9, in <module>
    text_groups = convert_document_to_text_groups(DOCUMENT_NAME, PATH_TO_DOCUMENT)
  File "/home/src/modules/document_converter.py", line 7, in convert_document_to_text_groups
    my_doc_as_json = simplify(document)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/__init__.py", line 33, in simplify
    out = document(doc.element).to_json(doc, _options)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/base.py", line 106, in to_json
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/base.py", line 106, in <listcomp>
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/body.py", line 25, in to_json
    JSON = elt.to_json(doc, options, iter_me)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/paragraph.py", line 167, in to_json
    _indent = get_paragraph_ind(self.fragment, doc)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/utils/paragrapy_style.py", line 56, in get_paragraph_ind
    num_style.pPr is not None and \
AttributeError: 'lxml.etree._Element' object has no attribute 'pPr'

My Word document contains nothing except these two lines.
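The last frame of the traceback shows simplify_docx reading `num_style.pPr`. Attribute access like `.pPr` only works on python-docx's custom element classes, but the object it received is a plain `lxml.etree._Element`. One plausible (unconfirmed) explanation is that stock python-docx does not register a custom class for the numbering element simplify_docx looks up (likely a `w:lvl` from `word/numbering.xml`), while the fork mentioned in the comments below does. The sketch below uses the standard library's `xml.etree.ElementTree` (lxml's `find()` accepts the same path and `namespaces` arguments) to show that a plain element has no `pPr` attribute, and that a namespace-qualified `find()` still reaches `w:pPr/w:ind`. The sample XML and the `level_indent` helper are hypothetical illustrations, not simplify_docx code.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix inside a .docx).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical numbering level, shaped like a w:lvl entry in word/numbering.xml.
LVL_XML = f"""
<w:lvl xmlns:w="{W_NS}" w:ilvl="0">
  <w:numFmt w:val="bullet"/>
  <w:pPr><w:ind w:left="720" w:hanging="360"/></w:pPr>
</w:lvl>
"""


def level_indent(lvl):
    """Return (left, hanging) indentation in twips for a w:lvl element.

    Uses a namespace-qualified find() rather than attribute access such as
    lvl.pPr, so it works on plain elements. Returns None if no w:ind exists.
    """
    ind = lvl.find("w:pPr/w:ind", NS)
    if ind is None:
        return None
    left = ind.get(f"{{{W_NS}}}left")
    hanging = ind.get(f"{{{W_NS}}}hanging")
    return (int(left) if left else 0, int(hanging) if hanging else 0)


lvl = ET.fromstring(LVL_XML.strip())
print(hasattr(lvl, "pPr"))  # False: a plain element has no pPr attribute
print(level_indent(lvl))    # (720, 360)
```

Indentation values in WordprocessingML are in twentieths of a point (twips), so a left indent of 720 corresponds to half an inch.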

Top GitHub Comments

ghost commented, Jun 10, 2020

Hello!

I had the same problem and solved it by installing the version of python-docx specified in the README.md:

This project relies on the python-docx package which can be installed via pip install python-docx. However, as of this writing, if you wish to scrape documents which contain (A) form fields such as drop down lists, checkboxes and text inputs or (B) nested documents (subdocs, altChunks, etc.), you’ll need to clone this fork of the python-docx package.
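If installing the fork is not an option and only the list structure is needed, one workaround that bypasses simplify_docx is to read each paragraph's `w:numPr/w:ilvl` value straight from `word/document.xml` with the standard library. This is a minimal sketch, not part of simplify_docx or python-docx: the `paragraph_list_levels` name is hypothetical, it assumes list levels are set directly on each paragraph (levels inherited from a paragraph style are not resolved), and it returns the 0-based list level rather than the indentation in twips, which is defined per level in `word/numbering.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix inside a .docx).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_list_levels(source):
    """Yield (text, level) for every paragraph in word/document.xml.

    source: a path or binary file object for a .docx (zip) file.
    level:  0-based list level from w:pPr/w:numPr/w:ilvl, or None when the
            paragraph sets no list level directly (hypothetical helper;
            style inheritance is not resolved).
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for p in root.iter(f"{{{W_NS}}}p"):
        # Concatenate the text of all w:t nodes in the paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        ilvl = p.find("w:pPr/w:numPr/w:ilvl", NS)
        level = int(ilvl.get(f"{{{W_NS}}}val")) if ilvl is not None else None
        yield text, level
```

For example, `for text, level in paragraph_list_levels("example.docx"): print(level, text)` prints each paragraph's list level next to its text, where `example.docx` is a placeholder file name.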

jdthorpe commented, Mar 8, 2022

Closing this issue as it appears to be identical to issue #12.
