Error parsing document with bullet list

I am trying to use this library to gather the indentation level from a Word document. I get the following error when the Word document includes a bullet point list:

Traceback (most recent call last):
  File "/home/src/main.py", line 9, in <module>
    text_groups = convert_document_to_text_groups(DOCUMENT_NAME, PATH_TO_DOCUMENT)
  File "/home/src/modules/document_converter.py", line 7, in convert_document_to_text_groups
    my_doc_as_json = simplify(document)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/__init__.py", line 33, in simplify
    out = document(doc.element).to_json(doc, _options)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/base.py", line 106, in to_json
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/base.py", line 106, in <listcomp>
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/body.py", line 25, in to_json
    JSON = elt.to_json(doc, options, iter_me)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/paragraph.py", line 167, in to_json
    _indent = get_paragraph_ind(self.fragment, doc)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/utils/paragrapy_style.py", line 56, in get_paragraph_ind
    num_style.pPr is not None and \
AttributeError: 'lxml.etree._Element' object has no attribute 'pPr'

My Word document contains nothing except these two lines.

image
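The last frame of the traceback fails on `num_style.pPr`: the object is a generic XML element, and generic elements do not expose child elements as attributes. A minimal sketch of that behaviour follows, using the standard library's `xml.etree.ElementTree` as a stand-in for lxml so it runs without extra packages. The helper `find_ppr` and the sample XML are hypothetical illustrations, not part of simplify_docx or python-docx.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, hand-written numbering level with an indent of 720 twips.
SAMPLE = (
    '<w:lvl xmlns:w="%s">'
    '<w:pPr><w:ind w:left="720"/></w:pPr>'
    "</w:lvl>" % W_NS
)


def find_ppr(element):
    """Return the w:pPr child of element, or None if it has none.

    Tries the attribute-style accessor first (available only on elements
    that provide one), then falls back to an explicit namespaced lookup.
    """
    ppr = getattr(element, "pPr", None)
    if ppr is not None:
        return ppr
    return element.find("{%s}pPr" % W_NS)


lvl = ET.fromstring(SAMPLE)

# A generic element has no attribute-style access to its children,
# so `lvl.pPr` would raise AttributeError, as in the traceback.
has_accessor = hasattr(lvl, "pPr")

ppr = find_ppr(lvl)
ind = ppr.find("{%s}ind" % W_NS)
left = ind.get("{%s}left" % W_NS)

print(has_accessor, left)
```

The explicit namespaced lookup only illustrates why the attribute is missing; it is not a patch for the library.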

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
ghost commented, Jun 10, 2020

Hello!

I had the same problem and solved it by installing the version of python-docx that is specified in the README.md:

This project relies on the python-docx package which can be installed via pip install python-docx. However, as of this writing, if you wish to scrape documents which contain (A) form fields such as drop down lists, checkboxes and text inputs or (B) nested documents (subdocs, altChunks, etc.), you’ll need to clone this fork of the python-docx package.
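Since the fix above depends on which python-docx build is installed, it can help to check what the environment actually has before reinstalling. The following is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is an assumption for illustration, and the version string alone may not distinguish the fork from the PyPI release.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed python-docx version, or None if it is not installed.
print(installed_version("python-docx"))
```

If the fork was installed from a local clone, `pip show python-docx` also lists the install location, which may be a more reliable indicator than the version number.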

0 reactions
jdthorpe commented, Mar 8, 2022

Closing this issue as it appears to be identical to issue #12.
