Error parsing document with bullet list
I am trying to use this library to gather indentation levels from a Word document. I get the following error when the Word document includes a bulleted list:
Traceback (most recent call last):
  File "/home/src/main.py", line 9, in <module>
    text_groups = convert_document_to_text_groups(DOCUMENT_NAME, PATH_TO_DOCUMENT)
  File "/home/src/modules/document_converter.py", line 7, in convert_document_to_text_groups
    my_doc_as_json = simplify(document)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/__init__.py", line 33, in simplify
    out = document(doc.element).to_json(doc, _options)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/base.py", line 106, in to_json
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/base.py", line 106, in <listcomp>
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/body.py", line 25, in to_json
    JSON = elt.to_json(doc, options, iter_me)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/elements/paragraph.py", line 167, in to_json
    _indent = get_paragraph_ind(self.fragment, doc)
  File "/usr/local/lib/python3.8/site-packages/simplify_docx-0.1.0-py3.8.egg/simplify_docx/utils/paragrapy_style.py", line 56, in get_paragraph_ind
    num_style.pPr is not None and \
AttributeError: 'lxml.etree._Element' object has no attribute 'pPr'
I have nothing in my Word document except these two lines.
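The last frame of the traceback shows that the object being inspected is a plain lxml.etree._Element, which has no pPr attribute, so the attribute access itself raises. The sketch below only illustrates that failure mode and a defensive lookup. It is not the library's code or its fix: PlainElement, StyledElement and safe_ppr are hypothetical names made up for this example.

```python
class PlainElement:
    """Hypothetical stand-in for a plain element with no `pPr` attribute."""


class StyledElement:
    """Hypothetical stand-in for an element class that exposes `pPr`."""

    def __init__(self, pPr):
        self.pPr = pPr


def safe_ppr(element):
    """Return element.pPr when it exists, otherwise None.

    Accessing `element.pPr` directly raises AttributeError on objects
    that lack the attribute; getattr with a default avoids that.
    """
    return getattr(element, "pPr", None)
```

A guard like this would only hide the symptom. If the element class really is missing, the indentation data it would have exposed is still unavailable.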
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
Hello!
I had the same problem and solved it by installing the version of python-docx that is specified in the README.md:
This project relies on the python-docx package which can be installed via pip install python-docx. However, as of this writing, if you wish to scrape documents which contain (A) form fields such as drop down lists, checkboxes and text inputs or (B) nested documents (subdocs, altChunks, etc.), you’ll need to clone this fork of the python-docx package.
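Before reinstalling, it can help to confirm which python-docx distribution is actually present in the environment. The following is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8); installed_version is a hypothetical helper name. Note the assumption that a version string is informative here: a fork may report the same version as the upstream package, in which case this check cannot tell them apart.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None for any package that is not installed.
    print("python-docx:", installed_version("python-docx"))
    print("simplify-docx:", installed_version("simplify-docx"))
```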
Closing this issue as it appears to be identical to issue #12.