UnicodeEncodeError when using Stream flavor
Python 3.7 on Windows
Using this pdf: http://tsbde.texas.gov/78i8ljhbj/Fiscal-Year-2014-Disciplinary-Actions.pdf
I am running it through Camelot with the Stream flavor to convert it to HTML, and I get the following error when the export line executes, once it reaches page 4 of 8:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position y: character maps to <undefined>
Pages 1 through 3 convert fine; it crashes somewhere between pages 4 and 5. When debugging with a breakpoint after the tables.export line, the debugger also takes me to line 19 of cp1252.py, if that’s helpful.
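The cp1252.py frame is a strong hint about the cause. When open() is called without an encoding= argument, Python 3.7 uses the locale's preferred encoding, which on US and Western-European Windows installs is typically cp1252 (on macOS it is normally UTF-8, which would explain why the Mac run succeeds). U+2010 (HYPHEN) has no slot in cp1252, so writing it fails. A minimal sketch reproducing this with plain str.encode, independent of Camelot; the helper name encoding_error is made up for illustration:

```python
import locale

HYPHEN = "\u2010"  # U+2010 HYPHEN, the character named in the error message


def encoding_error(text: str, encoding: str):
    """Return the UnicodeEncodeError raised when encoding text, or None if it encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


# The encoding open() falls back to when no encoding= is passed (often 'cp1252' on Windows).
print("default text encoding:", locale.getpreferredencoding(False))

print(encoding_error(HYPHEN, "cp1252"))  # 'charmap' codec can't encode character '\u2010' in position 0: ...
print(encoding_error(HYPHEN, "utf-8"))   # None: UTF-8 can encode every code point
```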
I am on Windows, and this does not seem to be an issue on Mac. But Windows is our environment, so I have to figure this out. I have done a ton of research on this error, and everything in the Python world points to either adding encoding="utf-8" or errors="ignore", but both of those apply when opening and reading a file yourself (open() and file.read), and neither can be passed to Camelot’s export method.
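For reference, encoding= and errors= are keyword arguments of the built-in open(), not of file.read; they fix the codec for a file object when it is created, so they only help at the place where open() is called (here, inside Camelot). A small stdlib sketch of the three behaviours on a write; the sample HTML string and the helper name round_trip are made up for illustration:

```python
import os
import tempfile

SAMPLE_HTML = "<td>2014\u2010001</td>"  # made-up cell text containing U+2010


def round_trip(text, encoding, errors="strict"):
    """Write text to a temp file and read it back with the same open() settings.

    Returns the text read back, or None if writing raised UnicodeEncodeError.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "table.html")
        try:
            # encoding= and errors= are open() arguments; they fix the codec for this file object
            with open(path, "w", encoding=encoding, errors=errors) as f:
                f.write(text)
        except UnicodeEncodeError:
            return None
        with open(path, encoding=encoding, errors=errors) as f:
            return f.read()


print(round_trip(SAMPLE_HTML, "cp1252"))                   # None: strict cp1252 fails like the report
print(round_trip(SAMPLE_HTML, "cp1252", errors="ignore"))  # the hyphen is silently dropped
print(round_trip(SAMPLE_HTML, "utf-8"))                    # the hyphen is preserved
```

errors="ignore" avoids the crash but loses data (2014‐001 becomes 2014001), so writing the file as UTF-8 is usually the better fix.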
Any thoughts on what I could add to the script to get around this error? We can’t avoid using Windows, and this seems to be the final blocker for us for being able to really make great use of this tool for our PDFs.
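One workaround that needs no change to Camelot or to the script itself (an assumption, not something confirmed in this thread): Python 3.7 added UTF-8 Mode (PEP 540), enabled with python -X utf8 script.py or the environment variable PYTHONUTF8=1. In that mode open() defaults to UTF-8 regardless of the Windows locale, so library code that opens files without encoding= should be able to write U+2010. The sketch below starts a child interpreter in UTF-8 Mode and reports what it sees; the names PROBE and probe_utf8_mode are made up for illustration:

```python
import subprocess
import sys

# Code run in the child interpreter: report whether UTF-8 Mode is on and
# which encoding open() will use by default.
PROBE = "import locale, sys; print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"


def probe_utf8_mode():
    """Run PROBE under 'python -X utf8' and return (utf8_mode_flag, default_encoding)."""
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    flag, encoding = result.stdout.split()
    return int(flag), encoding


print(probe_utf8_mode())
```

On Windows, setting PYTHONUTF8=1 in the environment before running the conversion script should have the same effect for every file Camelot writes.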
Top GitHub Comments
https://github.com/camelot-dev/camelot/pull/188
It is my first PR. If it is incorrect, please provide some help.
I found this solution (it is a monkey patch): https://stackoverflow.com/questions/63403629/python-camelot-pdf-unicodeencodeerror-when-using-stream-flavor-on-windows/
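The linked answer is not reproduced here; what follows is only a generic sketch of the monkey-patch idea. It assumes that a Camelot table exposes its data as a pandas DataFrame on .df (documented) and that TableList.export writes HTML by calling each table's to_html(path, **kwargs) (an assumption about Camelot's internals). The helper names _to_html_utf8 and patch_to_html are hypothetical:

```python
def _to_html_utf8(self, path, **kwargs):
    """Hypothetical replacement for Table.to_html: render self.df and write it as UTF-8."""
    html_string = self.df.to_html(**kwargs)  # DataFrame.to_html returns a str when no buf is given
    with open(path, "w", encoding="utf-8") as f:  # explicit codec instead of the locale default
        f.write(html_string)


def patch_to_html(table_cls):
    """Install _to_html_utf8 as table_cls.to_html so every instance picks it up."""
    table_cls.to_html = _to_html_utf8
    return table_cls


# Usage (assumes camelot.core.Table is the class behind each extracted table):
#   import camelot
#   patch_to_html(camelot.core.Table)
#   tables = camelot.read_pdf("Fiscal-Year-2014-Disciplinary-Actions.pdf", flavor="stream", pages="all")
#   tables.export("out.html", f="html")
```

Patching the class rather than individual tables means every table returned by read_pdf uses the UTF-8 writer, provided the patch runs before export is called.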