UnicodeEncodeError when using Stream flavor

Python 3.7 on Windows

Using this pdf: http://tsbde.texas.gov/78i8ljhbj/Fiscal-Year-2014-Disciplinary-Actions.pdf

I am running it through Camelot with the Stream flavor to convert it to HTML. The export line raises the following error once it reaches page 4 of 8:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position y: character maps to <undefined>

Pages 1 through 3 convert fine; it crashes somewhere between pages 4 and 5. When I debug with a breakpoint after the tables.export line, it also takes me to line 19 of cp1252.py, if that's helpful.
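The cp1252.py frame suggests the output is being encoded with the Windows default code page, which has no mapping for U+2010 (HYPHEN). A minimal check using only the standard library is sketched below; `can_encode` is a hypothetical helper name, not part of Camelot.

```python
import locale


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

HYPHEN = "\u2010"  # the character named in the error message

cp1252_ok = can_encode(HYPHEN, "cp1252")  # cp1252 has no slot for U+2010
utf8_ok = can_encode(HYPHEN, "utf-8")     # UTF-8 can encode any code point
```

If `default_encoding` reports a cp125x code page on the machine in question, any text file opened for writing without an explicit `encoding` argument would hit the same error on this character.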

I am on Windows, and this does not seem to be an issue on Mac. But Windows is our environment, so I have to figure this out. I have done a ton of research on this error, and everything in the Python world points to adding either encoding="utf-8" or errors="ignore". Both of those relate to the file.read method, though, and can't be passed to Camelot's export method.

Any thoughts on what I could add to the script to get around this error? We can't avoid using Windows, and this seems to be the final blocker to making great use of this tool for our PDFs.
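One possible workaround, sketched under assumptions rather than taken from the thread: skip the export call and write each table's HTML yourself with an explicit UTF-8 encoding. This assumes each Camelot table exposes its data as a pandas DataFrame in `.df`; the helper names `save_html_utf8` and `export_tables_utf8` and the output file naming are hypothetical.

```python
from pathlib import Path


def save_html_utf8(html: str, path: str) -> None:
    """Write an HTML string as UTF-8, ignoring the OS default encoding."""
    Path(path).write_text(html, encoding="utf-8")


def export_tables_utf8(tables, stem: str) -> list:
    """Write one HTML file per table and return the paths written.

    `tables` is any iterable whose items carry a DataFrame-like object
    in `.df` with a `to_html()` method that returns a string.
    """
    paths = []
    for index, table in enumerate(tables, start=1):
        path = f"{stem}-table-{index}.html"
        save_html_utf8(table.df.to_html(), path)
        paths.append(path)
    return paths


# Hypothetical usage with Camelot (requires camelot to be installed):
# import camelot
# tables = camelot.read_pdf(
#     "Fiscal-Year-2014-Disciplinary-Actions.pdf", pages="1-end", flavor="stream"
# )
# export_tables_utf8(tables, "disciplinary-actions")
```

Another option that may avoid code changes is Python's UTF-8 mode (the `-X utf8` command-line flag or the `PYTHONUTF8=1` environment variable, available from Python 3.7), which makes `open()` default to UTF-8. Whether that suits the rest of your environment is something to verify first.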

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

anakin87 commented, Aug 25, 2020 (1 reaction)

https://github.com/camelot-dev/camelot/pull/188

This is my first PR. If it is incorrect, please provide some help.
