
Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'

See original GitHub issue

When using Python 3.5 and Pillow (the original PIL library is quite old now), I get an error with this very simple example:

import pytesseract

try:
    import Image
except ImportError:
    from PIL import Image

pytesseract.image_to_string(Image.open('test_image.png'))

The error is:

Traceback (most recent call last):
  File "tesseract_test.py", line 8, in <module>
    pytesseract.image_to_string(Image.open('test_image.png'))
  File "C:\Users\tbarik\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 163, in image_to_string
    errors = get_errors(error_string)
  File "C:\Users\tbarik\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 111, in get_errors
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
  File "C:\Users\tbarik\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 111, in <genexpr>
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
TypeError: a bytes-like object is required, not 'str'

I’m using Windows 10 (64-bit) with 64-bit Python.
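The last frame of the traceback pinpoints the failure: get_errors splits tesseract's stderr output into lines and calls line.find('Error') on each one. On Python 3, a subprocess pipe returns bytes unless text mode is requested, and bytes.find() rejects a str argument, so the error-reporting code itself raises TypeError. That path appears to run when tesseract reports a problem, so the crash also hides tesseract's own error message. Below is a minimal sketch of a Python 3-safe get_errors, assuming tesseract writes UTF-8 to stderr; the decoding step is an assumption and not necessarily how pytesseract itself fixed the bug.

```python
def get_errors(error_string):
    """Return the lines of tesseract's stderr output that contain 'Error'.

    Python 3-safe sketch: subprocess pipes yield bytes, so decode them to
    str before calling str operations such as line.find('Error').
    UTF-8 is an assumed encoding; undecodable bytes are replaced.
    """
    if isinstance(error_string, bytes):
        error_string = error_string.decode("utf-8", errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if line.find("Error") >= 0)
    if error_lines:
        return "\n".join(error_lines)
    # No line mentions "Error": fall back to the whole trimmed message.
    return error_string.strip()


if __name__ == "__main__":
    stderr_bytes = b"Tesseract Open Source OCR Engine\nError opening data file eng.traineddata\n"
    # The unpatched logic fails because each line is still bytes:
    try:
        stderr_bytes.splitlines()[1].find("Error")
    except TypeError as exc:
        print("TypeError:", exc)
    print(get_errors(stderr_bytes))  # -> Error opening data file eng.traineddata
```

With the decode in place, the caller receives tesseract's actual message (here a missing language data file) instead of a TypeError.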

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Reactions: 9
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

4 reactions
400yk commented, Jan 9, 2017

My system is OS X El Capitan and I’m running Python 3.6. I had the same problem, but a small fix to pytesseract.py resolved it. The error occurs because the TESSDATA_PREFIX environment variable isn’t set for the subprocess. To fix it, add the following line to the run_tesseract function in pytesseract.py:

my_env = {"TESSDATA_PREFIX": "/opt/local/share"}

(For me, /opt/local/share is the parent folder that contains tessdata; change it accordingly.)

Then, in the same function, change the "proc = …" line to:

proc = subprocess.Popen(command, env=my_env,
        stderr=subprocess.PIPE)
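The run_tesseract change above can also be written so that it adds TESSDATA_PREFIX to a copy of the current environment instead of replacing the whole environment with a one-entry dict. A dict holding only TESSDATA_PREFIX gives the child process no PATH at all, which is likely why the comment also hard-codes an absolute tesseract_cmd. The sketch below is a standalone illustration: build_tesseract_env and run_tesseract_with_env are hypothetical helper names, not part of pytesseract, and decoding stderr to str is an added assumption that sidesteps the bytes-versus-str error from the traceback.

```python
import os
import subprocess


def build_tesseract_env(tessdata_parent):
    """Copy the current environment and add TESSDATA_PREFIX.

    tessdata_parent is the folder that contains tessdata, as described in
    the comment (e.g. "/opt/local/share"). Copying os.environ keeps PATH
    and any other variables the child process may rely on.
    """
    env = os.environ.copy()
    env["TESSDATA_PREFIX"] = tessdata_parent
    return env


def run_tesseract_with_env(tesseract_cmd, input_path, output_base, tessdata_parent):
    """Hypothetical helper: run tesseract, return (exit status, stderr as str)."""
    command = [tesseract_cmd, input_path, output_base]
    proc = subprocess.Popen(command,
                            env=build_tesseract_env(tessdata_parent),
                            stderr=subprocess.PIPE)
    # communicate() waits for exit and drains stderr, avoiding a pipe deadlock.
    _, stderr = proc.communicate()
    return proc.returncode, stderr.decode("utf-8", errors="replace")
```

For example, run_tesseract_with_env("/opt/local/bin/tesseract", "test_image.png", "out", "/opt/local/share") would return tesseract's exit status together with any error text as a str.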

Lastly, at the beginning of the file, change the definition of tesseract_cmd to:

tesseract_cmd = '/opt/local/bin/tesseract'

(That is, specify the absolute path. If you aren’t sure what it is, run "which tesseract" in a shell to find out.)
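The same lookup can be done from Python with shutil.which, which searches PATH the way the shell's which command does. The sketch below is a hypothetical helper (find_tesseract is not a pytesseract function); the fallback locations are example install prefixes and are assumptions, so adjust them for your system.

```python
import os
import shutil


def find_tesseract(fallbacks=("/opt/local/bin/tesseract", "/usr/local/bin/tesseract")):
    """Return an absolute path to a tesseract executable, or None.

    shutil.which searches PATH like running `which tesseract` in a shell.
    The fallback paths are example locations only (assumptions).
    """
    found = shutil.which("tesseract")
    if found:
        return os.path.abspath(found)
    for candidate in fallbacks:
        # Accept a fallback only if it is an existing, executable file.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.abspath(candidate)
    return None
```

pytesseract keeps this setting in the module-level variable tesseract_cmd, so instead of editing pytesseract.py the path can also be assigned at runtime with pytesseract.pytesseract.tesseract_cmd = find_tesseract(), provided a binary was found.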

After making the above changes, run "python3 setup.py install" from the pytesseract source directory to install the patched package.

Good luck!

4 reactions
z3ntu commented, Jul 26, 2016

On Arch Linux, you have to install the tesseract-data-<lang> package, e.g. tesseract-data-eng for English.
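Tesseract looks for one <lang>.traineddata file per language inside its tessdata directory, so a missing language package means the corresponding file is absent; running tesseract --list-langs reports the languages the installed binary can find. The sketch below performs a similar check directly on a directory. installed_languages and has_language are hypothetical helpers, and /usr/share/tessdata as the Arch Linux location is an assumption.

```python
from pathlib import Path

# Assumed Arch Linux tessdata location; adjust for your system.
DEFAULT_TESSDATA = "/usr/share/tessdata"


def installed_languages(tessdata_dir=DEFAULT_TESSDATA):
    """Return sorted language codes for the *.traineddata files in tessdata_dir.

    Returns an empty list if the directory does not exist.
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    # "eng.traineddata" -> "eng"
    return sorted(p.stem for p in directory.glob("*.traineddata"))


def has_language(lang, tessdata_dir=DEFAULT_TESSDATA):
    """True if <lang>.traineddata is present, e.g. has_language("eng")."""
    return lang in installed_languages(tessdata_dir)
```

For example, if has_language("eng") returns False, installing tesseract-data-eng should provide the missing eng.traineddata file.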


Top Results From Across the Web

TypeError: a bytes-like object is required, not 'str' in python 3.5 ...
This is a known bug in pytesseract, see issue #32: Error parsing of tesseract output is brittle: a bytes-like object is required, not...
Typeerror a bytes like object is required not str : How to Fix?
Here We have encoded the string a and b is not encoded. Now when we use the “in” operator a is a byte...
How to Fix Typeerror a bytes-like object is required not 'str'
Solution #1: Convert to a bytes object. To fix the error, the types used by the split() operation should match. The simplest solution...
Python typeerror: a bytes-like object is required, not 'str' Solution
The Python typeerror: a bytes-like object is required, not 'str' error is raised when you perform a string operation on a bytes object....
Text Recognition Tool — Layout Parser 0.3.2 documentation
Google Cloud Vision API returns the output text in two types: text_annotations : In this format, GCV automatically find the best aggregation level...
