image_to_string on CentOS is returning TypeError

So, I have this Python file that uses PyTesseract to extract text from specific regions of an image:

import PIL.Image as Image
import os
import pytesseract
from PIL import ImageFile
from dotenv import load_dotenv
from os.path import join, dirname
import traceback

dotenv_path = join(dirname(__file__), '.env')
load_dotenv(dotenv_path)

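# Note: this only defines a module-level variable; the PYTHONIOENCODING
# environment variable has to be set before Python starts to take effect.
PYTHONIOENCODING = 'UTF-8'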


def read(pic_name, ocr_data):
    read_data = {}

    ImageFile.LOAD_TRUNCATED_IMAGES = True

    # ID's front image
    infile = os.getenv('PICS_FOLDER') + '/' + pic_name + '.jpg'
    # image containing the zone where name is located
    text_area_img = os.getenv('ID_RESULTS_FOLDER') + '/name.jpg'

    img = Image.open(infile)
    # get dimensions to be used while cropping
    width, height = img.size

    # ocr_data contains all the text elements to search for along with their coordinates in the image
    for data in ocr_data:
        coords = data["coordinates"]
        print(coords[0] * width)
        print(coords[1] * height)
        print(coords[2] * width)
        print(coords[3] * height)
        cropping_coords = (coords[0] * width, coords[1] * height, coords[2] * width, coords[3] * height)
        # The readable area contains the text element
        readable_area = img.crop(cropping_coords)
        # The readable area is saved for later reference
        readable_area.save(text_area_img)
        try:
            txt = pytesseract.image_to_string(readable_area, lang='ara', config='--psm 6')
        except Exception as e:
            # print_exc() prints the traceback of the exception being handled
            traceback.print_exc()
            txt = "An exception occurred: " + str(e)
        read_data[data["read_text"]] = txt
        with open(os.getenv('ID_RESULTS_FOLDER') + '/name.txt', 'w', encoding='utf-8') as f:
            print(txt, file=f)

    return read_data

Now, this code works fine on my Windows machine, but after deploying it on CentOS 7 (having installed Python and all the required Python libraries there), image_to_string raises a TypeError and the returned text is:

An exception occurred: expected str, bytes or os.PathLike object, not NoneType

along with this stack trace (the same exception occurs twice because there are two text elements I am searching for):

Traceback (most recent call last):
  File "/var/www/digital-identity/front.py", line 39, in read
    txt = pytesseract.image_to_string(readable_area, lang='ara', config='--psm 6')
  File "/usr/local/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 345, in image_to_string
    }[output_type]()
  File "/usr/local/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 344, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/usr/local/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 253, in run_and_get_output
    run_tesseract(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 223, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1278, in _execute_child
    executable = os.fsencode(executable)
  File "/usr/lib64/python3.6/os.py", line 800, in fsencode
    filename = fspath(filename)  # Does type-checking of `filename`.
TypeError: expected str, bytes or os.PathLike object, not NoneType

I have checked that the files are being created, but I don’t know what’s causing the problem or how I am supposed to solve it, since it’s probably an issue with the library.
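The last frames of the traceback show subprocess.Popen failing inside os.fsencode on its executable, which means the first element of the command list that pytesseract passed to Popen was None. A minimal stdlib-only sketch, assuming a POSIX system with CPython 3, reproduces the same TypeError by passing None as the program name. The arguments "input.png" and "stdout" are hypothetical placeholders; nothing is actually executed because Popen fails before starting a process.

```python
import subprocess


def reproduce_none_executable_error() -> str:
    """Mimic a pytesseract call whose tesseract_cmd is None."""
    try:
        # The first list element is the program to run; here it is None.
        # "input.png" and "stdout" are hypothetical placeholder arguments.
        subprocess.Popen([None, "input.png", "stdout"])
    except TypeError as exc:
        # Popen type-checks the executable path and rejects None.
        return str(exc)
    return "no error raised"


print(reproduce_none_executable_error())
```

On such a system it should print the same "expected str, bytes or os.PathLike object, not NoneType" message seen in the traceback, which points at the Tesseract command rather than the image or the output files.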

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
int3l commented, Nov 15, 2019

‘pytesseract.pytesseract.tesseract_cmd’ is just a module variable. Thank you for sharing the workaround. I don’t know why this variable changed for your pytesseract installation; it should be set to ‘tesseract’ by default. Closing this issue as fixed.

0 reactions
hke14 commented, Nov 15, 2019

Alright, I checked, and pytesseract.pytesseract.tesseract_cmd was None, so I explicitly set it to tesseract:

pytesseract.pytesseract.tesseract_cmd = 'tesseract'

And it worked. I guess the variable was None for some reason, but now it’s fixed.
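The workaround sets tesseract_cmd to the bare name 'tesseract', which relies on the binary being on the PATH of whatever process runs the script. A slightly more defensive sketch, assuming you want to keep any value that is already set and otherwise locate the binary with shutil.which, could look like this. resolve_tesseract_cmd is a hypothetical helper name, not part of pytesseract.

```python
import shutil
from typing import Optional


def resolve_tesseract_cmd(current: Optional[str], name: str = "tesseract") -> str:
    """Pick a value for pytesseract.pytesseract.tesseract_cmd (hypothetical helper).

    Keeps `current` if it is a non-empty string; otherwise looks `name` up on
    PATH and falls back to the bare name, so subprocess never receives None.
    """
    if current:
        return current
    found = shutil.which(name)  # absolute path, or None if not found on PATH
    return found if found is not None else name


if __name__ == "__main__":
    print(resolve_tesseract_cmd(None))
```

Before calling image_to_string you would then assign pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(pytesseract.pytesseract.tesseract_cmd). Falling back to the bare name means that a genuinely missing binary fails with a "not found" style error instead of the confusing NoneType message.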
