Bug in Microsoft TROCR Large

Environment info

  • transformers version: 4.12.2
  • Platform: Linux-5.11.0-1020-azure-x86_64-with-debian-bullseye-sid
  • Python version: 3.7.11
  • PyTorch version (GPU?): 1.10.0+cu102 (False)
  • Tensorflow version (GPU?): 2.6.1 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.3.6 (cpu)
  • Jax version: 0.2.24
  • JaxLib version: 0.1.73
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: Parallel

Who can help

I solved this one myself by editing image_utils.py, which was handling the array shapes incorrectly.

Models:

To reproduce

Steps to reproduce the behavior: running inference on TrOCR fails.

Installation steps:

Followed from https://github.com/microsoft/unilm/tree/master/trocr

conda create -n trocr python=3.7
conda activate trocr
git clone https://github.com/microsoft/unilm.git
cd unilm
cd trocr
pip install pybind11
pip install -r requirements.txt
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" 'git+https://github.com/NVIDIA/apex.git'

Also installed transformers with:

pip install transformers[all]
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))" # verified

Python script to invoke inference:

From https://huggingface.co/microsoft/trocr-large-printed

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests
# load image from the IAM database (actually this model is meant to be used on printed text)
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw)
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed')
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

Error message encountered (raised inside the library source file):

(trocr) hello@vm-Farhan-Ubuntu20:~/work/helloassets/DockerThings/SetupTROCR$ python simple_inference.py 
Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-large-printed and are newly initialized: ['encoder.pooler.dense.weight', 'encoder.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "simple_inference.py", line 20, in <module>
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/trocr/processing_trocr.py", line 117, in __call__
    return self.current_processor(*args, **kwargs)
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in __call__
    images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in <listcomp>
    images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/image_utils.py", line 149, in normalize
    return (image - mean) / std
ValueError: operands could not be broadcast together with shapes (384,384) (3,)
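
The last line of the traceback is a NumPy broadcasting failure. As a rough illustration of why shapes (384,384) and (3,) clash, here is a minimal plain-Python sketch of the broadcasting rule. `broadcast_shape` is a hypothetical helper written for this explanation, not part of NumPy or transformers, and the channels-first layout with a mean reshaped to (3, 1, 1) is an assumption made for the example.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of `a` and `b`, or raise ValueError.

    Hypothetical helper that mirrors NumPy's broadcasting rule in plain Python.
    """
    result = []
    # Align the shapes on their trailing axes; a missing leading axis counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast shapes {a} and {b}")
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


# Channels-first RGB array against a per-channel mean reshaped to (3, 1, 1): compatible.
print(broadcast_shape((3, 384, 384), (3, 1, 1)))

# Grayscale array (no channel axis) against a 3-element mean:
# the trailing axes are 384 and 3, so the shapes cannot be broadcast.
try:
    broadcast_shape((384, 384), (3,))
except ValueError as err:
    print(err)
```

The grayscale image has no channel axis at all, so the 3-element mean gets matched against the image width instead of against a channel dimension.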

Expected behavior

Valid OCR Output.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

NielsRogge commented, Nov 2, 2021 (2 reactions)

Hi,

Thanks for spotting. The problem is that the image is grey-scale, meaning no color channels, and the normalize method defined in image_utils.py assumes 3 dimensions.

You can fix it by making sure the image has 3 dimensions, i.e.

image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

cc @sgugger

NielsRogge commented, Aug 12, 2022 (1 reaction)

"In this case, if we can only process RGB images, we check that the number of channels is 3 and raise a ValueError otherwise?"

Yes, exactly!
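
A check of that kind could look roughly like the sketch below. It only illustrates the idea and is not the transformers implementation: `check_rgb_shape` and its `channels_first` parameter are hypothetical names, and the function works on a plain shape tuple so that it runs without extra dependencies.

```python
def check_rgb_shape(shape, channels_first=True):
    """Return `shape` unchanged if it describes a 3-channel image, else raise ValueError.

    Hypothetical helper; the channels-first default is an assumption.
    """
    if len(shape) != 3:
        raise ValueError(
            f"expected a 3-dimensional RGB image, got shape {shape}; "
            "grayscale images need to be converted to RGB first"
        )
    channels = shape[0] if channels_first else shape[-1]
    if channels != 3:
        raise ValueError(f"expected 3 color channels, got {channels} in shape {shape}")
    return shape


# An RGB image in channels-first layout passes the check.
check_rgb_shape((3, 384, 384))

# The grayscale image from this issue is rejected with an explicit message
# instead of failing later with a broadcasting error.
try:
    check_rgb_shape((384, 384))
except ValueError as err:
    print(err)
```

Failing early like this would point the user at the real cause (a missing channel dimension) rather than at the arithmetic inside normalize.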
