Bug in Microsoft TROCR Large

Environment info

transformers version: 4.12.2
Platform: Linux-5.11.0-1020-azure-x86_64-with-debian-bullseye-sid
Python version: 3.7.11
PyTorch version (GPU?): 1.10.0+cu102 (False)
Tensorflow version (GPU?): 2.6.1 (False)
Flax version (CPU?/GPU?/TPU?): 0.3.6 (cpu)
Jax version: 0.2.24
JaxLib version: 0.1.73
Using GPU in script?: No
Using distributed or parallel set-up in script?: Parallel

Who can help

I solved this one myself by editing image_utils.py, which was handling the array shapes incorrectly.

Models:

Microsoft TrOCR Large: https://huggingface.co/microsoft/trocr-large-printed

To reproduce

Steps to reproduce the behavior: running inference on TrOCR fails.

Installation steps:

Followed the instructions at https://github.com/microsoft/unilm/tree/master/trocr

conda create -n trocr python=3.7
conda activate trocr
git clone https://github.com/microsoft/unilm.git
cd unilm
cd trocr
pip install pybind11
pip install -r requirements.txt
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" 'git+https://github.com/NVIDIA/apex.git'

Also installed transformers with:

pip install transformers[all]
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))" # verified

Python script to invoke inference:

From https://huggingface.co/microsoft/trocr-large-printed

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests
# load image from the IAM database (actually this model is meant to be used on printed text)
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw)
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed')
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

Error message encountered (inside the library source file):

(trocr) hello@vm-Farhan-Ubuntu20:~/work/helloassets/DockerThings/SetupTROCR$ python simple_inference.py 
Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-large-printed and are newly initialized: ['encoder.pooler.dense.weight', 'encoder.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "simple_inference.py", line 20, in <module>
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/trocr/processing_trocr.py", line 117, in __call__
    return self.current_processor(*args, **kwargs)
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in __call__
    images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in <listcomp>
    images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
  File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/image_utils.py", line 149, in normalize
    return (image - mean) / std
ValueError: operands could not be broadcast together with shapes (384,384) (3,)
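
The broadcast failure in the last line of the traceback can be reproduced without loading the model. The following is a minimal sketch using NumPy only. The `normalize` function here is a simplified stand-in for the line shown in the traceback, not the library's implementation, and the mean/std values and the channels-last layout are assumptions made for illustration.

```python
import numpy as np

# Assumed per-channel statistics; the real values come from the processor config.
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.5, 0.5, 0.5])


def normalize(image, mean, std):
    # Same arithmetic as the failing line in the traceback.
    return (image - mean) / std


gray = np.zeros((384, 384))     # greyscale image: no channel axis
rgb = np.zeros((384, 384, 3))   # RGB image, channels last (assumed layout)

try:
    normalize(gray, mean, std)
    gray_error = None
except ValueError as exc:
    # The trailing axis (384) cannot be broadcast against 3 channel statistics.
    gray_error = str(exc)

# With a trailing channel axis of length 3, broadcasting succeeds.
rgb_result = normalize(rgb, mean, std)

print(gray_error)
print(rgb_result.shape)
```

The 2-D array fails because NumPy aligns shapes from the trailing axis, and 384 is incompatible with 3.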

Expected behavior

Valid OCR Output.

Top GitHub Comments

NielsRogge commented, Nov 2, 2021

Hi,

Thanks for spotting this. The problem is that the image is greyscale, so it has no color-channel dimension, while the normalize method defined in image_utils.py assumes 3 color channels.

You can fix it by converting the image to RGB so that it has 3 channels:

image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

cc @sgugger
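
To illustrate what the `.convert("RGB")` call changes, here is a small sketch using Pillow alone. It uses a synthetic greyscale image instead of downloading the sample, which is an assumption made to keep the example offline.

```python
from PIL import Image

# Synthetic 384x384 greyscale image standing in for the downloaded sample (assumption).
gray = Image.new("L", (384, 384), color=128)

# Converting to RGB replicates the single luminance band into three color bands.
rgb = gray.convert("RGB")

gray_bands = gray.getbands()
rgb_bands = rgb.getbands()

print(gray.mode, gray_bands)
print(rgb.mode, rgb_bands)
```

After conversion the image carries three bands, which is what the feature extractor's per-channel mean and std expect.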

NielsRogge commented, Aug 12, 2022

"In this case, if we can only process RGB images, should we check that the number of channels is 3 and raise a ValueError otherwise?"

Yes, exactly!
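
A check of the kind discussed here could look roughly like the sketch below. This is not the code that ended up in transformers; `ensure_three_channels` is a hypothetical helper name, and accepting both channels-first and channels-last layouts is an assumption.

```python
import numpy as np


def ensure_three_channels(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: accept only 3-D arrays with a channel axis of length 3."""
    has_channel_axis = image.ndim == 3 and 3 in (image.shape[0], image.shape[-1])
    if not has_channel_axis:
        raise ValueError(
            f"Expected an image with 3 color channels, got an array of shape {image.shape}. "
            "Convert greyscale images first, e.g. with PIL's Image.convert('RGB')."
        )
    return image


# A channels-first RGB array passes through unchanged.
checked = ensure_three_channels(np.zeros((3, 384, 384)))

# A 2-D greyscale array is rejected with an explicit message.
try:
    ensure_three_channels(np.zeros((384, 384)))
    message = None
except ValueError as exc:
    message = str(exc)

print(checked.shape)
print(message)
```

Failing early with a message that names the shape is easier to act on than the broadcasting error raised deep inside normalize.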