Bug in Microsoft TROCR Large
Environment info
- transformers version: 4.12.2
- Platform: Linux-5.11.0-1020-azure-x86_64-with-debian-bullseye-sid
- Python version: 3.7.11
- PyTorch version (GPU?): 1.10.0+cu102 (False)
- Tensorflow version (GPU?): 2.6.1 (False)
- Flax version (CPU?/GPU?/TPU?): 0.3.6 (cpu)
- Jax version: 0.2.24
- JaxLib version: 0.1.73
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: Parallel
Who can help
I solved this one myself by editing image_utils.py, which was handling the array shapes incorrectly.
Models:
- [Microsoft TROCR Large](https://huggingface.co/microsoft/trocr-large-printed)
To reproduce
Steps to reproduce the behavior (inference on TrOCR fails):
Installation steps:
Followed the instructions at https://github.com/microsoft/unilm/tree/master/trocr
conda create -n trocr python=3.7
conda activate trocr
git clone https://github.com/microsoft/unilm.git
cd unilm
cd trocr
pip install pybind11
pip install -r requirements.txt
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" 'git+https://github.com/NVIDIA/apex.git'
Also installed transformers with:
pip install transformers[all]
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))" # verified
Python script used to run inference:
Taken from https://huggingface.co/microsoft/trocr-large-printed
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests
# load image from the IAM database (actually this model is meant to be used on printed text)
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw)
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed')
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
Error message encountered (raised inside the library source file):
(trocr) hello@vm-Farhan-Ubuntu20:~/work/helloassets/DockerThings/SetupTROCR$ python simple_inference.py
Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-large-printed and are newly initialized: ['encoder.pooler.dense.weight', 'encoder.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "simple_inference.py", line 20, in <module>
pixel_values = processor(images=image, return_tensors="pt").pixel_values
File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/trocr/processing_trocr.py", line 117, in __call__
return self.current_processor(*args, **kwargs)
File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in __call__
images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in <listcomp>
images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
File "/home/hello/work/anaconda3/envs/trocr/lib/python3.7/site-packages/transformers/image_utils.py", line 149, in normalize
return (image - mean) / std
ValueError: operands could not be broadcast together with shapes (384,384) (3,)
Expected behavior
Valid OCR Output.
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (4 by maintainers)
Hi,
Thanks for spotting this. The problem is that the image is grayscale, meaning it has no color-channel dimension, and the normalize method defined in image_utils.py assumes 3 dimensions. You can fix it by making sure the image has 3 dimensions, i.e.
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
cc @sgugger
Yes, exactly!
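To illustrate the explanation above, here is a minimal sketch of why the grayscale image fails and why `.convert("RGB")` helps. It does not call transformers at all. The `to_array` and `normalize` helpers are hypothetical stand-ins that only mimic the shape handling implied by the traceback, and the 0.5 mean/std values and the synthetic 384x384 image are assumptions chosen for illustration.

```python
import numpy as np
from PIL import Image

# Assumed per-channel statistics, one value per RGB channel (illustrative only).
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def to_array(image):
    """Scale a PIL image to float32 in [0, 1]; move channels first if present."""
    arr = np.asarray(image, dtype=np.float32) / 255.0
    return arr.transpose(2, 0, 1) if arr.ndim == 3 else arr


def normalize(arr):
    """Hypothetical stand-in for the library's normalize step."""
    if arr.ndim == 3 and arr.shape[0] in (1, 3):
        # Channels-first image: broadcast the statistics over height and width.
        return (arr - mean[:, None, None]) / std[:, None, None]
    # Anything else is subtracted as-is, which is where a 2-D image breaks.
    return (arr - mean) / std


# Synthetic grayscale image standing in for the IAM sample.
gray = Image.new("L", (384, 384), color=128)
gray_arr = to_array(gray)  # 2-D array, no channel axis

try:
    normalize(gray_arr)
    gray_failed = False
except ValueError as exc:
    gray_failed = True
    print("grayscale failed:", exc)

# The suggested fix: give the image three channels before processing.
rgb_arr = to_array(gray.convert("RGB"))
normalized = normalize(rgb_arr)
print("rgb shape:", normalized.shape)
```

The 2-D array cannot be broadcast against a length-3 mean because its trailing dimension is 384, not 3. Once the image is converted to RGB, the channel axis exists and the per-channel statistics line up.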