
How to use new image for prediction


Hi - apologies for the avalanche of questions posted. I have read your paper (https://arxiv.org/pdf/1912.13318.pdf) and also studied FUNSD, previously using it to train a Faster R-CNN to predict questions and answers. One thing I cannot understand from the paper and this repo is how we ingest new images for prediction. The paper says:

"To utilize the layout information of each document, we need toobtain the location of each token. However, the pre-training dataset(IIT-CDIP Test Collection) only contains pure texts while missing their corresponding bounding boxes. In this case, we re-process thescanned document images to obtain the necessary layout informa-tion. Like the original pre-processing in IIT-CDIP Test Collection,we similarly process the dataset by applying OCR to documentimages. The difference is that we obtain both the recognized wordsand their corresponding locations in the document image. Thanksto Tesseract6, an open-source OCR engine, we can easily obtain therecognition as well as the 2-D positions. We store the OCR results inhOCR format, a standard specification format which clearly definesthe OCR results of one single document image using a hierarchical representation"

However, although with Tesseract I can get the words, bounding boxes, and hierarchies, it does not produce an annotated document of the kind this repo uses as test input (as shown in the repo example below):

https://github.com/microsoft/unilm/pull/155

pred_dir = "predictions"

!python run_seq_labeling.py  --do_predict \
                            --data_dir data \
                            --model_type layoutlm \
                            --model_name_or_path {out_dir} \
                            --do_lower_case \
                            --output_dir {pred_dir} \
                            --labels data/labels.txt \
                            --fp16

How do we go from a base input document, i.e. hOCR output of words, hierarchies, and coordinates, to the annotation format used in the example above? --do_predict reads test.txt from the data folder, and that file already has question and answer labels next to the words.
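
For concreteness, here is roughly what I imagine that conversion looks like. This is only a sketch under my own assumptions: it targets the FUNSD-style three-file layout (test.txt, test_box.txt, test_image.txt) that the repo's preprocessing script emits, and it writes a dummy O label for every word, since real labels are exactly what we do not have at prediction time. Please correct me if the expected format differs:

from PIL import Image
import pytesseract

def write_prediction_files(image_path, data_dir="data", file_name="doc.png"):
    image = Image.open(image_path)
    width, height = image.size
    ocr = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

    with open(f"{data_dir}/test.txt", "w") as words_f, \
         open(f"{data_dir}/test_box.txt", "w") as box_f, \
         open(f"{data_dir}/test_image.txt", "w") as image_f:
        for i, word in enumerate(ocr["text"]):
            if not word.strip():
                continue  # Tesseract emits empty tokens for layout rows
            x, y = ocr["left"][i], ocr["top"][i]
            w, h = ocr["width"][i], ocr["height"][i]
            # LayoutLM expects box coordinates normalized to a 0-1000 grid
            norm = "{} {} {} {}".format(
                int(1000 * x / width),
                int(1000 * y / height),
                int(1000 * (x + w) / width),
                int(1000 * (y + h) / height),
            )
            actual = f"{x} {y} {x + w} {y + h}"
            words_f.write(f"{word}\tO\n")
            box_f.write(f"{word}\t{norm}\n")
            image_f.write(f"{word}\t{actual}\t{width} {height}\t{file_name}\n")
        # a blank line closes the document, CoNLL-style
        for f in (words_f, box_f, image_f):
            f.write("\n")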

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 21 (1 by maintainers)

Top GitHub Comments

3 reactions
kbrajwani commented, Sep 17, 2020

I am using the FUNSD dataset. I have already fine-tuned and evaluated on it. But now the question is what happens when a new image comes in. 1. How will I make the annotation file? Are you saying this can be achieved with OCR? Then which OCR are you using? I have tried Tesseract, but sometimes it fails. 2. If I run prediction on one image, the model will give me all the question and answer entities, so how can I map which answer belongs to which question? (One idea is sketched below.)
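
For the second point, the only generic idea I can think of (it is not something the paper or this repo prescribes) is a spatial heuristic: pair each predicted answer with the nearest question box that sits above it or to its left, which is how most forms are laid out. A rough sketch:

import math

def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def link_answers(questions, answers):
    """questions/answers: lists of (text, box) with box = (x0, y0, x1, y1)."""
    pairs = []
    for a_text, a_box in answers:
        ax, ay = center(a_box)
        best, best_dist = None, math.inf
        for q_text, q_box in questions:
            qx, qy = center(q_box)
            if qx > ax and qy > ay:
                continue  # skip questions below and to the right of the answer
            dist = math.hypot(ax - qx, ay - qy)
            if dist < best_dist:
                best, best_dist = q_text, dist
        pairs.append((best, a_text))  # best is None if no candidate was found
    return pairs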

1 reaction
kbrajwani commented, Sep 18, 2020

Thanks, I understand.

