How to use a new image for prediction
Hi - Apologies for the avalanche of questions posted. I have read your paper https://arxiv.org/pdf/1912.13318.pdf and also studied FUNSD, previously using it to train a Faster R-CNN and predict questions and answers. One thing I cannot understand from the paper and this repo is: how do we ingest new images to predict? The paper says:
"To utilize the layout information of each document, we need toobtain the location of each token. However, the pre-training dataset(IIT-CDIP Test Collection) only contains pure texts while missing their corresponding bounding boxes. In this case, we re-process thescanned document images to obtain the necessary layout informa-tion. Like the original pre-processing in IIT-CDIP Test Collection,we similarly process the dataset by applying OCR to documentimages. The difference is that we obtain both the recognized wordsand their corresponding locations in the document image. Thanksto Tesseract6, an open-source OCR engine, we can easily obtain therecognition as well as the 2-D positions. We store the OCR results inhOCR format, a standard specification format which clearly definesthe OCR results of one single document image using a hierarchical representation"
However, although with Tesseract I can get the words, bounding boxes, and hierarchies, it does not provide an annotated document as input (as shown in the repo example for test):
https://github.com/microsoft/unilm/pull/155
pred_dir = "predictions"
!python run_seq_labeling.py --do_predict \
--data_dir data \
--model_type layoutlm \
--model_name_or_path {out_dir} \
--do_lower_case \
--output_dir {pred_dir} \
--labels data/labels.txt \
--fp16
How do we go from a base input document in hOCR format, i.e. words, hierarchies, and coordinates, to the annotation format used in the example here? --do_predict reads test.txt from the data folder, which already has the question and answer labels next to the words.
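For what it's worth, one way to bridge that gap is to write the OCR output into the same three files the FUNSD preprocessing produces (test.txt, test_box.txt, test_image.txt), using a dummy O label for every word, since the true labels are unknown at prediction time. Below is a sketch under that assumption, reusing the words list from the OCR snippet above and normalizing boxes to the 0-1000 scale LayoutLM expects; the exact field separators should be double-checked against the repo's own preprocessing script:

def normalize_box(box, width, height):
    # LayoutLM expects coordinates scaled to a 0-1000 range.
    return [
        int(1000 * box[0] / width),
        int(1000 * box[1] / height),
        int(1000 * box[2] / width),
        int(1000 * box[3] / height),
    ]

def write_prediction_files(words, width, height, image_name, data_dir="data"):
    # words: list of (token, (x0, y0, x1, y1)) pairs from OCR.
    with open(f"{data_dir}/test.txt", "w") as fw, \
         open(f"{data_dir}/test_box.txt", "w") as fbw, \
         open(f"{data_dir}/test_image.txt", "w") as fiw:
        for token, box in words:
            nbox = normalize_box(box, width, height)
            fw.write(f"{token}\tO\n")  # dummy label; O must exist in labels.txt
            fbw.write(f"{token}\t{' '.join(map(str, nbox))}\n")
            fiw.write(f"{token}\t{' '.join(map(str, box))}\t{width} {height}\t{image_name}\n")
        fw.write("\n")  # blank line separates documents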
Top GitHub Comments
I am using the FUNSD dataset. I have already fine-tuned and evaluated on that. But now the question is about a new incoming image. 1. How will I make the annotation file? Are you saying this can be achieved by OCR? Then which OCR are you using? I have tried Tesseract, but it sometimes fails. 2. If I run prediction on one image, the model will give me all question and answer entities, so how can I map which answer belongs to which question?
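On the second question: the model only tags entities; linking answers to questions is a post-processing step. A simple heuristic (purely an illustration, not part of LayoutLM) is to pair each predicted answer with the spatially nearest question by bounding-box centers:

def link_answers_to_questions(questions, answers):
    # questions, answers: lists of (text, (x0, y0, x1, y1)) predicted entities.
    # Pairs each answer with the nearest question by box-center distance;
    # a crude heuristic that only suits simple form layouts.
    def center(box):
        return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

    links = []
    for a_text, a_box in answers:
        ax, ay = center(a_box)
        q_text, _ = min(
            questions,
            key=lambda q: (center(q[1])[0] - ax) ** 2 + (center(q[1])[1] - ay) ** 2,
        )
        links.append((q_text, a_text))
    return links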
Thanks, I understand.