How to use a new image for prediction
Hi - Apologies for the avalanche of questions posted. I have read your paper https://arxiv.org/pdf/1912.13318.pdf and also studied FUNSD, previously using it to train a Faster R-CNN and predict questions and answers. One thing I cannot understand from the paper and this repo is: how do we ingest new images to predict? The paper says:
"To utilize the layout information of each document, we need toobtain the location of each token. However, the pre-training dataset(IIT-CDIP Test Collection) only contains pure texts while missing their corresponding bounding boxes. In this case, we re-process thescanned document images to obtain the necessary layout informa-tion. Like the original pre-processing in IIT-CDIP Test Collection,we similarly process the dataset by applying OCR to documentimages. The difference is that we obtain both the recognized wordsand their corresponding locations in the document image. Thanksto Tesseract6, an open-source OCR engine, we can easily obtain therecognition as well as the 2-D positions. We store the OCR results inhOCR format, a standard specification format which clearly definesthe OCR results of one single document image using a hierarchical representation"
However, although with Tesseract I can get the words, bounding boxes, and hierarchies, it does not provide an annotated document as input (as shown in the repo example for test):
https://github.com/microsoft/unilm/pull/155
pred_dir = "predictions"
!python run_seq_labeling.py --do_predict \
--data_dir data \
--model_type layoutlm \
--model_name_or_path {out_dir} \
--do_lower_case \
--output_dir {pred_dir} \
--labels data/labels.txt \
--fp16
How do we go from a base input document in hOCR format, i.e. words, hierarchies, and coordinates, to the annotation format used in the example here? --do_predict reads test.txt from the data folder, which already has the question and answer labels next to the words.
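For what it's worth, one way to bridge that gap is to write the OCR output into the same three files the FUNSD preprocessing produces (test.txt, test_box.txt, test_image.txt), using a dummy O label for every word, since the true labels are unknown at prediction time. Below is a sketch under that assumption, reusing the words list from the OCR snippet above and normalizing boxes to the 0-1000 scale LayoutLM expects; the exact field separators should be double-checked against the repo's own preprocessing script:

def normalize_box(box, width, height):
    # LayoutLM expects coordinates scaled to a 0-1000 range.
    return [
        int(1000 * box[0] / width),
        int(1000 * box[1] / height),
        int(1000 * box[2] / width),
        int(1000 * box[3] / height),
    ]

def write_prediction_files(words, width, height, image_name, data_dir="data"):
    # words: list of (token, (x0, y0, x1, y1)) pairs from OCR.
    with open(f"{data_dir}/test.txt", "w") as fw, \
         open(f"{data_dir}/test_box.txt", "w") as fbw, \
         open(f"{data_dir}/test_image.txt", "w") as fiw:
        for token, box in words:
            nbox = normalize_box(box, width, height)
            fw.write(f"{token}\tO\n")  # dummy label; O must exist in labels.txt
            fbw.write(f"{token}\t{' '.join(map(str, nbox))}\n")
            fiw.write(f"{token}\t{' '.join(map(str, box))}\t{width} {height}\t{image_name}\n")
        fw.write("\n")  # blank line separates documents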
Top GitHub Comments
I am using the FUNSD dataset. I have already fine-tuned and evaluated on that. But now the question is about a new incoming image. 1. How will I make the annotation file? Are you saying this can be achieved by OCR? Then which OCR are you using? I have tried Tesseract, but it sometimes fails. 2. If I run prediction on one image, the model will give me all question and answer entities, so how can I map which answer belongs to which question?
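On the second question: the model only tags entities; linking answers to questions is a post-processing step. A simple heuristic (purely an illustration, not part of LayoutLM) is to pair each predicted answer with the spatially nearest question by bounding-box centers:

def link_answers_to_questions(questions, answers):
    # questions, answers: lists of (text, (x0, y0, x1, y1)) predicted entities.
    # Pairs each answer with the nearest question by box-center distance;
    # a crude heuristic that only suits simple form layouts.
    def center(box):
        return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

    links = []
    for a_text, a_box in answers:
        ax, ay = center(a_box)
        q_text, _ = min(
            questions,
            key=lambda q: (center(q[1])[0] - ax) ** 2 + (center(q[1])[1] - ay) ** 2,
        )
        links.append((q_text, a_text))
    return links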
Thanks, I understand.