
Add bounding box coordinates to predictions


It could be useful to get bounding box coordinates from the predictions of the Document Information Extraction task.

On a conventional pipeline: [Screenshot from 2022-09-05]

On Donut, it could be something like:

{
    'predictions': [{
        'menu': [{
                'cnt': '2',
                'nm': 'ICE BLAOKCOFFE',
                'price': '82,000',
                'bbox': [xmin, ymin, xmax, ymax]
            },
            {
                'cnt': '1',
                'nm': 'AVOCADO COFFEE',
                'price': '61,000',
                'bbox': [xmin, ymin, xmax, ymax]
            },
        ],
        'total': {
            'cashprice': '200,000',
            'changeprice': '25,400',
            'total_price': '174,600',
            'bbox': [xmin, ymin, xmax, ymax]
        }
    }]
}

Possible solution (I did not succeed with it): https://github.com/clovaai/donut/issues/16#issuecomment-1217464215
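In the proposed format, a grouped entry such as 'total' carries a single box for several fields. If per-field boxes were available, one simple way to fill in the group box is to take the union of the field boxes. This is a sketch under that assumption, not something Donut provides; the helper `union_box` and the coordinates are hypothetical, made-up values for illustration.

```python
from typing import Iterable, List, Sequence

# A box is [xmin, ymin, xmax, ymax], matching the layout proposed above.
Box = Sequence[float]


def union_box(boxes: Iterable[Box]) -> List[float]:
    """Return the smallest box that encloses every box in `boxes`."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one box")
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


# Hypothetical per-field boxes for the 'total' group (made-up numbers).
field_boxes = {
    "cashprice": [310, 820, 420, 845],
    "changeprice": [310, 850, 420, 875],
    "total_price": [300, 790, 425, 815],
}
total_bbox = union_box(field_boxes.values())
print(total_bbox)  # [300, 790, 425, 875]
```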

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 6
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

11 reactions
SamSamhuns commented, Sep 9, 2022

[Screenshot from 2022-09-09]

So, I’ve found a way to generate heatmaps from the decoder’s cross-attentions. However, the attention maps correspond to individual output tokens from the decoder, not necessarily to words: the word Restaurant, for example, might consist of three tokens (Res + tau + rant). The attention heatmaps are also very coarse and might not give precise boxes, as the example shows.
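Because one word can span several tokens, one option is to merge the per-token heatmaps of a word before boxing it. A minimal sketch, assuming the decoded token strings follow the SentencePiece convention of marking a word's first piece with `▁` (U+2581) and that there is one heatmap per generated token; `merge_subword_heatmaps` is a hypothetical helper, and the Res + tau + rant split is only illustrative. Max fusion is used to mirror the fusion the commenter describes.

```python
from typing import List, Tuple

import numpy as np

# Assumption: SentencePiece marks the first piece of each word with U+2581.
WORD_START = "\u2581"


def merge_subword_heatmaps(
    tokens: List[str], heatmaps: np.ndarray
) -> List[Tuple[str, np.ndarray]]:
    """Group subword tokens into words and max-fuse their heatmaps.

    tokens:   decoded token strings, one per generated step
    heatmaps: array of shape (num_tokens, H, W), one map per token
    """
    words: List[Tuple[str, np.ndarray]] = []
    for tok, hm in zip(tokens, heatmaps):
        if tok.startswith(WORD_START) or not words:
            # First piece of a new word: drop the marker, copy its map.
            words.append((tok.lstrip(WORD_START), hm.copy()))
        else:
            # Continuation piece: extend the text, keep the element-wise max.
            text, fused = words[-1]
            words[-1] = (text + tok, np.maximum(fused, hm))
    return words


# Toy example: "Restaurant" split as Res + tau + rant (hypothetical split).
tokens = ["\u2581Res", "tau", "rant", "\u2581Menu"]
heatmaps = np.zeros((4, 2, 3))
heatmaps[0, 0, 0] = 0.9
heatmaps[1, 0, 1] = 0.5
heatmaps[2, 1, 2] = 0.7
heatmaps[3, 1, 0] = 0.4
words = merge_subword_heatmaps(tokens, heatmaps)
print([w for w, _ in words])  # ['Restaurant', 'Menu']
```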

Additionally, you need the correspondence between token values and token indices, and for that you have to dig into the BART batch-decode implementation in the transformers library.
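As an alternative to patching the decode code, once generated ids are turned back into token strings (for example with a Hugging Face tokenizer's `convert_ids_to_tokens`), Donut's field tags such as `<s_nm>` ... `</s_nm>` can be used to find which token positions belong to each field value. This is a sketch under that assumption; `field_token_spans` is hypothetical, and the token split of the example receipt line is made up.

```python
import re
from typing import List, Tuple

OPEN_TAG = re.compile(r"^<s_(.+)>$")      # e.g. <s_nm>
CLOSE_TAG = re.compile(r"^</s_(.+)>$")    # e.g. </s_nm>
OTHER_SPECIAL = re.compile(r"^<[^>]*>$")  # e.g. <sep/>, </s>


def field_token_spans(tokens: List[str]) -> List[Tuple[str, List[int]]]:
    """Return (field name, token positions) for every leaf field value.

    Positions index into `tokens`, so they can be used to pick out the
    cross-attention maps of the tokens that make up each field value.
    """
    stack: List[Tuple[str, List[int]]] = []
    spans: List[Tuple[str, List[int]]] = []
    for i, tok in enumerate(tokens):
        opened = OPEN_TAG.match(tok)
        closed = CLOSE_TAG.match(tok)
        if opened:
            stack.append((opened.group(1), []))
        elif closed:
            if stack and stack[-1][0] == closed.group(1):
                name, positions = stack.pop()
                if positions:  # only leaf fields hold value tokens
                    spans.append((name, positions))
        elif OTHER_SPECIAL.match(tok):
            continue  # separators and end-of-sequence carry no value text
        elif stack:
            stack[-1][1].append(i)  # value token of the innermost open field
    return spans


tokens = ["<s_cord-v2>", "<s_menu>", "<s_nm>", "\u2581ICE", "\u2581BLA", "OK",
          "COFFE", "</s_nm>", "<s_cnt>", "\u25812", "</s_cnt>", "</s_menu>", "</s>"]
print(field_token_spans(tokens))  # [('nm', [3, 4, 5, 6]), ('cnt', [9])]
```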

In the example above, I fused the attention heads, the layers, and the heatmaps of the different tokens with max fusion. I then thresholded the attention areas, extracted their contours, and kept the bounding box with the largest area. Maybe someone can find a way to generate better heatmaps.
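The post-processing described above can be sketched in NumPy. This is not the notebook's code: the (tokens, layers, heads, H, W) layout, the relative threshold of 0.5, and the use of 4-connected component labelling in place of contour extraction (for example OpenCV's `findContours`) are all assumptions; "largest area" is approximated by the number of grid cells in a component.

```python
from collections import deque
from typing import Optional, Tuple

import numpy as np


def attention_to_bbox(
    attn: np.ndarray, threshold: float = 0.5
) -> Optional[Tuple[int, int, int, int]]:
    """Max-fuse cross-attention maps and box the largest region above threshold.

    attn: array of shape (tokens, layers, heads, H, W)  (assumed layout)
    Returns an inclusive (xmin, ymin, xmax, ymax) box in grid cells, or None.
    """
    # Max fusion over tokens, layers and heads gives one (H, W) heatmap.
    heat = attn.max(axis=(0, 1, 2))
    # Rescale to [0, 1] so the threshold is relative to the peak value.
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    mask = heat >= threshold

    # 4-connected component labelling, standing in for contour extraction.
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    best, best_size = None, 0
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue, cells = deque([(y0, x0)]), []
        while queue:
            y, x = queue.popleft()
            cells.append((y, x))
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        # Keep the box of the component covering the most cells.
        if len(cells) > best_size:
            ys, xs = zip(*cells)
            best = (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))
            best_size = len(cells)
    return best


# Toy input: a 2x3 blob plus one isolated cell on a 6x6 grid.
attn = np.zeros((2, 1, 2, 6, 6))
attn[0, 0, 0, 1:3, 1:4] = 1.0
attn[1, 0, 1, 5, 5] = 0.9
print(attention_to_bbox(attn))  # (1, 1, 3, 2)
```

Max fusion keeps any cell that is strongly attended by at least one token, layer or head, which tends to enlarge regions; averaging instead would give tighter but weaker maps.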

I’ll attach the link to the notebook I used to generate the maps. If people are interested in the code for mapping token indices to token values, I can attach a modified donut/model.py as well.

https://colab.research.google.com/drive/1OzRapy23W8Ksf0AtqlkLFaVAAjJRUqbk?usp=sharing

1 reaction
SamSamhuns commented, Sep 11, 2022

Refer to the Document VQA Example section of this notebook. For the DocVQA task, you have to use a resized shape of [4, 16, 80, 60], since the final cross-attention feature map sizes differ from those of the document extraction task.
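A sketch of what that reshape and the conversion back to pixels might look like. The reading of [4, 16, 80, 60] as (decoder layers, heads, grid height, grid width), the row-major flattening of the 80 × 60 grid, and the 2560 × 1920 (height, width) model input are assumptions, not taken from the notebook; a task with a different input size would need different grid constants. Both helper names are hypothetical.

```python
import numpy as np

# Assumed layout of the [4, 16, 80, 60] shape: decoder layers, heads, and an
# 80 x 60 grid of encoder features (80 * 60 = 4800 positions).
LAYERS, HEADS, GRID_H, GRID_W = 4, 16, 80, 60
# Assumed encoder input size (height, width) for the DocVQA model.
INPUT_H, INPUT_W = 2560, 1920


def token_attention_grid(flat_attn: np.ndarray) -> np.ndarray:
    """Reshape one token's cross-attention from (layers, heads, H*W) to a grid."""
    return flat_attn.reshape(LAYERS, HEADS, GRID_H, GRID_W)


def cells_to_pixels(xmin: int, ymin: int, xmax: int, ymax: int):
    """Map an inclusive grid-cell box to pixels of the resized model input.

    This ignores the resize and padding applied during preprocessing, so the
    result still has to be mapped back to the original image.
    """
    sx, sy = INPUT_W / GRID_W, INPUT_H / GRID_H  # 32 px per cell each way
    return (xmin * sx, ymin * sy, (xmax + 1) * sx, (ymax + 1) * sy)


grid = token_attention_grid(np.random.rand(LAYERS, HEADS, GRID_H * GRID_W))
print(grid.shape)                   # (4, 16, 80, 60)
print(cells_to_pixels(1, 1, 3, 2))  # (32.0, 32.0, 128.0, 96.0)
```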

https://colab.research.google.com/drive/1OzRapy23W8Ksf0AtqlkLFaVAAjJRUqbk?usp=sharing
