
Homemade evaluation script not working properly + Eval dataset not available

See original GitHub issue

Hi all,

I am very interested in your table detection model and wanted to try it out myself. I ran into several difficulties along the way and would appreciate some help.

1 - Eval dataset not available

I used an Azure VM to download the dataset and explore it. Your README.md explicitly states that the detection dataset is in PubTables-1M-Image_Page_Detection_PASCAL_VOC.tar.gz and that there should be 4 folders inside: images, train, test and val. However, when I opened the archive, there were only 2 folders (images and train) plus three text files, train_filelist.txt, test_filelist.txt and val_filelist.txt, containing the paths to the XML annotation files.

test_filelist.txt and val_filelist.txt clearly reference files under /test/ and /val/, even though those folders don't exist. I also checked whether the test and val annotations had simply ended up in the train folder, and they have not (a quick check is sketched below).
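For reference, a check along these lines (just a sketch; it assumes the filelist entries are paths relative to the dataset root) reports how many of the referenced annotation files actually exist on disk:

import os

# Assumed extraction root; adjust to wherever the archive was unpacked.
dataset_path = os.path.expanduser("~/Data/PubTables1M-Detection-PASCAL-VOC")

for split in ("train", "val", "test"):
    with open(os.path.join(dataset_path, f"{split}_filelist.txt")) as f:
        paths = [line.strip() for line in f if line.strip()]
    missing = sum(1 for p in paths if not os.path.isfile(os.path.join(dataset_path, p)))
    print(f"{split}: {len(paths)} annotations listed, {missing} missing on disk")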

I don't know where to find the test and val annotations; the dataset has probably changed since the README was written, so it would be nice to update it.

2 - Homemade inference script not working

Because I didn't have the eval dataset, I evaluated the detection model on some samples from the train set. (I know this is a big caveat, since the model saw them during training, but I just wanted to see good results somewhere, because I'm struggling to get the detection model working.)

Here is my code. First, I instantiate the model and load the weights (downloaded through the link in the README.md):

import os
import xml.etree.ElementTree as ET
from PIL import Image, ImageDraw

from torchvision import transforms
import torchvision.transforms.functional as F

import torch
from detr.models.position_encoding import PositionEmbeddingSine
from detr.models.detr import DETR
from detr.models.transformer import Transformer
from detr.models.backbone import Backbone, Joiner

position_embedding = PositionEmbeddingSine(128)
backbone = Backbone("resnet18", False, False, False)
backbone_model = Joiner(backbone, position_embedding)
backbone_model.num_channels = backbone.num_channels
backbone = backbone_model

transformer = Transformer(
    d_model=256,
    dropout=0.1,
    nhead=8,
    dim_feedforward=2048,
    num_encoder_layers=6,
    num_decoder_layers=6,
    normalize_before=True,
    return_intermediate_dec=True,
)

model = DETR(
    backbone,
    transformer,
    num_classes=2,
    num_queries=15,
    aux_loss=False,
)

weights = torch.load(os.path.expanduser("~/Projects/table-parsing/models/pubtables1m_detection_detr_r18.pth"), map_location=torch.device('cpu'))
model.load_state_dict(weights)

I consider this part successful because I am greeted by an <All keys matched successfully> message. If I had instantiated the model incorrectly, I would have seen the usual Missing key(s) or Unexpected key(s) warnings from PyTorch.
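For completeness, the key match can also be verified explicitly; a minimal sketch, reusing the weights and model objects from above:

model_keys = set(model.state_dict().keys())
ckpt_keys = set(weights.keys())
print("missing from checkpoint:", sorted(model_keys - ckpt_keys))    # should be empty
print("unexpected in checkpoint:", sorted(ckpt_keys - model_keys))   # should be empty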

Secondly, I created a simple pipeline to reproduce the image preprocessing done in the repo:

convert_tensor = transforms.ToTensor()
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
final_size = 800
max_size = 1333

def detr_pipeline(image):

    # Resizing image
    w, h = image.size
    min_original_size = float(min((w, h)))
    max_original_size = float(max((w, h)))
    if max_original_size / min_original_size * final_size > max_size:
        size = int(round(max_size * min_original_size / max_original_size))
    else:
        size = final_size

    if (w <= h and w == size) or (h <= w and h == size):
        new_h, new_w = h, w
    elif w < h:
        new_w = size
        new_h = int(size * h / w)
    else:
        new_h = size
        new_w = int(size * w / h)

    rescaled_image = F.resize(image, (new_h, new_w))
    image_tensor = convert_tensor(rescaled_image)

    # Normalizing image
    image_tensor = image_tensor - torch.broadcast_to(mean.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)
    image_tensor = image_tensor / torch.broadcast_to(std.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)

    # Inference
    output = model([image_tensor])
    return output

The hardcoded means and stds come from detr.datasets.coco.make_coco_transforms.
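Incidentally, those two broadcasting lines are equivalent to torchvision's functional normalize; a minimal sketch using the same ImageNet statistics:

import torch
import torchvision.transforms.functional as F

def normalize_imagenet(image_tensor: torch.Tensor) -> torch.Tensor:
    # Same operation as the manual mean/std broadcasting in detr_pipeline above.
    return F.normalize(image_tensor,
                       mean=[0.485, 0.456, 0.406],
                       std=[0.229, 0.224, 0.225])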

Finally, I used this pipeline to evaluate 20 examples from the training set:

dataset_path = os.path.expanduser("~/Data/PubTables1M-Detection-PASCAL-VOC")
annotation_folder = "train"

train_annotations = []
with open(os.path.join(dataset_path, "train_filelist.txt")) as file:
    for line in file:
        train_annotations.append(line[:-1])

found_examples = 0
current = 0

while found_examples < 20:
    ann = train_annotations[current]
    current += 1
    xml_path = os.path.join(dataset_path, ann)
    assert os.path.isfile(xml_path), 'Annotation not found'
    data = ET.parse(xml_path)
    root = data.getroot()
    image_path = os.path.join(dataset_path, "images", root[1].text)
    if not os.path.isfile(image_path):
        print(f"Skipping {root[1].text}, as file doesn't exist")
        continue
    else:
        print(image_path)
    found_examples += 1
    with Image.open(image_path) as im:
        outputs = detr_pipeline(im)
        bboxes, logits = outputs['pred_boxes'], outputs['pred_logits']
        probas_per_class = logits.softmax(-1)[:, :, :-1]
        objects_to_keep = probas_per_class.max(-1).values > 0.5
        pred_boxes = bboxes[objects_to_keep]

        draw = ImageDraw.Draw(im)
        for elem in root:
            if elem.tag == "object":
                # <bndbox> is the last child of <object>; getchildren() was removed in Python 3.9+
                x0, y0, xmax, ymax = [float(i.text) for i in list(elem)[-1]]
                draw.rectangle(
                    (x0, y0, xmax, ymax),
                    outline="blue",
                    width=3,
                )
        for box in pred_boxes:
            centre_x, centre_y, width, height = box
            x0 = int(im.size[0] * (centre_x - width / 2))
            y0 = int(im.size[1] * (centre_y - height / 2))
            x1 = int(im.size[0] * (centre_x + width / 2))
            y1 = int(im.size[1] * (centre_y + height / 2))
            draw.rectangle(
                [x0, y0, x1, y1],
                outline="red",
                width=3,
            )
        im.save(os.path.join(os.path.expanduser("~/Desktop/output/table"), root[1].text))

Note that I used a confidence threshold of 0.5 here, which is very low compared to other DETR models, where a 0.9 confidence level is more usual, so I expect some false positives. I also want to point out that many annotation files reference an image that is not in the images folder (that's why I used a while loop and not a for loop). But when I look at the results, none of them are correct. Here are a few samples (the annotations are in blue and the predictions are in red):

(Sample outputs: PMC6062540_3, PMC6620314_8, PMC6589332_11; images not reproduced here.)

It is very strange, considering the model saw these samples during training. I tried removing the preprocessing, but it doesn't change the results much; they still look completely random. Could you please help me with this inference script? What am I doing wrong here?

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

1 reaction
BenoitdeKersabiec commented on Oct 12, 2022

Hi @yellowjs0304, the inference code is pretty much the concatenation of what's above. Flattened for one specific image, you can do:

import torch
from torchvision import transforms
import torchvision.transforms.functional as F
from PIL import Image

from detr.models.position_encoding import PositionEmbeddingSine
from detr.models.detr import DETR
from detr.models.transformer import Transformer
from detr.models.backbone import Backbone, Joiner


# ---- Instantiating the model ---- # 
position_embedding = PositionEmbeddingSine(128, normalize=True)
backbone = Backbone("resnet18", False, False, False)
backbone_model = Joiner(backbone, position_embedding)
backbone_model.num_channels = backbone.num_channels
backbone = backbone_model

transformer = Transformer(
    d_model=256,
    dropout=0.1,
    nhead=8,
    dim_feedforward=2048,
    num_encoder_layers=6,
    num_decoder_layers=6,
    normalize_before=True,
    return_intermediate_dec=True,
)

model = DETR(
    backbone,
    transformer,
    num_classes=2,
    num_queries=15,
    aux_loss=False,
)

weights = torch.load("./pubtables1m_detection_detr_r18.pth", map_location=torch.device('cpu')) # Change weight path here
model.load_state_dict(weights)
model.eval()


convert_tensor = transforms.ToTensor()
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
final_size = 800
max_size = 1333

def detr_pipeline(image):
    image = Image.open(image)


    # Resizing image
    width, height = image.size
    current_max_size = max(width, height)
    scale = final_size / current_max_size
    resized_image = image.resize((int(round(scale * width)), int(round(scale * height))))
    image_tensor = convert_tensor(resized_image)

    # Normalizing image
    image_tensor = F.normalize(image_tensor, mean=mean, std=std)

    # Inference
    output = model([image_tensor])
    return output

outputs = detr_pipeline("image.jpg")  # Change image path here
bboxes, logits = outputs['pred_boxes'], outputs['pred_logits']
prob_per_class = logits.softmax(-1)[:, :, :-1]
pred_confidence = prob_per_class.max(-1).values > 0.5
pred_boxes = bboxes[pred_confidence]

The last variable, pred_boxes, contains the normalized box coordinates of the tables detected with confidence > 0.5.

Just modify the weight path and the image path and you should be good to go.
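If you need pixel coordinates rather than normalized ones, the conversion mirrors the drawing loop earlier in this issue; a minimal sketch, assuming pred_boxes comes from the snippet above and "image.jpg" is the same image passed to detr_pipeline:

from PIL import Image

im = Image.open("image.jpg")  # same image path as above
img_w, img_h = im.size

pixel_boxes = []
for cx, cy, w, h in pred_boxes.tolist():
    # DETR boxes are (centre_x, centre_y, width, height), normalized to [0, 1].
    pixel_boxes.append((
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ))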

1 reaction
bsmock commented on Aug 10, 2022

Hi,

For issue 1, I suggest trying to unzip again. This issue is a duplicate of #36.

For issue 2, I have not looked at your code, but from the output it looks like you're selecting and outputting the no-object class instead of the table and table-rotated classes. In DETR, the confidence score for the no-object/background class is output at the last class index.

However, another possibility is the image size. At inference time, the image should be scaled so that its maximum side (width or height) is 800, to match what we do in our paper.

Hope these suggestions help you.

Best, Brandon
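To make the first suggestion concrete, here is a minimal sketch (an illustration, not code from this repo or from Brandon's comment) of standard DETR post-processing that keeps only the real classes, using the logits and bboxes from the inference snippet above:

probs = logits.softmax(-1)[0]           # (num_queries, num_classes + 1); last index is the no-object class
scores, labels = probs[:, :-1].max(-1)  # best real class (table / table rotated) per query
keep = scores > 0.9                     # a 0.9 threshold is common for DETR-style models
pred_boxes = bboxes[0][keep]
pred_labels = labels[keep]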

