Homemade evaluation script not working properly + Eval dataset not available
Hi all,
I am very interested in your table detection model and wanted to try it out myself. I ran into a few difficulties while doing so and would appreciate some help.
1 - Eval dataset not available
I used an Azure VM to load the dataset and explore it. Your README.md explicitly states that the detection dataset is in PubTables-1M-Image_Page_Detection_PASCAL_VOC.tar.gz and that there should be 4 folders inside: images, train, test and val. However, when I opened the archive, there were only 2 folders, images and train, plus three text files, train_filelist.txt, test_filelist.txt and val_filelist.txt, containing the paths to the XML annotation files.
test_filelist.txt and val_filelist.txt clearly reference files under /test/ and /val/, even though those folders don't exist. I checked whether the test and val annotations had simply been placed in the train folder, and they have not.
I don't know where to find the test and val annotations; you've probably changed the dataset since the README was written, and it would be nice to update it.
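For reference, a quick sketch of the kind of check I mean (assuming the archive was extracted under dataset_path):
# Count how many annotation paths from each filelist actually exist on disk.
import os

dataset_path = os.path.expanduser("~/Data/PubTables1M-Detection-PASCAL-VOC")  # adjust to your path
for split in ("train", "test", "val"):
    with open(os.path.join(dataset_path, f"{split}_filelist.txt")) as f:
        paths = [line.strip() for line in f if line.strip()]
    missing = [p for p in paths if not os.path.isfile(os.path.join(dataset_path, p))]
    print(f"{split}: {len(paths)} annotations listed, {len(missing)} missing on disk")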
2 - Homemade inference script not working
Because I didn't have the eval dataset, I evaluated the detection model on some samples from the train set (I know this is a big caveat since the model saw them during training, but I just wanted to see good results, because I am struggling to use the detection model).
Here is my code. First, I instantiate the model and load the weights (which I downloaded through the link in the README.md):
import os
import xml.etree.ElementTree as ET
from PIL import Image, ImageDraw
from torchvision import transforms
import torchvision.transforms.functional as F
import torch
from detr.models.position_encoding import PositionEmbeddingSine
from detr.models.detr import DETR
from detr.models.transformer import Transformer
from detr.models.backbone import Backbone, Joiner
position_embedding = PositionEmbeddingSine(128)
backbone = Backbone("resnet18", False, False, False)
backbone_model = Joiner(backbone, position_embedding)
backbone_model.num_channels = backbone.num_channels
backbone = backbone_model
transformer = Transformer(
    d_model=256,
    dropout=0.1,
    nhead=8,
    dim_feedforward=2048,
    num_encoder_layers=6,
    num_decoder_layers=6,
    normalize_before=True,
    return_intermediate_dec=True,
)
model = DETR(
    backbone,
    transformer,
    num_classes=2,
    num_queries=15,
    aux_loss=False,
)
weights = torch.load("~/Projects/table-parsing/models/pubtables1m_detection_detr_r18.pth", map_location=torch.device('cpu'))
model.load_state_dict(weights)
I consider this part successful because I am greeted by an <All keys matched successfully> message. If I had instantiated the model incorrectly, I would have gotten the usual Missing key(s) or Unexpected key(s) warnings from PyTorch.
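For reference, a quick way to surface those mismatches explicitly, reusing model and weights from above (just a standard PyTorch check, not part of my script):
# With strict=False, load_state_dict returns the mismatches instead of raising,
# which makes it easy to print exactly which keys are missing or unexpected.
result = model.load_state_dict(weights, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)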
Secondly, I created a simple pipeline to reproduce the image preprocessing done in the repo:
convert_tensor = transforms.ToTensor()
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
final_size = 800
max_size = 1333
def detr_pipeline(image):
    # Resizing image
    w, h = image.size
    min_original_size = float(min((w, h)))
    max_original_size = float(max((w, h)))
    if max_original_size / min_original_size * final_size > max_size:
        size = int(round(max_size * min_original_size / max_original_size))
    else:
        size = final_size
    if (w <= h and w == size) or (h <= w and h == size):
        new_h, new_w = h, w
    elif w < h:
        new_w = size
        new_h = int(size * h / w)
    else:
        new_h = size
        new_w = int(size * w / h)
    rescaled_image = F.resize(image, (new_h, new_w))
    image_tensor = convert_tensor(rescaled_image)
    # Normalizing image
    image_tensor = image_tensor - torch.broadcast_to(mean.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)
    image_tensor = image_tensor / torch.broadcast_to(std.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)
    # Inference
    output = model([image_tensor])
    return output
The hardcoded means and stds come from detr.datasets.coco.make_coco_transforms.
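For what it's worth, the same normalization can also be written with torchvision's built-in Normalize; a minimal equivalent sketch (the image path is a placeholder):
# Equivalent preprocessing using torchvision's Normalize transform instead of
# the manual broadcasting above; resizing would still be handled separately.
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image_tensor = preprocess(Image.open("page.jpg").convert("RGB"))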
Finally, I used this pipeline to evaluate 20 examples from the training set:
dataset_path = "~/Data/PubTables1M-Detection-PASCAL-VOC"
annotation_folder = "train"
train_annotations = []
with open(os.path.join(dataset_path, "train_filelist.txt")) as file:
    for line in file:
        train_annotations.append(line[:-1])

found_examples = 0
current = 0
while found_examples < 20:
    ann = train_annotations[current]
    current += 1
    xml_path = os.path.join(dataset_path, ann)
    assert os.path.isfile(xml_path), 'Annotation not found'
    data = ET.parse(xml_path)
    root = data.getroot()
    image_path = os.path.join(dataset_path, "images", root[1].text)
    if not os.path.isfile(image_path):
        print(f"Skipping {root[1].text}, as file doesn't exist")
        continue
    else:
        print(image_path)
        found_examples += 1
    with Image.open(image_path) as im:
        outputs = detr_pipeline(im)
        bboxes, logits = outputs['pred_boxes'], outputs['pred_logits']
        probas_per_class = logits.softmax(-1)[:, :, :-1]
        objects_to_keep = probas_per_class.max(-1).values > 0.5
        pred_boxes = bboxes[objects_to_keep]
        draw = ImageDraw.Draw(im)
        for elem in root:
            if elem.tag == "object":
                x0, y0, xmax, ymax = [float(i.text) for i in list(elem)[-1]]
                draw.rectangle(
                    (x0, y0, xmax, ymax),
                    outline="blue",
                    width=3,
                )
        for box in pred_boxes:
            centre_x, centre_y, width, height = box
            x0 = int(im.size[0] * (centre_x - width / 2))
            y0 = int(im.size[1] * (centre_y - height / 2))
            x1 = int(im.size[0] * (centre_x + width / 2))
            y1 = int(im.size[1] * (centre_y + height / 2))
            draw.rectangle(
                [x0, y0, x1, y1],
                outline="red",
                width=3
            )
        im.save(os.path.join("~/Desktop/output/table", root[1].text))
Note that I used a confidence threshold of 0.5, which is very low compared to other DETR models, where a 0.9 confidence level is usually used. Hence I expect some false positives.
Also, I want to point out that many annotation files reference an image that is not in the images folder (that's why I used a while loop and not a for loop).
But when I look at the results, none of them are correct. Here are a few samples (annotations in blue, predictions in red):
It is very weird, considering the model saw these samples during training. I tried removing the preprocessing, but it doesn't change the results much; they still look completely random. Could you please help me with this inference script? What am I doing wrong here?
Top GitHub Comments
Hi @yellowjs0304, the inference code is pretty much the concatenation of what's above. For one specific image, if I flatten the code, you can do the following.
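A rough sketch of that flattened script (the weight and image paths below are placeholders to adjust):
# Flattened single-image inference; assumes the detr repo is importable.
import torch
import torchvision.transforms.functional as F
from PIL import Image
from torchvision import transforms
from detr.models.backbone import Backbone, Joiner
from detr.models.detr import DETR
from detr.models.position_encoding import PositionEmbeddingSine
from detr.models.transformer import Transformer

# Build the detection model: ResNet-18 backbone, 2 classes, 15 object queries.
position_embedding = PositionEmbeddingSine(128)
backbone_base = Backbone("resnet18", False, False, False)
backbone = Joiner(backbone_base, position_embedding)
backbone.num_channels = backbone_base.num_channels
transformer = Transformer(
    d_model=256, dropout=0.1, nhead=8, dim_feedforward=2048,
    num_encoder_layers=6, num_decoder_layers=6,
    normalize_before=True, return_intermediate_dec=True,
)
model = DETR(backbone, transformer, num_classes=2, num_queries=15, aux_loss=False)
model.load_state_dict(torch.load("pubtables1m_detection_detr_r18.pth",
                                 map_location=torch.device("cpu")))
model.eval()  # disable dropout for inference

# Resize so the shorter side is 800 (longer side capped at 1333), then normalize.
image = Image.open("example_page.jpg").convert("RGB")
w, h = image.size
size = 800
if max(w, h) / min(w, h) * size > 1333:
    size = int(round(1333 * min(w, h) / max(w, h)))
if w < h:
    new_w, new_h = size, int(size * h / w)
else:
    new_w, new_h = int(size * w / h), size
image_tensor = transforms.ToTensor()(F.resize(image, (new_h, new_w)))
image_tensor = F.normalize(image_tensor,
                           mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# Run the model and keep queries whose best table / table-rotated score is > 0.5.
with torch.no_grad():
    outputs = model([image_tensor])
probas_per_class = outputs['pred_logits'].softmax(-1)[:, :, :-1]  # drop no-object
keep = probas_per_class.max(-1).values > 0.5
pred_boxes = outputs['pred_boxes'][keep]  # normalized (cx, cy, w, h) per table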
The last variable pred_boxes contains the normalized box coordinates for the tables detected with a confidence > 0.5. Just modify the weight path and the image path and you should be good to go.
Hi,
For issue 1, I suggest trying to unzip again. This issue is a duplicate of #36.
For issue 2, I have not looked at your code, but from the output it looks like you're selecting and outputting the no-object class instead of the table and table-rotated classes. In DETR, the confidence score for the no-object/background class is output at the last class index.
However, another possibility is the image size. At inference time the image should be scaled to have a maximum length (width or height) of 800 to match what we do in our paper.
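For example, a small hypothetical helper (not from the repo) that rescales a PIL image so its longest side is 800:
# Scale the page image so its longest side is 800 px before preprocessing,
# instead of scaling the shortest side to 800 as in detr_pipeline above.
import torchvision.transforms.functional as F

def resize_max_side(image, max_side=800):
    w, h = image.size
    scale = max_side / max(w, h)
    return F.resize(image, (round(h * scale), round(w * scale)))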
Hope these suggestions help you.
Best, Brandon