RuntimeError: Missing/Unexpected key(s) (Training from scratch on Custom Dataset)
Instructions To Reproduce the Issue:
1. what changes you made:
- added a new dataset file ‘detr/datasets/custom.py’ based on ‘detr/datasets/coco.py’ (almost identical).
- changed the number of classes to 2 and accordingly adjusted ‘detr/models/detr.py’ to have the right number of classes. Also, in the same file, I replaced (according to fmassa's post in this issue)
this:
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
with this:
losses['class_error'] = 100 - accuracy(src_logits[idx][..., :-1], target_classes_o)[0]
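For context, here is a toy illustration of my understanding of that change (my own example, not code from the repo): the classifier head has num_classes + 1 outputs, the last one being the "no-object" logit, so the accuracy metric should only rank the real classes.

import torch

# Toy example (mine, not from the repo): logits for [class 0, class 1, no-object]
src_logits = torch.tensor([[0.2, 0.9, 3.0]])
target_classes_o = torch.tensor([1])  # ground-truth class index

pred_with_noobj = src_logits.argmax(-1)             # -> 2, the no-object slot
pred_foreground = src_logits[..., :-1].argmax(-1)   # -> 1, matches the target
print(pred_with_noobj.item(), pred_foreground.item())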
Environment:
PyTorch version: 1.8.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.6 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.5.1
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 7.5.17
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce GTX 980 Ti
GPU 2: NVIDIA GeForce RTX 2080 Ti
Nvidia driver version: 465.19.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.3
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.8.1+cu111
[pip3] torchaudio==0.8.1
[pip3] torchvision==0.9.1+cu111
[conda] Could not collect
2. what exact command you run:
for training (from scratch and using the default backbone resnet50):
CUDA_VISIBLE_DEVICES=1 python main.py --data_path custom_dataset/ --output_dir custom_dataset/output/ --dataset_file custom --gpu 1 --num_classes 2
for inference (most of the code in inference.py taken from this notebook):
python inference.py --checkpoint custom_dataset/output/checkpoint.pth --labels custom_dataset/label.txt --image_dir custom_dataset/test_images/
inference.py full code:
from PIL import Image
import requests
import matplotlib.pyplot as plt
import os
import sys
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
import argparse
import tkinter
import matplotlib
import glob
import torch.nn.functional as F
matplotlib.use('TkAgg')
torch.set_grad_enabled(False)
# colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
[0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]
parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint", help="path to checkpoint", required=True)
parser.add_argument("--labels", help="file contains the labels on each line", required=True)
parser.add_argument("--image-dir", help="path to images dir", required=True)
args = parser.parse_args()
class DETRdemo(nn.Module):
"""
Demo DETR implementation.
Demo implementation of DETR in minimal number of lines, with the
following differences wrt DETR in the paper:
* learned positional encoding (instead of sine)
* positional encoding is passed at input (instead of attention)
* fc bbox predictor (instead of MLP)
The model achieves ~40 AP on COCO val5k and runs at ~28 FPS on Tesla V100.
Only batch size 1 supported.
"""
def __init__(self, num_classes, hidden_dim=256, nheads=8,
num_encoder_layers=6, num_decoder_layers=6):
super().__init__()
# create ResNet-50 backbone
self.backbone = resnet50()
del self.backbone.fc
# create conversion layer
self.conv = nn.Conv2d(2048, hidden_dim, 1)
# create a default PyTorch transformer
self.transformer = nn.Transformer(
hidden_dim, nheads, num_encoder_layers, num_decoder_layers)
# prediction heads, one extra class for predicting non-empty slots
# note that in baseline DETR linear_bbox layer is 3-layer MLP
self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
self.linear_bbox = nn.Linear(hidden_dim, 4)
# output positional encodings (object queries)
self.query_pos = nn.Parameter(torch.rand(100, hidden_dim))
# spatial positional encodings
# note that in baseline DETR we use sine positional encodings
self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
def forward(self, inputs):
# propagate inputs through ResNet-50 up to avg-pool layer
x = self.backbone.conv1(inputs)
x = self.backbone.bn1(x)
x = self.backbone.relu(x)
x = self.backbone.maxpool(x)
x = self.backbone.layer1(x)
x = self.backbone.layer2(x)
x = self.backbone.layer3(x)
x = self.backbone.layer4(x)
# convert from 2048 to 256 feature planes for the transformer
h = self.conv(x)
# construct positional encodings
H, W = h.shape[-2:]
pos = torch.cat([
self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
], dim=-1).flatten(0, 1).unsqueeze(1)
# propagate through the transformer
h = self.transformer(pos + 0.1 * h.flatten(2).permute(2, 0, 1),
self.query_pos.unsqueeze(1)).transpose(0, 1)
# finally project transformer outputs to class labels and bounding boxes
return {'pred_logits': self.linear_class(h),
'pred_boxes': self.linear_bbox(h).sigmoid()}
# for output bounding box post-processing
def box_cxcywh_to_xyxy(x):
x_c, y_c, w, h = x.unbind(1)
b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
(x_c + 0.5 * w), (y_c + 0.5 * h)]
return torch.stack(b, dim=1)
def rescale_bboxes(out_bbox, size):
img_w, img_h = size
b = box_cxcywh_to_xyxy(out_bbox)
b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
return b
def detect(im, model, transform):
# mean-std normalize the input image (batch-size: 1)
img = transform(im).unsqueeze(0)
# demo model only support by default images with aspect ratio between 0.5 and 2
# if you want to use images with an aspect ratio outside this range
# rescale your image so that the maximum size is at most 1333 for best results
assert img.shape[-2] <= 1600 and img.shape[-1] <= 1600, 'demo model only supports images up to 1600 pixels on each side'
# propagate through the model
outputs = model(img)
# keep predictions above the confidence threshold (0.0 here, so effectively all queries are kept)
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.0
# convert boxes from [0; 1] to image scales
bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)
return probas[keep], bboxes_scaled
def plot_results(pil_img, prob, boxes, classes):
plt.figure(figsize=(16,10))
plt.imshow(pil_img)
ax = plt.gca()
for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), COLORS * 100):
ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
fill=False, color=c, linewidth=3))
cl = p.argmax()
text = f'{classes[cl]}: {p[cl]:0.2f}'
ax.text(xmin, ymin, text, fontsize=15,
bbox=dict(facecolor='yellow', alpha=0.5))
plt.axis('off')
plt.show()
if __name__ == '__main__':
# standard PyTorch mean-std input image normalization
transform = T.Compose([
T.Resize(800),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
with open(args.labels) as f:
classes = f.read().splitlines()
for im in glob.glob(os.path.join(os.path.abspath(args.image_dir), '*jpg'), recursive=True):
detr = DETRdemo(num_classes=len(classes))
state_dict = torch.load(args.checkpoint, map_location='cpu')
state_dict["model"] = {k.replace(".0.body", ""): v for k,v in state_dict["model"].items()}
detr.load_state_dict(state_dict['model'], strict=True)
detr.eval()
scores, boxes = None, None
im = Image.open(im).convert('RGB')
scores, boxes = detect(im, detr, transform)
plot_results(im, scores, boxes, classes)
# url
#state_dict = torch.hub.load_state_dict_from_url(
# url='https://dl.fbaipublicfiles.com/detr/detr_demo-da2a99e9.pth',
# map_location='cpu', check_hash=True
# )
#detr.load_state_dict(state_dict)
3. what you observed (including full logs):
Traceback (most recent call last):
File "inference.py", line 343, in <module>
detr.load_state_dict(state_dict['model'], strict=True)
File "/home/tensorflow/venvs/detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DETRdemo:
Missing key(s) in state_dict: "query_pos", "row_embed", "col_embed", "conv.weight", "conv.bias", "transformer.encoder.norm.weight", "transformer.encoder.norm.bias", "linear_class.weight", "linear_class.bias", "linear_bbox.weight", "linear_bbox.bias".
Unexpected key(s) in state_dict: "class_embed.weight", "class_embed.bias", "bbox_embed.layers.0.weight", "bbox_embed.layers.0.bias", "bbox_embed.layers.1.weight", "bbox_embed.layers.1.bias", "bbox_embed.layers.2.weight", "bbox_embed.layers.2.bias", "query_embed.weight", "input_proj.weight", "input_proj.bias".
Before that, many more keys differed, so I compared the key names between my model and the model provided in the notebook, since the latter worked just fine. I renamed some of them by removing the ‘.0.body’ substring (got help from here), and after that step I got the error above about the remaining keys.
I am now trying to solve it in the same way and rename some of the remaining keys by looking for a relation between them (e.g. their shape), something like this (a sketch of how I apply the renaming is included after the key listings below):
input_proj.weight --> conv.weight
input_proj.bias --> conv.bias
query_embed.weight --> query_pos
class_embed.weight --> linear_class.weight
class_embed.bias --> linear_class.bias
bbox_embed.layers.2.weight --> linear_bbox.weight
bbox_embed.layers.2.bias --> linear_bbox.bias
but for some other keys (printed below) I honestly did not know how to rename them. In general, I am not sure that renaming them is the solution!
## my model unique keys
my_state_dict >>> bbox_embed.layers.0.weight shape: torch.Size([256, 256])
my_state_dict >>> bbox_embed.layers.0.bias shape: torch.Size([256])
my_state_dict >>> bbox_embed.layers.1.weight shape: torch.Size([256, 256])
my_state_dict >>> bbox_embed.layers.1.bias shape: torch.Size([256])
## the notebook model unique keys
notebook_state_dict >>> transformer.encoder.norm.weight shape: torch.Size([256])
notebook_state_dict >>> transformer.encoder.norm.bias shape: torch.Size([256])
notebook_state_dict >>> row_embed shape: torch.Size([50, 128])
notebook_state_dict >>> col_embed shape: torch.Size([50, 128])
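For reference, this is a minimal sketch of how I apply the rename table above before loading (my own attempt, not a verified fix; the keys listed above that have no counterpart still mismatch):

import torch

RENAME = {
    "input_proj.weight": "conv.weight",
    "input_proj.bias": "conv.bias",
    "query_embed.weight": "query_pos",
    "class_embed.weight": "linear_class.weight",
    "class_embed.bias": "linear_class.bias",
    "bbox_embed.layers.2.weight": "linear_bbox.weight",
    "bbox_embed.layers.2.bias": "linear_bbox.bias",
}

detr = DETRdemo(num_classes=2)  # the class defined in inference.py above
ckpt = torch.load("custom_dataset/output/checkpoint.pth", map_location="cpu")
weights = {k.replace(".0.body", ""): v for k, v in ckpt["model"].items()}
weights = {RENAME.get(k, k): v for k, v in weights.items()}
# bbox_embed.layers.0/1.*, transformer.encoder.norm.*, row_embed and col_embed
# still have no counterpart, so strict=True would keep failing here.
result = detr.load_state_dict(weights, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)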
Passing strict=False to load_state_dict lets the code run without errors, but the detection boxes are weird: running inference on the same image multiple times gives me only one box, with different coordinates and a different score each time. I am not sure what is causing this (example attached).
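One alternative I am considering instead of the renaming (just an assumption on my side, not something confirmed in this thread): rebuild the full DETR architecture with my number of classes through the repo's torch.hub entry point and load the training checkpoint into it directly, so no key remapping is needed. This would only apply if the checkpoint was produced with the unmodified models/detr.py:

import torch

# Sketch only, I have not verified this locally: hubconf.py in the detr repo
# exposes detr_resnet50(pretrained=..., num_classes=...), so the resulting
# architecture should match a checkpoint trained with the default models/detr.py.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50",
                       pretrained=False, num_classes=2)
ckpt = torch.load("custom_dataset/output/checkpoint.pth", map_location="cpu")
model.load_state_dict(ckpt["model"], strict=True)
model.eval()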
Expected behavior:
I expect the inference code to run without key errors and/or to detect objects correctly on images.
Now I am really out of ideas, so I will be very glad for any help please (: Thank you in advance!
Top GitHub Comments
Hi @ducvuuit, I am not sure what the reason could be. What is the plot on the left? Is it the training loss? If yes, maybe it is an overfitting problem? As far as I remember, I got a similar mAP at first when I started training, but after I added the arguments "--position_embedding learned --pre_norm" (as I mentioned in this issue), I got the plot I posted above.
Anyway, I just started a new training run on a bigger dataset (~100k images) with 22 classes, without resuming from a checkpoint; I will report the result here if it works.
Sure (: My apologies for not pointing out that I made some changes to the code, especially in models/detr.py, and not just adding those args; I only remembered that after you asked for the code. I will list all the changes I made here, since I can't share files from our company server.
parser.add_argument('--num_classes', type=int, default=91, help="Number of classes in your dataset. Overridden by coco and coco_panoptic datasets")
parser.add_argument('--data_path', type=str)
class CocoDetection
-->class CustomDetection
class ConvertCocoPolysToMask
-->class ConvertCustomPolysToMask
def make_coco_transforms
-->def make_custom_transforms
etc.
self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)
-->self.bbox_embed = nn.Linear(hidden_dim, 4)
self.query_embed = nn.Embedding(num_queries, hidden_dim)
-->self.query_embed = nn.Parameter(torch.rand(100, hidden_dim))
(a toy check of this change is below, after this list)
hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]
-->hs = self.transformer(self.input_proj(src), mask, self.query_embed, pos[-1])[0]
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
-->losses['class_error'] = 100 - accuracy(src_logits[idx][..., :-1], target_classes_o)[0]
# according to https://github.com/facebookresearch/detr/issues/41
num_classes = 20 if args.dataset_file != 'coco' else 91
-->num_classes = args.num_classes
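A toy check of my own for the query_embed change above: an nn.Embedding's .weight and a plain nn.Parameter of the same shape carry the same kind of tensor, but they are registered under different names in the state_dict, which is exactly why the checkpoint keys differ between the two versions.

import torch
from torch import nn

# Toy check (mine, not from the repo): same shape, different state_dict key names.
m1 = nn.Module()
m1.query_embed = nn.Embedding(100, 256)
m2 = nn.Module()
m2.query_embed = nn.Parameter(torch.rand(100, 256))
print(list(m1.state_dict().keys()))  # ['query_embed.weight']
print(list(m2.state_dict().keys()))  # ['query_embed']
print(m1.query_embed.weight.shape, m2.query_embed.shape)  # both torch.Size([100, 256])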
The command I run:
CUDA_VISIBLE_DEVICES=1 python main.py --data_path custom_dataset/ --output_dir custom_dataset/output/ --dataset_file custom --num_classes 22 --position_embedding learned --pre_norm
That's all. The training I started yesterday is still at epoch 3, so maybe I will report the result in a month, lol. Btw, I am using an NVIDIA GeForce RTX 2080 Ti (11018 MiB).
For inference I have a script, mostly based on the one provided in the notebook, that I can share as well, but I am not sure it works correctly. I remember that when I closed this issue I was still getting the same label name for all objects; anyway, I need to check it again.
log.txt