RuntimeError: Missing/Unexpected key(s) (Training from scratch on Custom Dataset)
Instructions To Reproduce the Issue:
1. what changes you made:
- added a new dataset file ‘detr/datasets/custom.py’ based on ‘detr/datasets/coco.py’ (almost identical).
- changed the number of classes to 2 and accordingly adjusted ‘detr/models/detr.py’ to have the right number of classes. Also, in the same file, I replaced (according to fmassa's post in this issue)
this:
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
with this:
losses['class_error'] = 100 - accuracy(src_logits[idx][..., :-1], target_classes_o)[0]
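For context, here is a toy illustration of my understanding of that change (my own example, not code from the repo): the classifier head has num_classes + 1 outputs, the last one being the "no-object" logit, so the accuracy metric should only rank the real classes.

import torch

# Toy example (mine, not from the repo): logits for [class 0, class 1, no-object]
src_logits = torch.tensor([[0.2, 0.9, 3.0]])
target_classes_o = torch.tensor([1])  # ground-truth class index

pred_with_noobj = src_logits.argmax(-1)             # -> 2, the no-object slot
pred_foreground = src_logits[..., :-1].argmax(-1)   # -> 1, matches the target
print(pred_with_noobj.item(), pred_foreground.item())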
Environment:
PyTorch version: 1.8.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.6 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.5.1
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 7.5.17
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce GTX 980 Ti
GPU 2: NVIDIA GeForce RTX 2080 Ti
Nvidia driver version: 465.19.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.3
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.8.1+cu111
[pip3] torchaudio==0.8.1
[pip3] torchvision==0.9.1+cu111
[conda] Could not collect
2. what exact command you run:
for training (from scratch and using the default backbone resnet50):
CUDA_VISIBLE_DEVICES=1 python main.py --data_path custom_dataset/ --output_dir custom_dataset/output/ --dataset_file custom --gpu 1 --num_classes 2
for inference (most of the code in inference.py taken from this notebook):
python inference.py --checkpoint custom_dataset/output/checkpoint.pth --labels custom_dataset/label.txt --image_dir custom_dataset/test_images/
inference.py full code:
from PIL import Image
import requests
import matplotlib.pyplot as plt
import os
import sys
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
import argparse
import tkinter
import matplotlib
import glob
import torch.nn.functional as F
matplotlib.use('TkAgg')
torch.set_grad_enabled(False)
# colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
[0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]
parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint", help="path to checkpoint", required=True)
parser.add_argument("--labels", help="file contains the labels on each line", required=True)
parser.add_argument("--image-dir", help="path to images dir", required=True)
args = parser.parse_args()
class DETRdemo(nn.Module):
"""
Demo DETR implementation.
Demo implementation of DETR in minimal number of lines, with the
following differences wrt DETR in the paper:
* learned positional encoding (instead of sine)
* positional encoding is passed at input (instead of attention)
* fc bbox predictor (instead of MLP)
The model achieves ~40 AP on COCO val5k and runs at ~28 FPS on Tesla V100.
Only batch size 1 supported.
"""
def __init__(self, num_classes, hidden_dim=256, nheads=8,
num_encoder_layers=6, num_decoder_layers=6):
super().__init__()
# create ResNet-50 backbone
self.backbone = resnet50()
del self.backbone.fc
# create conversion layer
self.conv = nn.Conv2d(2048, hidden_dim, 1)
# create a default PyTorch transformer
self.transformer = nn.Transformer(
hidden_dim, nheads, num_encoder_layers, num_decoder_layers)
# prediction heads, one extra class for predicting non-empty slots
# note that in baseline DETR linear_bbox layer is 3-layer MLP
self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
self.linear_bbox = nn.Linear(hidden_dim, 4)
# output positional encodings (object queries)
self.query_pos = nn.Parameter(torch.rand(100, hidden_dim))
# spatial positional encodings
# note that in baseline DETR we use sine positional encodings
self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
def forward(self, inputs):
# propagate inputs through ResNet-50 up to avg-pool layer
x = self.backbone.conv1(inputs)
x = self.backbone.bn1(x)
x = self.backbone.relu(x)
x = self.backbone.maxpool(x)
x = self.backbone.layer1(x)
x = self.backbone.layer2(x)
x = self.backbone.layer3(x)
x = self.backbone.layer4(x)
# convert from 2048 to 256 feature planes for the transformer
h = self.conv(x)
# construct positional encodings
H, W = h.shape[-2:]
pos = torch.cat([
self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
], dim=-1).flatten(0, 1).unsqueeze(1)
# propagate through the transformer
h = self.transformer(pos + 0.1 * h.flatten(2).permute(2, 0, 1),
self.query_pos.unsqueeze(1)).transpose(0, 1)
# finally project transformer outputs to class labels and bounding boxes
return {'pred_logits': self.linear_class(h),
'pred_boxes': self.linear_bbox(h).sigmoid()}
# for output bounding box post-processing
def box_cxcywh_to_xyxy(x):
x_c, y_c, w, h = x.unbind(1)
b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
(x_c + 0.5 * w), (y_c + 0.5 * h)]
return torch.stack(b, dim=1)
def rescale_bboxes(out_bbox, size):
img_w, img_h = size
b = box_cxcywh_to_xyxy(out_bbox)
b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
return b
def detect(im, model, transform):
# mean-std normalize the input image (batch-size: 1)
img = transform(im).unsqueeze(0)
# demo model only support by default images with aspect ratio between 0.5 and 2
# if you want to use images with an aspect ratio outside this range
# rescale your image so that the maximum size is at most 1333 for best results
assert img.shape[-2] <= 1600 and img.shape[-1] <= 1600, 'demo model only supports images up to 1600 pixels on each side'
# propagate through the model
outputs = model(img)
# keep predictions above the confidence threshold (0.0 here, so effectively all queries are kept)
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.0
# convert boxes from [0; 1] to image scales
bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)
return probas[keep], bboxes_scaled
def plot_results(pil_img, prob, boxes, classes):
plt.figure(figsize=(16,10))
plt.imshow(pil_img)
ax = plt.gca()
for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), COLORS * 100):
ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
fill=False, color=c, linewidth=3))
cl = p.argmax()
text = f'{classes[cl]}: {p[cl]:0.2f}'
ax.text(xmin, ymin, text, fontsize=15,
bbox=dict(facecolor='yellow', alpha=0.5))
plt.axis('off')
plt.show()
if __name__ == '__main__':
# standard PyTorch mean-std input image normalization
transform = T.Compose([
T.Resize(800),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
with open(args.labels) as f:
classes = f.read().splitlines()
for im in glob.glob(os.path.join(os.path.abspath(args.image_dir), '*jpg'), recursive=True):
detr = DETRdemo(num_classes=len(classes))
state_dict = torch.load(args.checkpoint, map_location='cpu')
state_dict["model"] = {k.replace(".0.body", ""): v for k,v in state_dict["model"].items()}
detr.load_state_dict(state_dict['model'], strict=True)
detr.eval()
scores, boxes = None, None
im = Image.open(im).convert('RGB')
scores, boxes = detect(im, detr, transform)
plot_results(im, scores, boxes, classes)
# url
#state_dict = torch.hub.load_state_dict_from_url(
# url='https://dl.fbaipublicfiles.com/detr/detr_demo-da2a99e9.pth',
# map_location='cpu', check_hash=True
# )
#detr.load_state_dict(state_dict)
3. what you observed (including full logs):
Traceback (most recent call last):
File "inference.py", line 343, in <module>
detr.load_state_dict(state_dict['model'], strict=True)
File "/home/tensorflow/venvs/detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DETRdemo:
Missing key(s) in state_dict: "query_pos", "row_embed", "col_embed", "conv.weight", "conv.bias", "transformer.encoder.norm.weight", "transformer.encoder.norm.bias", "linear_class.weight", "linear_class.bias", "linear_bbox.weight", "linear_bbox.bias".
Unexpected key(s) in state_dict: "class_embed.weight", "class_embed.bias", "bbox_embed.layers.0.weight", "bbox_embed.layers.0.bias", "bbox_embed.layers.1.weight", "bbox_embed.layers.1.bias", "bbox_embed.layers.2.weight", "bbox_embed.layers.2.bias", "query_embed.weight", "input_proj.weight", "input_proj.bias".
Before that, many more keys differed, so I compared the key names between my model and the model provided in the notebook, since the latter worked just fine. I renamed some of them by removing the ‘.0.body’ substring (got help from here), and after that step I got the error above about the remaining keys.
I am now trying to solve it in the same way and rename some of the remaining keys by looking for a relation between them (e.g. their shape), something like this (a sketch of how I apply the renaming is included after the key listings below):
input_proj.weight --> conv.weight
input_proj.bias --> conv.bias
query_embed.weight --> query_pos
class_embed.weight --> linear_class.weight
class_embed.bias --> linear_class.bias
bbox_embed.layers.2.weight --> linear_bbox.weight
bbox_embed.layers.2.bias --> linear_bbox.bias
but for some other keys (printed below) I honestly did not know how to rename them. In general, I am not sure that renaming them is the solution!
## my model unique keys
my_state_dict >>> bbox_embed.layers.0.weight shape: torch.Size([256, 256])
my_state_dict >>> bbox_embed.layers.0.bias shape: torch.Size([256])
my_state_dict >>> bbox_embed.layers.1.weight shape: torch.Size([256, 256])
my_state_dict >>> bbox_embed.layers.1.bias shape: torch.Size([256])
## the notebook model unique keys
notebook_state_dict >>> transformer.encoder.norm.weight shape: torch.Size([256])
notebook_state_dict >>> transformer.encoder.norm.bias shape: torch.Size([256])
notebook_state_dict >>> row_embed shape: torch.Size([50, 128])
notebook_state_dict >>> col_embed shape: torch.Size([50, 128])
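For reference, this is a minimal sketch of how I apply the rename table above before loading (my own attempt, not a verified fix; the keys listed above that have no counterpart still mismatch):

import torch

RENAME = {
    "input_proj.weight": "conv.weight",
    "input_proj.bias": "conv.bias",
    "query_embed.weight": "query_pos",
    "class_embed.weight": "linear_class.weight",
    "class_embed.bias": "linear_class.bias",
    "bbox_embed.layers.2.weight": "linear_bbox.weight",
    "bbox_embed.layers.2.bias": "linear_bbox.bias",
}

detr = DETRdemo(num_classes=2)  # the class defined in inference.py above
ckpt = torch.load("custom_dataset/output/checkpoint.pth", map_location="cpu")
weights = {k.replace(".0.body", ""): v for k, v in ckpt["model"].items()}
weights = {RENAME.get(k, k): v for k, v in weights.items()}
# bbox_embed.layers.0/1.*, transformer.encoder.norm.*, row_embed and col_embed
# still have no counterpart, so strict=True would keep failing here.
result = detr.load_state_dict(weights, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)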
Passing strict=False to load_state_dict lets the code run without errors, but the detection boxes are weird: running inference on the same image multiple times gives me only one box, with different coordinates and a different score each time. I am not sure what is causing this (example attached).
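One alternative I am considering instead of the renaming (just an assumption on my side, not something confirmed in this thread): rebuild the full DETR architecture with my number of classes through the repo's torch.hub entry point and load the training checkpoint into it directly, so no key remapping is needed. This would only apply if the checkpoint was produced with the unmodified models/detr.py:

import torch

# Sketch only, I have not verified this locally: hubconf.py in the detr repo
# exposes detr_resnet50(pretrained=..., num_classes=...), so the resulting
# architecture should match a checkpoint trained with the default models/detr.py.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50",
                       pretrained=False, num_classes=2)
ckpt = torch.load("custom_dataset/output/checkpoint.pth", map_location="cpu")
model.load_state_dict(ckpt["model"], strict=True)
model.eval()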
Expected behavior:
I expect the inference code to run without key errors and/or to detect objects correctly on images.
Now I am really out of ideas, so I will be very glad for any help please (: Thank you in advance!
Top GitHub Comments
Hi @ducvuuit, I am not sure what the reason could be. What is the plot on the left? Is it the training loss? If yes, maybe it is an overfitting problem? As far as I remember, I got a similar mAP at first when I started training, but after I added the arguments "--position_embedding learned --pre_norm" (as I mentioned in this issue), I got the plot I posted above.
Anyway, I just started a new training run on a bigger dataset (~100k images) with 22 classes, without resuming from a checkpoint; I will report the result here if it works.
Sure (: My apologies for not pointing out that I made some changes to the code, especially in models/detr.py, and not just adding those args; I only remembered that after you asked for the code. I will list all the changes I made here, since I can't share files from our company server.
parser.add_argument('--num_classes', type=int, default=91, help="Number of classes in your dataset. Overridden by coco and coco_panoptic datasets")
parser.add_argument('--data_path', type=str)
class CocoDetection
-->class CustomDetection
class ConvertCocoPolysToMask
-->class ConvertCustomPolysToMask
def make_coco_transforms
-->def make_custom_transforms
etc.
self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)
-->self.bbox_embed = nn.Linear(hidden_dim, 4)
self.query_embed = nn.Embedding(num_queries, hidden_dim)
-->self.query_embed = nn.Parameter(torch.rand(100, hidden_dim))
(a toy check of this change is below, after this list)
hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]
-->hs = self.transformer(self.input_proj(src), mask, self.query_embed, pos[-1])[0]
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
-->losses['class_error'] = 100 - accuracy(src_logits[idx][..., :-1], target_classes_o)[0]
# according to https://github.com/facebookresearch/detr/issues/41
num_classes = 20 if args.dataset_file != 'coco' else 91
-->num_classes = args.num_classes
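A toy check of my own for the query_embed change above: an nn.Embedding's .weight and a plain nn.Parameter of the same shape carry the same kind of tensor, but they are registered under different names in the state_dict, which is exactly why the checkpoint keys differ between the two versions.

import torch
from torch import nn

# Toy check (mine, not from the repo): same shape, different state_dict key names.
m1 = nn.Module()
m1.query_embed = nn.Embedding(100, 256)
m2 = nn.Module()
m2.query_embed = nn.Parameter(torch.rand(100, 256))
print(list(m1.state_dict().keys()))  # ['query_embed.weight']
print(list(m2.state_dict().keys()))  # ['query_embed']
print(m1.query_embed.weight.shape, m2.query_embed.shape)  # both torch.Size([100, 256])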
The command I run:
CUDA_VISIBLE_DEVICES=1 python main.py --data_path custom_dataset/ --output_dir custom_dataset/output/ --dataset_file custom --num_classes 22 --position_embedding learned --pre_norm
That's all. The training I started yesterday is still at epoch 3, so maybe I will report the result in a month, lol. Btw, I am using an NVIDIA GeForce RTX 2080 Ti (11018 MiB).
For inference I have a script, mostly based on the one provided in the notebook, that I can share as well, but I am not sure it works correctly. I remember that when I closed this issue I was still getting the same label name for all objects; anyway, I need to check it again.
log.txt