
UNet Training Error: Size of Tensors Mismatched

See original GitHub issue

I’m currently seeing a size mismatch between tensors while trying to train a UNet on BraTS2018 data.

I’m working off of the spleen example, which has been very helpful, but I’ve been unable to complete training. I’ve referred to issues #418 and #323, but am still stuck.

My code is as follows:

Dataset and Transforms

import monai
import torch
from monai.inferers import sliding_window_inference
from monai.metrics import compute_meandice
from monai.networks.layers import Norm
from monai.transforms import (Compose, LoadNiftid, AddChanneld, Spacingd,
                              Orientationd, ScaleIntensityRanged, ToTensord)

# read the image and label file lists, dropping any empty trailing lines
with open(r'C:\Users\jilli\Documents\MF-MRI\BraTS 2018 Training Data\Training\filename_t1.txt') as text_t1:
    train_images = [line for line in text_t1.read().split('\n') if line]

with open(r'C:\Users\jilli\Documents\MF-MRI\BraTS 2018 Training Data\Training\filename_seg.txt') as text_segs:
    train_labels = [line for line in text_segs.read().split('\n') if line]

data_dicts = [{'image': image_name, 'label': label_name}
              for image_name, label_name in zip(train_images, train_labels)]
train_files, val_files = data_dicts[:-9], data_dicts[-9:]
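
Since the two text files are zipped together line by line, a quick sanity check (an illustrative snippet, not in the original post) can rule out mismatched lists before training:

assert len(train_images) == len(train_labels), 'image/label lists differ in length'
print(f'{len(train_files)} training pairs, {len(val_files)} validation pairs')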

train_transforms = Compose([
    LoadNiftid(keys=['image', 'label']),
    AddChanneld(keys=['image', 'label']),
    Spacingd(keys=['image', 'label'], pixdim=(1.5, 1.5, 2.), mode=('bilinear', 'nearest')),
    Orientationd(keys=['image', 'label'], axcodes='RAS'),
    ScaleIntensityRanged(keys=['label'], a_min=0, a_max=4, b_min=0.0, b_max=1.0, clip=True),
    #CropForegroundd(keys=['image', 'label'], source_key='image'),
    ToTensord(keys=['image', 'label'])
])
val_transforms = Compose([
    LoadNiftid(keys=['image', 'label']),
    AddChanneld(keys=['image', 'label']),
    Spacingd(keys=['image', 'label'], pixdim=(1.5, 1.5, 2.), mode=('bilinear', 'nearest')),
    Orientationd(keys=['image', 'label'], axcodes='RAS'),
    ScaleIntensityRanged(keys=['label'], a_min=0, a_max=4, b_min=0.0, b_max=1.0, clip=True),
    #CropForegroundd(keys=['image', 'label'], source_key='image'),
    ToTensord(keys=['image', 'label'])
])
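
The UNet configured below downsamples with stride 2 four times, so every spatial dimension entering the network must be divisible by 2**4 = 16, or the decoder and the skip connections will disagree on size. A minimal diagnostic (illustrative, not part of the original post) applied to one transformed sample:

sample = train_transforms(data_dicts[0])
for size in sample['image'].shape[1:]:  # skip the channel dimension
    assert size % 16 == 0, f'spatial size {size} is not divisible by 16'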

Cache Dataset

train_ds = monai.data.CacheDataset(
    data=train_files, transform=train_transforms, cache_rate=1.0, num_workers=0
)
# train_ds = monai.data.Dataset(data=train_files, transform=train_transforms)

# use batch_size=2 to load two volumes per training step (note: unlike the
# spleen example this is based on, RandCropByPosNegLabeld is not applied here)
train_loader = monai.data.DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=0, multiprocessing_context=None)

val_ds = monai.data.CacheDataset(
    data=val_files, transform=val_transforms, cache_rate=1.0, num_workers=0
)
# val_ds = monai.data.Dataset(data=val_files, transform=val_transforms)
val_loader = monai.data.DataLoader(val_ds, batch_size=1, num_workers=0, multiprocessing_context=None)
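
Inspecting one batch before training (another illustrative snippet) shows the exact sizes that reach the model:

batch = next(iter(train_loader))
print(batch['image'].shape, batch['label'].shape)  # (batch, channel, H, W, D)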

Training

device = torch.device('cpu')
model = monai.networks.nets.UNet(dimensions=3, in_channels=1, out_channels=2, channels=(16, 32, 64, 128, 256),
                                 strides=(2, 2, 2, 2), num_res_units=2, norm=Norm.BATCH).to(device)
loss_function = monai.losses.DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), 1e-4)

val_interval = 2
best_metric = -1
best_metric_epoch = -1
epoch_loss_values = list()
metric_values = list()
for epoch in range(600):
    print('-' * 10)
    print('Epoch {}/{}'.format(epoch + 1, 600))
    model.train()
    epoch_loss = 0
    step = 0
    for batch_data in train_loader:
        step += 1
        inputs, labels = batch_data['image'].to(device), batch_data['label'].to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        print('{}/{}, train_loss: {:.4f}'.format(step, len(train_ds) // train_loader.batch_size, loss.item()))
    epoch_loss /= step
    epoch_loss_values.append(epoch_loss)
    print('epoch {} average loss: {:.4f}'.format(epoch + 1, epoch_loss))

    if (epoch + 1) % val_interval == 0:
        model.eval()
        with torch.no_grad():
            metric_sum = 0.
            metric_count = 0
            for val_data in val_loader:
                val_inputs, val_labels = val_data['image'].to(device), val_data['label'].to(device)
                roi_size = (160, 160, 160)
                sw_batch_size = 4
                val_outputs = sliding_window_inference(val_inputs, roi_size, sw_batch_size, model)
                value = compute_meandice(y_pred=val_outputs, y=val_labels, include_background=False,
                                         to_onehot_y=True, mutually_exclusive=True)
                metric_count += len(value)
                metric_sum += value.sum().item()
            metric = metric_sum / metric_count
            metric_values.append(metric)
            if metric > best_metric:
                best_metric = metric
                best_metric_epoch = epoch + 1
                torch.save(model.state_dict(), 'best_metric_model.pth')
                print('saved new best metric model')
            print('current epoch {} current mean dice: {:.4f} best mean dice: {:.4f} at epoch {}'.format(
                epoch + 1, metric, best_metric, best_metric_epoch))

I get the following error:

  File "<ipython-input-20-6831040c0cf9>", line 41, in <module>
    outputs = model(inputs)
  File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\jilli\AppData\Roaming\Python\Python37\site-packages\monai\networks\nets\unet.py", line 128, in forward
    x = self.model(x)
  File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\jilli\AppData\Roaming\Python\Python37\site-packages\monai\networks\layers\simplelayers.py", line 33, in forward
    return torch.cat([x, self.submodule(x)], self.cat_dim)

RuntimeError: Sizes of tensors must match except in dimension 1. Got 39 and 40 in dimension 4

Please note image.shape = label.shape = (160, 160, 78). (The depth of 78 halves to 39 at the first downsampling; the deeper decoder levels upsample back to 40, which cannot be concatenated with the 39-deep skip connection, hence the 39 vs. 40 in the error.)

My current setup:

  • OS: Windows 10
  • MONAI version: 0.2.0rc1+19.ge8c26a2
  • Python version: 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
  • Numpy version: 1.19.0
  • Pytorch version: 1.5.0

Any help would be greatly appreciated! Thanks so much.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 24 (11 by maintainers)

Top GitHub Comments

1 reaction
jillianlee commented, Jul 3, 2020

This worked! Thank you. I also had to change out_channels to 1, which I believe makes sense. The image input is a single-channel image and the label is a multi-class, single-channel label, so a single output channel with a loss function using to_onehot_y=True should run training correctly, right?

Thanks again, both of you! I greatly appreciate all your help.

1 reaction
wyli commented, Jul 3, 2020

Sorry, I was not verifying it properly. The size should be divisible by 16, so 160,160,80 would work. If you want to use an image size of 160,160,72 or 168,168,80, then change the network to have channels=(16, 32, 64, 128), strides=(2, 2, 2) instead of channels=(16, 32, 64, 128, 256), strides=(2, 2, 2, 2).
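
One way to keep arbitrary volume sizes is to pad each input up to the next multiple of the network's total downsampling factor before the forward pass. Below is a minimal sketch (the helper and its placement are illustrative, not from this thread; recent MONAI releases also provide a DivisiblePadd transform for the same purpose):

import torch.nn.functional as F

def pad_to_multiple(x, k=16):
    # x is a (batch, channel, H, W, D) tensor; pad each spatial
    # dimension up to the next multiple of k
    pads = []
    for size in reversed(x.shape[2:]):  # F.pad expects last-dimension-first pairs
        extra = (-size) % k
        pads.extend([extra // 2, extra - extra // 2])
    return F.pad(x, pads)

inputs = pad_to_multiple(batch_data['image'].to(device))  # e.g. depth 78 -> 80

The labels would need the same padding before computing the loss, or the padded margins would have to be cropped from the network output.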

Read more comments on GitHub >

