UNet Training Error: Size of Tensors Mismatched
I'm currently experiencing a tensor size mismatch while trying to train UNet with BraTS2018 data.
I’m working off of the spleen example, which has been very helpful, but I’ve been unable to complete training. I’ve referred to issues #418 and #323, but am still stuck.
My code is as follows:
Dataset and Transforms
text_t1 = open(r'C:\Users\jilli\Documents\MF-MRI\BraTS 2018 Training Data\Training\filename_t1.txt', 'r')
train_images = text_t1.read().split('\n')
text_segs = open(r'C:\Users\jilli\Documents\MF-MRI\BraTS 2018 Training Data\Training\filename_seg.txt', 'r')
train_labels = text_segs.read().split('\n')
data_dicts = [{'image': image_name, 'label': label_name}
              for image_name, label_name in zip(train_images, train_labels)]
train_files, val_files = data_dicts[:-9], data_dicts[-9:]
train_transforms = Compose([
    LoadNiftid(keys=['image', 'label']),
    AddChanneld(keys=['image', 'label']),
    Spacingd(keys=['image', 'label'], pixdim=(1.5, 1.5, 2.), mode=('bilinear', 'nearest')),
    Orientationd(keys=['image', 'label'], axcodes='RAS'),
    ScaleIntensityRanged(keys=['label'], a_min=0, a_max=4, b_min=0.0, b_max=1.0, clip=True),
    # CropForegroundd(keys=['image', 'label'], source_key='image'),
    ToTensord(keys=['image', 'label'])
])
val_transforms = Compose([
    LoadNiftid(keys=['image', 'label']),
    AddChanneld(keys=['image', 'label']),
    Spacingd(keys=['image', 'label'], pixdim=(1.5, 1.5, 2.), mode=('bilinear', 'nearest')),
    Orientationd(keys=['image', 'label'], axcodes='RAS'),
    ScaleIntensityRanged(keys=['label'], a_min=0, a_max=4, b_min=0.0, b_max=1.0, clip=True),
    # CropForegroundd(keys=['image', 'label'], source_key='image'),
    ToTensord(keys=['image', 'label'])
])
Cache Dataset
train_ds = monai.data.CacheDataset(
    data=train_files, transform=train_transforms, cache_rate=1.0, num_workers=0
)
# train_ds = monai.data.Dataset(data=train_files, transform=train_transforms)
# use batch_size=2 to load images and use RandCropByPosNegLabeld
# to generate 2 x 4 images for network training
train_loader = monai.data.DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=0, multiprocessing_context=None)
val_ds = monai.data.CacheDataset(
    data=val_files, transform=val_transforms, cache_rate=1.0, num_workers=0
)
# val_ds = monai.data.Dataset(data=val_files, transform=val_transforms)
val_loader = monai.data.DataLoader(val_ds, batch_size=1, num_workers=0, multiprocessing_context=None)
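(Editor's note: a quick way to confirm the spatial size that actually comes out of these loaders, before any training, is to pull a single batch. A minimal sketch using only the variables defined above; the printed shapes are illustrative, based on the 160 x 160 x 78 size reported later in this issue.)

# pull one batch to inspect the spatial size the network will actually receive
check_data = next(iter(train_loader))
print(check_data['image'].shape, check_data['label'].shape)
# with the transforms above this should print something like:
# torch.Size([2, 1, 160, 160, 78]) torch.Size([2, 1, 160, 160, 78])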
Training
device = torch.device('cpu')
model = monai.networks.nets.UNet(dimensions=3, in_channels=1, out_channels=2, channels=(16, 32, 64, 128, 256),
                                 strides=(2, 2, 2, 2), num_res_units=2, norm=Norm.BATCH).to(device)
loss_function = monai.losses.DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), 1e-4)
val_interval = 2
best_metric = -1
best_metric_epoch = -1
epoch_loss_values = list()
metric_values = list()
for epoch in range(600):
    print('-' * 10)
    print('Epoch {}/{}'.format(epoch + 1, 600))
    model.train()
    epoch_loss = 0
    step = 0
    for batch_data in train_loader:
        step += 1
        inputs, labels = batch_data['image'].to(device), batch_data['label'].to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        print('{}/{}, train_loss: {:.4f}'.format(step, len(train_ds) // train_loader.batch_size, loss.item()))
    epoch_loss /= step
    epoch_loss_values.append(epoch_loss)
    print('epoch {} average loss: {:.4f}'.format(epoch + 1, epoch_loss))
    if (epoch + 1) % val_interval == 0:
        model.eval()
        with torch.no_grad():
            metric_sum = 0.
            metric_count = 0
            for val_data in val_loader:
                val_inputs, val_labels = val_data['image'].to(device), val_data['label'].to(device)
                roi_size = (160, 160, 160)
                sw_batch_size = 4
                val_outputs = sliding_window_inference(val_inputs, roi_size, sw_batch_size, model)
                value = compute_meandice(y_pred=val_outputs, y=val_labels, include_background=False,
                                         to_onehot_y=True, mutually_exclusive=True)
                metric_count += len(value)
                metric_sum += value.sum().item()
            metric = metric_sum / metric_count
            metric_values.append(metric)
            if metric > best_metric:
                best_metric = metric
                best_metric_epoch = epoch + 1
                torch.save(model.state_dict(), 'best_metric_model.pth')
                print('saved new best metric model')
            print('current epoch {} current mean dice: {:.4f} best mean dice: {:.4f} at epoch {}'.format(
                epoch + 1, metric, best_metric, best_metric_epoch))
I get the following error:
File "<ipython-input-20-6831040c0cf9>", line 41, in <module>
outputs = model(inputs)
File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\jilli\AppData\Roaming\Python\Python37\site-packages\monai\networks\nets\unet.py", line 128, in forward
x = self.model(x)
File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
input = module(input)
File "C:\Users\jilli\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\jilli\AppData\Roaming\Python\Python37\site-packages\monai\networks\layers\simplelayers.py", line 33, in forward
return torch.cat([x, self.submodule(x)], self.cat_dim)
RuntimeError: Sizes of tensors must match except in dimension 1. Got 39 and 40 in dimension 4
Please note that image.shape = label.shape = (160, 160, 78).
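(Editor's note: for context on the 39 vs 40 in the error, with strides=(2, 2, 2, 2) the UNet halves each spatial dimension four times and then doubles it back up, so every spatial size needs to be divisible by 2**4 = 16 for the encoder and decoder feature maps to line up at the skip connections. A rough, purely illustrative sketch of the arithmetic for the last axis of size 78; the exact sizes depend on the conv settings inside monai.networks.nets.UNet, but the divisibility argument is the same.)

import math

encoder_sizes = [78]
for _ in range(4):                        # four stride-2 downsamplings
    encoder_sizes.append(math.ceil(encoder_sizes[-1] / 2))
print(encoder_sizes)                      # [78, 39, 20, 10, 5]

decoder_sizes = [encoder_sizes[-1] * 2 ** i for i in range(1, 5)]
print(decoder_sizes)                      # [10, 20, 40, 80]
# the 40-wide upsampled feature map must be concatenated with the 39-wide
# encoder feature map at that skip level, hence "Got 39 and 40 in dimension 4";
# 78 is not divisible by 16, so halving and doubling cannot round-trip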
My current setup is on Windows 10.
MONAI version: 0.2.0rc1+19.ge8c26a2
Python version: 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Numpy version: 1.19.0
Pytorch version: 1.5.0
Any help would be greatly appreciated! Thanks so much.
Top GitHub Comments
This worked! Thank you. I also had to change out_channels to 1, which I believe makes sense: the input is a single-channel image and the label is a multi-class, single-channel label, so a single output channel with a loss function using to_onehot_y=True should run training correctly, right?
Thanks again, both of you! I greatly appreciate all your help.
Sorry, I was not verifying it properly. The size should be divisible by 16, so 160, 160, 80 would work. If you want to use an image size of 160, 160, 72 or 168, 168, 80, then change the network to have channels=(16, 32, 64, 128), strides=(2, 2, 2) instead of channels=(16, 32, 64, 128, 256), strides=(2, 2, 2, 2).
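(Editor's note: to make the suggestion concrete, a minimal sketch of the two options, continuing the script from the question above. SpatialPadd and the specific sizes here are illustrative additions, not taken from the original thread, and SpatialPadd is assumed to be available in the installed MONAI version.)

from monai.transforms import SpatialPadd  # assumed available in this MONAI version

# Option 1: pad volumes up to a 16-divisible spatial size (e.g. 160, 160, 80)
# by adding this transform to both transform chains, just before ToTensord:
pad = SpatialPadd(keys=['image', 'label'], spatial_size=(160, 160, 80))

# Option 2: drop one resolution level from the UNet so each spatial dimension
# only needs to be divisible by 2**3 = 8 (note 78 is still not divisible by 8,
# so this option needs e.g. 160 x 160 x 72 as suggested above):
model = monai.networks.nets.UNet(
    dimensions=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128), strides=(2, 2, 2),
    num_res_units=2, norm=Norm.BATCH
).to(device)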