Training on Dataset with different image sizes
Hello.
I’m trying to use your implementation to train on my own dataset. I have multiple images with multiple sizes, annotated with the Pascal VOC format.
The thing is I keep getting the following error:
…/ssd_batch_generator.py", line 1182, in generate
    batch_X = np.array(batch_X)
ValueError: could not broadcast input array from shape (224,224,4) into shape (224,224)
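For context, this error comes from NumPy's batch stacking rather than from the model itself: np.array() can only stack a list of images into one batch array if every image has the same shape. A minimal sketch (not from the repository) that triggers the same kind of failure by mixing a single-channel and a 4-channel image:

import numpy as np

# Minimal sketch: mixing channel counts in one batch makes np.array() fail,
# because it cannot find one common shape for all elements.
gray_image = np.zeros((224, 224), dtype=np.uint8)     # single-channel image
rgba_image = np.zeros((224, 224, 4), dtype=np.uint8)  # image with an alpha channel
batch_X = [gray_image, rgba_image]
batch_X = np.array(batch_X)  # older NumPy raises the broadcast error quoted above;
                             # NumPy >= 1.24 reports an "inhomogeneous shape" instead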
I’m using the example of SSD 7 to get started, with this generator configuration:
train_generator = train_dataset.generate(batch_size=batch_size,
                                         shuffle=True,
                                         train=True,
                                         ssd_box_encoder=ssd_box_encoder,
                                         convert_to_3_channels=True,
                                         equalize=False,
                                         brightness=(0.5, 2, 0.5), # Randomly change brightness between 0.5 and 2 with probability 0.5
                                         flip=0.5, # Randomly flip horizontally with probability 0.5
                                         translate=((5, 70), (3, 50), 0.5), # Randomly translate by 5-70 pixels horizontally and 3-50 pixels vertically with probability 0.5
                                         scale=(0.7, 1.4, 0.5), # Randomly scale between 0.7 and 1.4 with probability 0.5
                                         max_crop_and_resize=False,
                                         random_pad_and_resize=False,
                                         random_crop=False,
                                         crop=False,
                                         resize=(224, 224),
                                         gray=False,
                                         limit_boxes=True,
                                         include_thresh=0.4)
val_generator = val_dataset.generate(batch_size=batch_size,
                                     shuffle=True,
                                     train=True,
                                     ssd_box_encoder=ssd_box_encoder,
                                     convert_to_3_channels=True,
                                     equalize=False,
                                     brightness=False,
                                     flip=False,
                                     translate=False,
                                     scale=False,
                                     max_crop_and_resize=False,
                                     random_pad_and_resize=False,
                                     random_crop=False,
                                     crop=False,
                                     resize=(224, 224),
                                     gray=False,
                                     limit_boxes=True,
                                     include_thresh=0.4)
Is there any configuration I should set to avoid this error? My guess is that it happens because my dataset contains images of multiple sizes.
Any help is appreciated, thanks! And also thanks for this great repository.
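As a first check, a short script along these lines can show whether the dataset mixes channel counts and not just image sizes; the images_dir path is a placeholder and the use of PIL is an assumption, not part of the original setup:

import os
from PIL import Image

# Diagnostic sketch with a hypothetical dataset path: list every image whose
# mode is not plain RGB (e.g. RGBA with an alpha channel, or grayscale).
images_dir = 'path/to/JPEGImages'  # placeholder, adjust to your dataset
for filename in sorted(os.listdir(images_dir)):
    with Image.open(os.path.join(images_dir, filename)) as img:
        if img.mode != 'RGB':
            print(filename, img.mode, img.size)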
Top GitHub Comments
Glad to hear it. Yeah, the generator cannot yet deal with 4-channel images unless all images have 4 channels; I guess I'll fix that soon. What data format were your images in? What's the fourth channel? Alpha? Depth?
Yeah, if an image has a fourth channel, that channel will be a special channel (alpha, depth, etc.), so it isn’t possible to artificially create that fourth channel for a 3-channel image in a meaningful way. The best solution for a dataset in which only some images have four channels is therefore to just throw away the fourth channel, which is a relatively inexpensive operation in Numpy compared to what your function above does.
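In NumPy, throwing away the fourth channel amounts to slicing the last axis; here is a minimal sketch of that preprocessing step (the drop_alpha helper and the file path are illustrative, not from the repository):

import numpy as np
from PIL import Image

def drop_alpha(image_array):
    # Grayscale (H, W): replicate to 3 channels; RGBA (H, W, 4): keep RGB only.
    if image_array.ndim == 2:
        return np.stack([image_array] * 3, axis=-1)
    return image_array[..., :3]

img = np.array(Image.open('example.png'))  # may load as RGB, RGBA, or grayscale
img = drop_alpha(img)
assert img.shape[-1] == 3

Alternatively, Image.open(path).convert('RGB') performs the same normalization at load time.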
Thanks for the feedback; I'll close this issue.