Training on Dataset with different image sizes


Hello.

I’m trying to use your implementation to train on my own dataset. I have images of multiple sizes, annotated in the Pascal VOC format.

The thing is I keep getting the following error:

…/ssd_batch_generator.py", line 1182, in generate
    batch_X = np.array(batch_X)
ValueError: could not broadcast input array from shape (224,224,4) into shape (224,224)
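For context on what this ValueError means: it is raised when images with different channel counts end up in the same batch list and NumPy tries to stack them into a single dense array. A minimal, purely illustrative reproduction (not taken from the repository) would be:

import numpy as np

# Two "images" with mismatched channel counts, e.g. an RGBA PNG next to a
# grayscale image, cannot be stacked into one dense ndarray.
batch_X = [
    np.zeros((224, 224, 4), dtype=np.uint8),  # 4-channel image (e.g. RGBA)
    np.zeros((224, 224), dtype=np.uint8),     # single-channel (grayscale) image
]

np.array(batch_X)  # raises ValueError; the exact message depends on the NumPy version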

I’m using the SSD7 example to get started, with the following generator configuration:

train_generator = train_dataset.generate(batch_size=batch_size,
                                         shuffle=True,
                                         train=True,
                                         ssd_box_encoder=ssd_box_encoder,
                                         convert_to_3_channels=True,
                                         equalize=False,
                                         brightness=(0.5, 2, 0.5), # Randomly change brightness between 0.5 and 2 with probability 0.5
                                         flip=0.5, # Randomly flip horizontally with probability 0.5
                                         translate=((5, 70), (3, 50), 0.5), # Randomly translate by 5-70 pixels horizontally and 3-50 pixels vertically with probability 0.5
                                         scale=(0.7, 1.4, 0.5), # Randomly scale between 0.7 and 1.4 with probability 0.5
                                         max_crop_and_resize=False,
                                         random_pad_and_resize=False,
                                         random_crop=False,
                                         crop=False,
                                         resize=(224, 224),
                                         gray=False,
                                         limit_boxes=True,
                                         include_thresh=0.4)

val_generator = val_dataset.generate(batch_size=batch_size,
                                     shuffle=True,
                                     train=True,
                                     ssd_box_encoder=ssd_box_encoder,
                                     convert_to_3_channels=True,
                                     equalize=False,
                                     brightness=False,
                                     flip=False,
                                     translate=False,
                                     scale=False,
                                     max_crop_and_resize=False,
                                     random_pad_and_resize=False,
                                     random_crop=False,
                                     crop=False,
                                     resize=(224, 224),
                                     gray=False,
                                     limit_boxes=True,
                                     include_thresh=0.4)

Is there any configuration I should set to avoid this error? My guess is that it’s because my dataset contains images of multiple sizes.

Any help is appreciated, thanks! And also thanks for this great repository.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
pierluigiferrari commented, Mar 21, 2018

Glad to hear it. Yeah, the generator cannot yet deal with 4-channel images unless all images have 4 channels; I guess I’ll fix that soon. What data format were your images in? What’s the fourth channel? Alpha? Depth?
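To answer the question about the data format, one quick way to audit a dataset is to count the PIL image modes of all files. The following is a hypothetical sketch (the images_dir path is an assumption, not something from the issue):

import os
from collections import Counter
from PIL import Image

images_dir = 'path/to/VOCdevkit/VOC2012/JPEGImages'  # assumed dataset location

# Count PIL modes to see which files carry a fourth channel ('RGBA', 'LA', ...).
mode_counts = Counter()
for fname in os.listdir(images_dir):
    if fname.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp')):
        with Image.open(os.path.join(images_dir, fname)) as img:
            mode_counts[img.mode] += 1  # e.g. 'RGB', 'RGBA', 'L', 'P'

print(mode_counts)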

0 reactions
pierluigiferrari commented, Mar 21, 2018

Yeah, if an image has a fourth channel, that channel will be something special (alpha, depth, etc.), so it isn’t possible to artificially create a meaningful fourth channel for a 3-channel image. The best solution for a dataset in which only some images have four channels is therefore to just throw away the fourth channel, which is a relatively inexpensive operation in NumPy compared to what your function above does.

Thanks for the feedback, I’ll close this issue.
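Following up on the suggestion to throw away the fourth channel: a minimal NumPy sketch of that idea (my own illustration, not code from the repository) could look like this, applied to each image before it goes into a batch:

import numpy as np

def to_three_channels(image):
    """Return the image with exactly three channels.

    Extra channels (e.g. alpha in an RGBA image) are sliced off; grayscale
    (H, W) arrays are tiled to three identical channels.
    """
    if image.ndim == 2:               # grayscale -> replicate to 3 channels
        return np.stack([image] * 3, axis=-1)
    if image.shape[-1] > 3:           # RGBA etc. -> keep only the first 3 channels
        return image[..., :3]
    return image

# Example: an RGBA image becomes (224, 224, 3)
rgba = np.zeros((224, 224, 4), dtype=np.uint8)
print(to_three_channels(rgba).shape)  # (224, 224, 3)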
