
Using custom input image shape

See original GitHub issue

Can we use a custom input image shape while training? I am looking to set an input shape of (512, 512, 3), but anything other than (32, 32, 3) throws a mismatch error. Can you explain how to determine the encoder and decoder network parameters? Thanks!

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
ascillitoe commented, Aug 2, 2022

@pranjal-joshi-cc has @mauicv answered your question above? If so, we shall close this issue 🙂

1 reaction
mauicv commented, Jul 6, 2022

Hey @pranjal-joshi-cc,

I’ve understood the encoder part: through the strides parameter we control the dimensionality reduction, and with encoder_net.summary() we can see the size of the last convolution operation, i.e. N x N x Filters. However, is it necessary to always map the encoder to 32 x 32 for alibi-detect to work, or is the choice of autoencoder purely arbitrary?

I’m not completely sure what you mean here. The choice of the autoencoder is arbitrary, except that:

  1. The architecture needs to be sufficient to model the data well. What I mean by this is that when it’s trained in the detector’s fit method it needs to reduce the reconstruction error well. This might not be possible if the network doesn’t have enough capacity; for example, if you don’t choose a big enough latent dimension you might have difficulty. I don’t think this should be an issue for the models defined above, though.
  2. The VAE needs to produce output of the same shape as its input. For the purposes of the VAEOutlier this really only applies to the decoder: it needs to map from the latent space of size latent_dim to the same shape as the original input image, so in your case (512, 512, 3).

In terms of the output shape of the encoder, it doesn’t really matter as long as the capacity is sufficient, i.e. you don’t reduce the dimensionality too much. For instance, for the architecture I provided above we have:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, InputLayer

IMAGE_SHAPE = (512, 512, 3)  # the custom input shape discussed above

encoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=IMAGE_SHAPE),
      Conv2D(32, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(256, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(516, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(1024, 4, strides=2, padding='same', activation=tf.nn.relu),
  ])

and the summary is:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 256, 256, 32)      1568      
                                                                 
 conv2d_1 (Conv2D)           (None, 128, 128, 64)      32832     
                                                                 
 conv2d_2 (Conv2D)           (None, 64, 64, 128)       131200    
                                                                 
 conv2d_3 (Conv2D)           (None, 32, 32, 256)       524544    
                                                                 
 conv2d_4 (Conv2D)           (None, 16, 16, 516)       2114052   
                                                                 
 conv2d_5 (Conv2D)           (None, 8, 8, 1024)        8455168   
                                                                 
=================================================================
Total params: 11,259,364
Trainable params: 11,259,364
Non-trainable params: 0
_________________________________________________________________

So the output shape of the encoder_net is (8, 8, 1024). Note that the VAEOutlier adds some Dense layers to the encoder_net to transform the (8, 8, 1024) output into the latent space of dimension 1024, since you’ve chosen latent_dim=1024.
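
To make that concrete, here is a rough sketch of the kind of transformation the detector applies on top of encoder_net. The variable names and layer layout are illustrative assumptions for this thread, not the library’s actual implementation:

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten

# Illustrative only: flatten the (8, 8, 1024) encoder output and project it to
# the parameters of a latent distribution of size latent_dim.
latent_dim = 1024
dummy_images = tf.zeros((1, 512, 512, 3))   # stand-in batch of one image
features = encoder_net(dummy_images)        # -> (1, 8, 8, 1024)
flat = Flatten()(features)                  # -> (1, 8*8*1024)
z_mean = Dense(latent_dim)(flat)            # -> (1, 1024)
z_log_var = Dense(latent_dim)(flat)         # -> (1, 1024)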

Also, please explain how to calculate and reshape the dense layers in the decoder net, as it’s quite confusing for me. How do you determine the number of Dense units, i.e. 8*8*1024, and how do you determine the reshaping in the next layer?

The decoder_net maps from the latent space of dimension 1024 (in our case) to the output shape (512, 512, 3). So it is going to take a vector of length latent_dim. We want to transform this into a shape that can then easily be scaled up to (512, 512, 3). You can do this in a number of ways, but it’s easiest if we set up the Conv2DTranspose operations to double the height and width at each layer of the network. The reason we choose 8*8*1024 is just that this can then be reshaped into (8, 8, 1024), which we can then upscale to the output image by applying each of the transpose layers. For instance, given the architecture I suggested above:

import tensorflow as tf
from tensorflow.keras.layers import Conv2DTranspose, Dense, InputLayer, Reshape

latent_dim = 1024

decoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(latent_dim,)),
      Dense(8*8*1024),
      Reshape(target_shape=(8, 8, 1024)),
      Conv2DTranspose(1024, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(516, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(128, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(32, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(3, 1, strides=1, padding='same', activation='sigmoid')
  ])
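
Before handing these networks to the detector, it can be worth checking that the shapes line up. A minimal sanity check, assuming the encoder_net and decoder_net defined above:

import tensorflow as tf

# The decoder should map a latent vector back to the original image shape.
dummy_latent = tf.zeros((1, latent_dim))
print(decoder_net(dummy_latent).shape)   # expected: (1, 512, 512, 3)

# And the encoder should accept images of the custom shape.
dummy_image = tf.zeros((1, 512, 512, 3))
print(encoder_net(dummy_image).shape)    # expected: (1, 8, 8, 1024)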

The latent vector of shape (1, 1024) is mapped to a vector of shape (1, 8*8*1024), which is then reshaped to (1, 8, 8, 1024) and upscaled by each of the transpose layers: (1, 8, 8, 1024) -> (1, 16, 16, 1024) -> (1, 32, 32, 516) -> (1, 64, 64, 256) -> (1, 128, 128, 128) -> (1, 256, 256, 64) -> (1, 512, 512, 32) -> (1, 512, 512, 3). So (8*8*1024) is really chosen as a convenience in order to reshape the tensor. Typically we choose image heights and widths to be powers of 2 just because it makes this operation of scaling up and down simpler, but in general this doesn’t have to be the case. The formula for the output size of a transpose convolution is documented here; in the common case of padding='same' with no output padding, the output spatial size is simply the input size multiplied by the stride.
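
Putting it together, here is a minimal sketch of passing these networks to the detector. It assumes the alibi-detect OutlierVAE class (the detector referred to as VAEOutlier above); the threshold, samples, placeholder data and training settings are illustrative values rather than recommendations from this thread:

import numpy as np
from alibi_detect.od import OutlierVAE

od = OutlierVAE(
    threshold=0.015,            # illustrative threshold on the reconstruction-based score
    encoder_net=encoder_net,    # encoder defined above, input shape (512, 512, 3)
    decoder_net=decoder_net,    # decoder defined above, output shape (512, 512, 3)
    latent_dim=latent_dim,
    samples=4                   # number of latent samples drawn per instance
)

# Placeholder data for illustration; replace with your own images scaled to [0, 1].
X_train = np.random.rand(16, 512, 512, 3).astype(np.float32)
od.fit(X_train, epochs=30, batch_size=8, verbose=True)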

