Finetuning on new dataset / Modify input images on the fly
How can I modify images on the fly? Say I would like to set a certain region of the input images to 0: where in your code would I need to do the surgery for that?
Rather in the ImageReader function where the image is loaded? Or rather in the network graph itself, say by adding a layer after the data layer in DeepLabResNetModel that multiplies elementwise with some mask?
Sorry for bothering you with this stupid question; I'm new to TensorFlow. Also sorry for asking a usage question, but since your code differs quite a lot from the TensorFlow tutorial code, I don't really know where else to turn. Thanks a lot for providing the deeplab-resnet model for TensorFlow!
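A minimal sketch of the second option, assuming NHWC float image batches as in tensorflow-deeplab-resnet; the box coordinates and function name are made-up placeholders, not anything from the repo:

```python
import tensorflow as tf

def zero_out_region(images, y0=50, y1=150, x0=80, x1=200):
    """Zero out a rectangular region of a [batch, height, width, channels]
    tensor by multiplying elementwise with a binary mask.
    The box coordinates here are hypothetical placeholders."""
    shape = tf.shape(images)
    ys = tf.range(shape[1])                        # row indices
    xs = tf.range(shape[2])                        # column indices
    in_y = tf.logical_and(ys >= y0, ys < y1)
    in_x = tf.logical_and(xs >= x0, xs < x1)
    box = tf.logical_and(in_y[:, tf.newaxis], in_x[tf.newaxis, :])
    mask = 1.0 - tf.cast(box, images.dtype)        # 0 inside the box, 1 elsewhere
    return images * mask[tf.newaxis, :, :, tf.newaxis]
```

Applying this right after the image batch is dequeued (e.g. on image_batch before it is fed to the model) leaves the rest of the graph untouched; doing the same inside ImageReader would work equally well, just per image instead of per batch.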
Issue Analytics
- Created 7 years ago
- Comments: 24 (10 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
You are right: in the original code with the multiscale fusion, they have a batch_size of 1. In my version I removed the multiscale part so I could use a batch_size of 3. Sorry for the confusion.
I'm not so sure about the individual losses for the different branches. Do they simply add them up during backpropagation, or how do they combine them?
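For reference, a sketch of the simplest combination scheme, summing the per-branch cross-entropies so that backpropagation handles them jointly; the branch layout is hypothetical, and that DeepLab combines them exactly this way is an assumption here:

```python
import tensorflow as tf

def combined_branch_loss(branch_logits, labels):
    """Sum of per-branch cross-entropy losses.

    branch_logits: list of [batch, h, w, num_classes] tensors, e.g. the
    outputs at scales 1.0 / 0.75 / 0.5 plus the fused prediction (assumed
    layout, not taken from the repo).
    labels: [batch, h, w] int tensor of class ids.
    A real version would also mask out ignore_label pixels first.
    """
    losses = [tf.reduce_mean(
                  tf.nn.sparse_softmax_cross_entropy_with_logits(
                      logits=logits, labels=labels))
              for logits in branch_logits]
    # Gradients of a sum are the sum of the gradients, so simply adding
    # the losses lets backprop combine the branches automatically.
    return tf.add_n(losses)
```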
Good idea with the update_scale / param_scale! I'll check that. By the way, I updated my preprocessing to random cropping with 0-padding for the images and ignore_label-padding for the labels. It gave me a huge boost (+10% accuracy) on my other datasets (CamVid and Cityscapes). So although this might not be very important for PASCAL VOC12, it apparently is for other datasets.
Here is how I preprocess images at the moment, cropping with padding as sketched below. Mind that the padding for the labels has to be done with ignore_label: since TF only performs a 0-padding, I subtract the ignore_label from the label and add it again after the padding.
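A sketch of that cropping-with-padding step; the crop size of 321 and ignore_label of 255 are typical DeepLab/VOC settings assumed here, not values stated in the thread:

```python
import tensorflow as tf

IGNORE_LABEL = 255     # assumption: void class id, as in PASCAL VOC
CROP_H = CROP_W = 321  # assumption: the usual DeepLab crop size

def random_crop_and_pad(image, label,
                        crop_h=CROP_H, crop_w=CROP_W,
                        ignore_label=IGNORE_LABEL):
    """Randomly crop image [h, w, 3] and label [h, w, 1] to the same window,
    zero-padding the image and ignore_label-padding the label when the
    input is smaller than the crop."""
    label = tf.cast(label, tf.float32) - ignore_label  # shift so 0-padding == ignore_label
    combined = tf.concat([image, label], axis=2)       # crop both tensors identically
    shape = tf.shape(image)
    combined = tf.image.pad_to_bounding_box(            # TF pads with zeros only
        combined, 0, 0,
        tf.maximum(crop_h, shape[0]),
        tf.maximum(crop_w, shape[1]))
    combined = tf.random_crop(combined, [crop_h, crop_w, 4])  # 3 image + 1 label channel
    img_crop = combined[:, :, :3]
    label_crop = combined[:, :, 3:] + ignore_label      # undo the shift
    return img_crop, tf.cast(label_crop, tf.uint8)
```

Concatenating image and label before cropping guarantees both get the exact same random window, which is the reason for stacking them into one tensor.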
Thanks to your help I modified the initialization part, roughly as in the sketch below.
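A plausible sketch of such an initialization, restoring everything except the final classification layer; the fc1_voc12 scope name and the checkpoint path are assumptions based on tensorflow-deeplab-resnet, adjust them to your checkpoint:

```python
import tensorflow as tf

# Restore all pretrained weights except the classifier, whose shape depends
# on the number of classes. 'fc1_voc12' is the classifier scope assumed
# from tensorflow-deeplab-resnet; adjust if your checkpoint names differ.
restore_var = [v for v in tf.global_variables() if 'fc1_voc12' not in v.name]

loader = tf.train.Saver(var_list=restore_var)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())    # random init for the new classifier
    loader.restore(sess, './deeplab_resnet.ckpt')  # hypothetical checkpoint path
```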
Now the network compiles and starts to train, but it is not converging (not even on VOC12 itself, without having changed the number of classes). The loss decreases a bit in the beginning but then stays at a high level. Looking at the output pictures, one can see that the network first predicts noise and then predicts only the background class for all images (probably the dominant class in VOC12). Could it be that tf.train.AdamOptimizer is the wrong optimizer here? (I remember that in Deeplab-Caffe it was some gradient descent with momentum.) Thanks a lot for your help so far. Again, sorry for annoying you with this problem, but I'm close to giving up 😕
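For anyone who wants to try the Caffe-style setup instead of Adam, a sketch using SGD with momentum; the learning rate and momentum values are the commonly cited DeepLab hyper-parameters, assumed here rather than quoted from this thread:

```python
import tensorflow as tf

# Assumed hyper-parameters from the Caffe DeepLab configuration.
BASE_LR = 2.5e-4
MOMENTUM = 0.9

def make_train_op(loss):
    """Swap tf.train.AdamOptimizer for the momentum SGD used by Caffe DeepLab."""
    optimizer = tf.train.MomentumOptimizer(learning_rate=BASE_LR,
                                           momentum=MOMENTUM)
    return optimizer.minimize(loss)
```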