
[Image captioning] pack_padded wrong lengths

See original GitHub issue

In the Image Captioning tutorial, in the DecoderRNN's forward method:

def forward(self, features, captions, lengths):
  # features: (batch, embed_size) image feature vector from the encoder
  # captions: (batch, max_len) padded word indices; lengths: caption lengths
  embeddings = self.embed(captions)                               # (batch, max_len, embed_size)
  embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)  # (batch, max_len + 1, embed_size)
  packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
  hiddens, _ = self.lstm(packed)
  outputs = self.linear(hiddens[0])  # vocabulary scores for every packed timestep
  return outputs

shouldn’t the lengths passed to pack_padded_sequence be lengths + 1, to account for the extra timestep added when the features are concatenated on?

e.g. (assuming the numbers are caption word indices, e is the embedding, f_i are the image features, and batch_size = 4), if

embeds:
e(126)  e(1214)  e(14)    e(4033)
e(126)  e(6)     e(84)    e(4033)
e(126)  e(3002)  e(4033)  e(0)
e(126)  e(3002)  e(4033)  e(0)

has lengths = [4,4,3,3] then

embeds_cat:
f_0  e(126)  e(1214)  e(14)    e(4033)
f_1  e(126)  e(6)     e(84)    e(4033)
f_2  e(126)  e(3002)  e(4033)  e(0)
f_3  e(126)  e(3002)  e(4033)  e(0) 

should have lengths = [5,5,4,4], right?
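
A quick way to check what gets packed in each case; the tensor sizes and variable names below are illustrative, not taken from the tutorial:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

batch_size, max_len, embed_size = 4, 4, 8
embeddings = torch.randn(batch_size, max_len, embed_size)  # the e(...) rows above
features = torch.randn(batch_size, embed_size)             # f_0 .. f_3
lengths = [4, 4, 3, 3]

# Prepend the feature vector as an extra first timestep, as in the tutorial.
cat = torch.cat((features.unsqueeze(1), embeddings), 1)    # (4, 5, 8)

packed_orig = pack_padded_sequence(cat, lengths, batch_first=True)
packed_plus1 = pack_padded_sequence(cat, [l + 1 for l in lengths], batch_first=True)

print(packed_orig.data.shape)   # torch.Size([14, 8]): 4+4+3+3 timesteps, last token of each row dropped
print(packed_plus1.data.shape)  # torch.Size([18, 8]): 5+5+4+4 timesteps, everything kept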

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
lfrati commented, Dec 27, 2018

After some more testing I’ve decided that the lengths shouldn’t be changed, since I don’t want to use the last caption token as an input to the LSTM (the <end> token, 4033 in the example).
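
To make the alignment concrete: with the original lengths, the packed inputs line up with the captions packed by the same lengths, so <end> only ever appears as a target. A rough sketch of that pairing (the target construction is assumed to mirror the tutorial's training loop; names and sizes are illustrative):

import torch
from torch.nn.utils.rnn import pack_padded_sequence

captions = torch.tensor([[126, 1214, 14, 4033],
                         [126, 6, 84, 4033],
                         [126, 3002, 4033, 0],
                         [126, 3002, 4033, 0]])
lengths = [4, 4, 3, 3]

embed = torch.nn.Embedding(5000, 8)   # toy vocab and embedding size
features = torch.randn(4, 8)

inputs = torch.cat((features.unsqueeze(1), embed(captions)), 1)         # (4, 5, 8)
packed_inputs = pack_padded_sequence(inputs, lengths, batch_first=True)
packed_targets = pack_padded_sequence(captions, lengths, batch_first=True)

# Timestep t of packed_inputs is scored against timestep t of packed_targets:
#   f -> 126, e(126) -> next word, ..., e(word before <end>) -> 4033 (<end>),
# so e(4033) is never fed to the LSTM.
print(packed_inputs.data.shape, packed_targets.data.shape)  # (14, 8) and (14,)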

0 reactions
sth4k commented, Dec 27, 2018

Using the “feature vector” you have a training process that looks like:

  • step 0: features + zeroes -> caption_0 + state_0
  • step 1: embed(caption_0) + state_0 -> caption_1 + state_1
  • step n: embed(caption_n-1) + state_n-1 -> <end>

where zeroes is the default value used when no initial state is provided.

@sth4k, if you use the conv output as a “starting state” and provide it to the LSTM manually, what would you use as the first input to the LSTM? A vector of constant values? I’m not an expert, but I think both approaches should work in principle.

Hi LapoFrati, thanks for your reply. Somehow GitHub eats my word <s.o.s>, which is the start-of-sentence token (caption_0 in your notation). I have modified the <s.o.s> in my original question. So, as you can see, features + zeroes will always be used to predict caption_0, which is basically a starting token (similar to eos). Does that make sense?
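
For reference, a hedged sketch of the two conditioning options being compared here (the feature vector as the first input with a zero initial state, versus projecting the feature into the initial state and feeding a <s.o.s>/<start> embedding first); the sizes and the projection layer are assumptions for illustration:

import torch

embed_size, hidden_size, batch = 8, 16, 4
lstm = torch.nn.LSTM(embed_size, hidden_size, batch_first=True)
features = torch.randn(batch, embed_size)           # CNN image features
start_embeds = torch.randn(batch, 1, embed_size)    # embedding of <s.o.s>/<start>

# (a) Tutorial style: image features are the first input step, state starts at zeros.
out_a, state_a = lstm(features.unsqueeze(1))        # step 0 output predicts caption_0

# (b) Alternative: project the features into the initial hidden state and feed
#     the <s.o.s> embedding as the first input instead.
to_hidden = torch.nn.Linear(embed_size, hidden_size)
h0 = to_hidden(features).unsqueeze(0)               # (1, batch, hidden_size)
c0 = torch.zeros_like(h0)
out_b, state_b = lstm(start_embeds, (h0, c0))       # step 0 output also predicts caption_0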

Read more comments on GitHub.

