
Unsuccessful Image Caption step 2 fine-tuning

See original GitHub issue

Hi, when running train_caption_stage2.sh, it triggers an AssertionError:

Traceback (most recent call last):
  File "../../train.py", line 527, in <module>
    cli_main()
  File "../../train.py", line 520, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../train.py", line 189, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "../../train.py", line 300, in train
    log_output = trainer.train_step(samples)
  File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/workspace/OFA/trainer.py", line 773, in train_step
    loss, sample_size_i, logging_output = self.task.train_step(
  File "/workspace/OFA/tasks/ofa_task.py", line 319, in train_step
    loss, sample_size, logging_output = criterion(model, sample, update_num=update_num)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/OFA/criterions/scst_loss.py", line 88, in forward
    loss, score, ntokens, nsentences = self.compute_loss(model, sample, reduce=reduce)
  File "/workspace/OFA/criterions/scst_loss.py", line 239, in compute_loss
    gen_target, gen_res, gt_res = self.get_generator_out(model, sample)
  File "/workspace/OFA/criterions/scst_loss.py", line 149, in get_generator_out
    gen_out = self.task.scst_generator.generate([model], sample)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/OFA/models/sequence_generator.py", line 207, in generate
    return self._generate(models, sample, **kwargs)
  File "/workspace/OFA/models/sequence_generator.py", line 480, in _generate
    assert step < max_len, f"{step} < {max_len}"
AssertionError: 16 < 16

How can I fix this problem? Thx!
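The failing guard in sequence_generator.py is a strict inequality: decoding must finish while the current step is still below max_len. A minimal sketch (hypothetical values, not the OFA source) of why the traceback reads `16 < 16`:

```python
# When max_len collapses to 16 (e.g. because the expected checkpoint and its
# config were never loaded), the guard trips the moment step reaches 16:
# 16 < 16 is False, so the assertion fires with the message "16 < 16".
step, max_len = 16, 16
msg = None
try:
    assert step < max_len, f"{step} < {max_len}"
except AssertionError as exc:
    msg = str(exc)
print("AssertionError:", msg)  # prints: AssertionError: 16 < 16
```

As the accepted answer below shows, the root cause was upstream of this check, but the assertion is what surfaces it.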

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8

Top GitHub Comments

1 reaction
logicwong commented, Mar 6, 2022

I’m still confused, because 24GB of memory is not enough to use batch_size=4 for stage 2 fine-tuning. But in your file, you do use batch_size=4. Can you check whether this file is actually the one used for stage 2 fine-tuning?

(screenshot attachment)
0 reactions
dannyxiaocn commented, Mar 6, 2022

You are brilliant! The problem is that the stored checkpoint from stage 1 had a mismatched name, so no checkpoint was loaded in at all. (And of course batch_size=4 was too large for me, so I switched it from 4 to 1.) That’s why the error occurred: it never even reached the data-loading part! After I changed the name to match, stage 2 fine-tuning is running. Thank you so much!
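The failure mode described here is worth guarding against: when the checkpoint filename passed to stage 2 does not match the file stage 1 actually wrote, training silently starts from an unrestored model. A minimal sketch of a pre-flight check (the path and helper name are hypothetical; substitute the checkpoint argument from your own train_caption_stage2.sh):

```python
import os

def stage1_checkpoint_exists(path):
    """Return True when the stage-1 checkpoint file is actually present.

    If the filename given to stage 2 does not match the file written by
    stage 1, no weights are restored and training proceeds anyway -- the
    silent failure described in this thread.
    """
    return os.path.isfile(path)

# Hypothetical path -- replace with the value used in your script:
if not stage1_checkpoint_exists("checkpoints/caption_stage1_best.pt"):
    print("stage-1 checkpoint missing or misnamed; stage 2 will not load it")
```

Running a check like this before launching the stage 2 script turns the confusing downstream assertion into an immediate, readable error.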

Read more comments on GitHub >

Top Results From Across the Web

LRP-Inference Fine-Tuning for Image Captioning Models - arXiv
This paper analyzes the predictions of image captioning models with attention mechanisms beyond visualizing the attention itself.

Explain and improve: LRP-inference fine-tuning for image captioning
We propose an LRP-inference fine-tuning strategy that reduces object ... In the rest of this paper, Section 2 introduces recent image captioning models, ...

Contcap - A scalable framework for continual image captioning
Code for: Contcap - A scalable framework for continual image captioning - GitHub ... Step 2: Train model 2to21 to get model for...

Fine-Tune a Model - Amazon SageMaker
Fine-tuning trains a pretrained model on a new dataset without training from scratch. This process, also known as transfer learning, can produce accurate ...

Image captioning with visual attention | TensorFlow Core
The steps are: Load the images (and ignore images that fail to load). Replicate images to match the number of captions. Shuffle and...
