Unsuccessful Image Caption step 2 fine-tuning
Hi, when running train_caption_stage2.sh, it triggers an AssertionError like:
Traceback (most recent call last):
File "../../train.py", line 527, in <module>
cli_main()
File "../../train.py", line 520, in cli_main
distributed_utils.call_main(cfg, main)
File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
main(cfg, **kwargs)
File "../../train.py", line 189, in main
valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "../../train.py", line 300, in train
log_output = trainer.train_step(samples)
File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/workspace/OFA/trainer.py", line 773, in train_step
loss, sample_size_i, logging_output = self.task.train_step(
File "/workspace/OFA/tasks/ofa_task.py", line 319, in train_step
loss, sample_size, logging_output = criterion(model, sample, update_num=update_num)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
result = self.forward(*input, **kwargs)
File "/workspace/OFA/criterions/scst_loss.py", line 88, in forward
loss, score, ntokens, nsentences = self.compute_loss(model, sample, reduce=reduce)
File "/workspace/OFA/criterions/scst_loss.py", line 239, in compute_loss
gen_target, gen_res, gt_res = self.get_generator_out(model, sample)
File "/workspace/OFA/criterions/scst_loss.py", line 149, in get_generator_out
gen_out = self.task.scst_generator.generate([model], sample)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/workspace/OFA/models/sequence_generator.py", line 207, in generate
return self._generate(models, sample, **kwargs)
File "/workspace/OFA/models/sequence_generator.py", line 480, in _generate
assert step < max_len, f"{step} < {max_len}"
AssertionError: 16 < 16
How can I fix this problem? Thanks!
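To see why the message reads "16 < 16", here is a minimal sketch (not the actual OFA code) of the kind of guard that raises it: a decoding loop asserts that each generation step stays strictly below the computed maximum length, so the error fires as soon as the loop tries one step too many.

```python
# Minimal sketch of a length guard in a decoding loop.
# If max_len is computed as 16 but the loop reaches step 16,
# the assertion fails with exactly "16 < 16".
def generate_tokens(max_len: int, steps_needed: int):
    tokens = []
    for step in range(steps_needed):
        assert step < max_len, f"{step} < {max_len}"
        tokens.append(step)
    return tokens

try:
    generate_tokens(max_len=16, steps_needed=17)
except AssertionError as e:
    print(e)  # 16 < 16
```

In the real failure, the effective max_len came out far too small because (as the comments below reveal) the stage-1 checkpoint was never actually loaded.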
Issue Analytics
- Created: 2 years ago
- Comments: 8
Top GitHub Comments
I’m still confused, because 24 GB of memory is not enough to use batch_size=4 for stage 2 fine-tuning. But in your file, you do use batch_size=4. Can you check whether this file is actually the one used for stage 2 fine-tuning?

You are brilliant! The problem is that the stored checkpoint from stage 1 has a mismatched name (specifically, no checkpoint is loaded in; and yes, batch_size=4 is too large for me, so I switched it from 4 to 1). That’s the reason for the error: it never even reached the data-loading part! After I changed the name to match, the stage 2 fine-tuning is running. Thank you so much!
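Since the root cause was a checkpoint filename that silently failed to load, a simple pre-flight check can turn that silent failure into a loud one. This is a hypothetical helper (the path and flag name are placeholders, not taken from the OFA scripts): it just verifies that the stage-1 checkpoint the stage-2 script points at actually exists on disk before training starts.

```python
# Hypothetical pre-flight check: fail loudly if the stage-1 checkpoint
# the stage-2 script expects to restore from does not exist, instead of
# silently training from scratch with a tiny effective max_len.
import os

def check_restore_file(path: str) -> None:
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Stage-1 checkpoint not found: {path!r} -- "
            "check the checkpoint name referenced by train_caption_stage2.sh"
        )
    print(f"OK: will restore from {path}")

# Example with a placeholder path:
# check_restore_file("checkpoints/caption_stage1_best.pt")
```

Running a check like this before launching the shell script makes a renamed or misspelled checkpoint show up immediately, rather than surfacing later as an unrelated-looking assertion inside the sequence generator.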